Monday, March 08, 2010

ORA-15041 - ASM Disk Group Unbalanced

Use ASM Disk Group NORMAL redundancy (11gR2) and found error while create tablespace.
SQL> create tablespace test datafile '+DATA' size 5G;
create tablespace test datafile '+DATA' size 5G
*
ERROR at line 1:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted
What Wrong??? - I have free size on "DATA", Check ASM Disk Group.
SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,type from v$asm_diskgroup where name='DATA';

NAME TOTAL_MB FREE_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB TYPE
------------------------------ ---------- ---------- ----------------------- -------------- ------
DATA 257807 112233 23437 44398 NORMAL
Check (ASM) alert log file.
kfdpDumpBg()
kfdpDumpBg() - Done
Check on ASM Instance and Investigate the problem.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1017 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23211 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14819 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14829 23437 NORMAL
/dev/sdj 1 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Disk Group UNBALANCE, then
SQL> alter diskgroup data rebalance power 11;

Diskgroup altered.
Check.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1018 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23209 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14820 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14830 23437 NORMAL
/dev/sdj 0 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Nothing to resolve... Check on v$operation.
SQL> select group_number, operation, state, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL ERRS ORA-15041
Try to check & repair ASM Disk Group.
SQL> alter diskgroup data check all repair;

Diskgroup altered.

SQL> select group_number, operation, state, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL ERRS ORA-15041

SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1018 23437 NORMAL
/dev/sdb 7867 23437 NORMAL
/dev/sdk 7988 23437 NORMAL
/dev/sdq 23209 23437 NORMAL
/dev/sdf 7864 23437 NORMAL
/dev/sdg 14820 23437 NORMAL
/dev/sdc 13827 23437 NORMAL
/dev/sdi 6971 23437 NORMAL
/dev/sde 14830 23437 NORMAL
/dev/sdj 0 23437 NORMAL
/dev/sdd 13839 23437 NORMAL
Nothing to resolve... - "DATA" ASM Disk Group still unbalance, and v$asm_operation show error.
think & make Idea: Move some files or resize some files on "DATA" Disk Group, so login database and make something.
Example:
SQL> alter database tempfile '+DATA/orcl/tempfile/temp.263.712603677' resize 1G;

Database altered.
Check on ASM again.
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 1685 23437 NORMAL
/dev/sdb 8322 23437 NORMAL
/dev/sdk 8357 23437 NORMAL
/dev/sdq 22855 23437 NORMAL
/dev/sdf 8324 23437 NORMAL
/dev/sdg 15013 23437 NORMAL
/dev/sdc 14043 23437 NORMAL
/dev/sdi 7386 23437 NORMAL
/dev/sde 15011 23437 NORMAL
/dev/sdj 711 23437 NORMAL
/dev/sdd 14054 23437 NORMAL
Check v$asm_operation.
SQL> select group_number, operation, state, power, error_code from v$asm_operation;

GROUP_NUMBER OPERA STAT ERROR_CODE
------------ ----- ---- --------------------------------------------
1 REBAL RUN
ASM instance... "DATA" Disk Group rebalancing, So Wait... and check
SQL> select group_number, operation, state, power, error_code from v$asm_operation;

no rows selected
SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 10524 23437 NORMAL
/dev/sdb 10524 23437 NORMAL
/dev/sdk 10528 23437 NORMAL
/dev/sdq 10532 23437 NORMAL
/dev/sdf 10521 23437 NORMAL
/dev/sdg 10523 23437 NORMAL
/dev/sdc 10522 23437 NORMAL
/dev/sdi 10526 23437 NORMAL
/dev/sde 10522 23437 NORMAL
/dev/sdj 10522 23437 NORMAL
/dev/sdd 10523 23437 NORMAL

SQL> alter diskgroup data rebalance power 1;

Diskgroup altered
Disks rebalanced, Try to create tablespace again.
SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,type from v$asm_diskgroup where name='DATA';

NAME TOTAL_MB FREE_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB TYPE
------------------------------ ---------- ---------- ----------------------- -------------- ------
DATA 257807 115767 12916 51425 NORMAL

SQL> create tablespace test datafile '+DATA' size 30G;

Tablespace created.

SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DATA');

PATH FREE_MB TOTAL_MB STATE
------------------- ---------- ---------- --------
/dev/sdl 4936 23437 NORMAL
/dev/sdb 4931 23437 NORMAL
/dev/sdk 4931 23437 NORMAL
/dev/sdq 4932 23437 NORMAL
/dev/sdf 4922 23437 NORMAL
/dev/sdg 4933 23437 NORMAL
/dev/sdc 4936 23437 NORMAL
/dev/sdi 4940 23437 NORMAL
/dev/sde 4933 23437 NORMAL
/dev/sdj 4928 23437 NORMAL
/dev/sdd 4938 23437 NORMAL
What this told me? Make Sure ASM Disk Group no problem (REBALANCE).

20 comments:

sandeep said...

Hi Surachart,

Thanks a lot for sharing the information.

I have a similar situation.
I was trying to create Physical Standby Database using
RMAN DUPLICATE command.. and i encountered


creating datafile fno=2 name=+UNDO
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/09/2010 16:14:18
RMAN-03015: error occurred in stored script Memory Script
RMAN-06136: ORACLE error from auxiliary database: ORA-01119: error in creating d
atabase file '+UNDO'
ORA-17502: ksfdcre:4 Failed to create file +UNDO
ORA-15041: diskgroup space exhausted


SQL> select path, free_mb,total_mb from v$asm_disk where group_number in (select
group_number from v$asm_diskgroup where name='UNDO');

PATH FREE_MB TOTAL_MB
-------------------- ---------- ----------
\\.\ORCLDISKUNDO0 0 21508


The diskgroups on standby are
+DATA/
+FLASH/
+UNDO/
+IDX/

Disks are with this naming convention:

+DATA/prim
+FLASH/prim
+UNDO/prim
+IDX/prim

+DATA/stdby
+FLASH/stdby
+UNDO/stdby
+IDX/stdby

Kindly suggest me how to recover from this.

I need some more clarification,
The procedure that you followed should be on which disk ??

-Thanks
Sandeep

Surachart said...

We will point at +UNDO Disk Group. I think you are using External redundancy.

From your query, that show FREE_MB=0

Your ASM Disk Group Full (UNDO)

please query

select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,type from v$asm_diskgroup where name='UNDO';

i'd like to make sure UNDO full...

Another IDEA... use "asmcmd"
$ export ORACLE_SID=+ASM
$ asmcmd
ASMCMD> cd UNDO
ASMCMD> du

and
$ asmcmd lsdg

if that show UNDO FREE_MB=0 that mean UNDO diskgroup full. you have to add disk to this disk group.

sandeep said...

Hi Surachart,
Appreciate the quick response.

These are the result of the queries you asked for :

Yes you are right !! Redundancy is External.

SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb ,ty
pe from v$asm_diskgroup where name='UNDO';

NAME TOTAL_MB FREE_MB REQUIRED_MIRROR_FREE_MB
------------------------------ ---------- ---------- -----------------------
USABLE_FILE_MB TYPE
-------------- ------
UNDO 21508 2857 0
2857 EXTERN


ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mi
r_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 71680 23610
0 23610 0 DATA/
MOUNTED EXTERN N N 512 4096 1048576 10652 10041
0 10041 0 FLASH/
MOUNTED EXTERN N N 512 4096 1048576 51207 38820
0 38820 0 IDX/
MOUNTED EXTERN N N 512 4096 1048576 21508 2857
0 2857 0 UNDO/

ASMCMD> pwd
+
ASMCMD> cd UNDO
ASMCMD> du
Used_MB Mirror_used_MB
18600 18600
ASMCMD> ls
LSD/
ORC1/
ASMCMD> cd LSD
ASMCMD> du
Used_MB Mirror_used_MB
18600 18600
ASMCMD> pwd
+UNDO/LSD

One more thing I would like to mention :

This disk +UNDO/lsd is created for my physicalstandby(lsd) database which I was recreating.

I can afford to drop and recreate the disk, but will it be safe and not affect my primary(orc1).


If adding disk is the only option ... what is the procedure as the SYSTEM ADMIN tasks also will be done by me :( and I'm a newbie in ASM.

-Thanks a ton again
Sandeep

Surachart said...

I can afford to drop and recreate the disk, but will it be safe and not affect my primary(orc1).

> You have to make sure no database files on +UNDO before you'll drop/recreate.


If adding disk is the only option ... what is the procedure as the SYSTEM ADMIN tasks also will be done by me :( and I'm a newbie in ASM.)

> You should ask system admin to add disks for you ...

sandeep said...

Hi Surachart,

This is my structure now.

ASMCMD> pwd
+
ASMCMD> cd UNDO
ASMCMD> pwd
+UNDO
ASMCMD> cd lsd
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
TEMPFILE/
===========
ASMCMD> cd datafile
ASMCMD> ls
MHDORA0.261.652294143
UNDOTBS1.257.652291953
UNDOTBS1.260.652293001
UNDOTBS1.267.713189713
UNDOTBS2.258.652292093
UNDOTBS2.259.652293001

==================
ASMCMD> pwd
+UNDO/lsd/controlfile
ASMCMD> ls
current.256.651779583
===============
ASMCMD> ls
group_14.264.713104839
group_16.265.713104845
group_19.266.713104855
====
ASMCMD> pwd
+UNDO/lsd/tempfile
ASMCMD> ls
MHDTEMP.262.652301213
SFATEMP.263.652301213
ASMCMD>


But I think adding disk is the ideal option for me.


Thanks a lot again!!Gr8 blog :)

I'll keep you posted about the status.

Thnaks ,
Sandeep

Surachart said...

>ASMCMD> cd datafile
>ASMCMD> ls
>MHDORA0.261.652294143
>UNDOTBS1.257.652291953
>UNDOTBS1.260.652293001
>UNDOTBS1.267.713189713
>UNDOTBS2.258.652292093
>UNDOTBS2.259.652293001

Can you migrate some datafiles to other DISKGROUPs?

Example:
http://surachartopun.com/2009/10/moving-datafile-system-from-one-asm.html

>But I think adding disk is the ideal option for me.
Good Idea .. don't forgot use ...
SQL> alter diskgroup UNDO add disk '/.../' REBALANCE POWER 5 ;


Good Luck

sandeep said...

One more thing I would like to let you know.

RMAN> report schema
2> ;

Report of database schema

List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 1024 SYSTEM *** +DATA/orc1/datafile/system.259.647315
029
2 3072 UNDOTBS1 *** +UNDO/orc1/datafile/undotbs1.257.6473
15049
3 1350 SYSAUX *** +DATA/orc1/datafile/sysaux.260.647315
131
4 10532 UNDOTBS2 *** +UNDO/orc1/datafile/undotbs2.258.6473
15143


I have 2 UNDO tablespaces UNDOTBS1 and UNDOTBS2 , is it feasible to drop UNDOTBS2 and make UNDOTBS1 as default.

I can recover around 15GB of space if I do that.

Can you suggest me on that?

Surachart said...

2 -3 choices
1. make +UNDO diskgroup has enough space.
- resize undo datafiles
- move undo datafile to another DISKGROP
- check files in +UNDO diskgroup, drop/move/....

2. add disk to +UNDO diskgroup.'

3. Duplicate database, make newname undo files to other asm diskgroups.


Good Luck

Ramanath said...

Hi Surachart,

I am doing the cloning of a database from HOTBACKUP.
Able to create teh controlfile and then recover the database successfully.

but there after when i try to open the database with resetlogs option it is failing with the message
"SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
"

In the alert log i observed that

SMON: enabling cache recovery
Wed Mar 10 13:38:28 2010
Errors in file /dboracle/app/ora102/10gR2/rdbms/log/udump/bscsis_ora_4318.trc:
ORA-00704: bootstrap process failure
ORA-39700: database must be opened with UPGRADE option
Wed Mar 10 13:38:28 2010
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 4318
ORA-1092 signalled during: alter database open resetlogs...



What could be the reason for this ?
How can i overcome this issue ?

Please respond at the earliest. Your feedback is highly appreciated.

Regards,
Madhu Karthik

Surachart said...

What about Oracle version on production (HOTBACKUP) and CLONDB?

Oracle should be similar version.

Or make sure HOTBACKUP, that is oracle version in the same on clonedb...

Please check.

Good Luck

sandeep said...

SQL> select path, free_mb,total_mb,state from v$asm_disk where group_number in (
select group_number from v$asm_diskgroup where name='UNDO');

PATH
--------------------------------------------------------------------------------

FREE_MB TOTAL_MB STATE
---------- ---------- --------
\\.\ORCLDISKUNDO0
18081 21508 NORMAL

ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mi
r_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 71680 23610
0 23610 0 DATA/
MOUNTED EXTERN N N 512 4096 1048576 10652 10041
0 10041 0 FLASH/
MOUNTED EXTERN N N 512 4096 1048576 51207 38820
0 38820 0 IDX/
MOUNTED EXTERN N N 512 4096 1048576 21508 18081
0 18081 0 UNDO/



channel ORA_AUX_DISK_1: starting datafile backupset restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to +DATA
restoring datafile 00002 to +UNDO
restoring datafile 00003 to +DATA
restoring datafile 00004 to +UNDO
restoring datafile 00005 to +DATA
restoring datafile 00006 to +DATA/lsd/datafile/mhdauditorc1.dbf
restoring datafile 00007 to +DATA/lsd/datafile/mhdcoatorc1.dbf
...................
....................
.....................
channel ORA_AUX_DISK_1: reading from backup piece \\ORANT\RAC_BACKUP\RMAN_DB
F_20100310_1053.BKP
ORA-19870: error reading backup piece \\ORANT\RAC_BACKUP\RMAN_DBF_20100310_1
053.BKP
ORA-19504: failed to create file "+UNDO"
ORA-17502: ksfdcre:4 Failed to create file +UNDO
ORA-15041: diskgroup space exhausted
failover to previous backup


Now I have disk space , why is this error showing up again??

i'm confused now !!

sandeep said...

RMAN> show all
2> ;

RMAN configuration parameters are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 15 DAYS;
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '\\orant\RAC_BACKUP\%F';
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '\\orant\RAC_BACKUP\%U';

I think I found the issue.RMAN is unable to find the backupset.

I had not moved the rman backup to the standby server.

The RMAN backups are configured on a shared disk \\orant\rac_backup\

do i need to copy the backup files to the standby server and then
do i need to unset this configuration ??

i find RMAN is picking all the backupsets from this configured location(\\orant\rac_backup)


-Thanks
Sandeep

Ramanath said...

Yes, even i verified and observed that the versions of the databases are different.

Source - 10.2.0.1
Destination - 10.2.0.4

But is there any way that we can bypass and clone it further.

Thanks a lot for your quick response

Regards,
Madhu Karthik

Ramanath said...

Yes, there is a version mismtach with the databases.

Sources - 10.2.0.1
Destination - 10.2.0.4

Is there any way that we can bypass and proceed with the clone.

Thanks for a quick response.

Regards,
Madhu Karthik

Surachart said...

@sandeep
you have to copy backupset to standby on \\orant\rac_backup\ PATH

just make folder ... .


@Ramanath
you should have 10.2.0.1 on destination

Or try to
SQL>alter database open resetlogs upgrade;
and then run

@catupgrd.sql
@utlrp.sql

read about upgrade database (readme.html) in PATCH Folder.

Good Luck

you can send to surachart at gmail dot com

sandeep said...

Yes as i described ,

RMAN doesn't copy the files into the server before restoring them as i assumed earlier

This is a Windows 2003 environment.
The path provided has following naming convention:

\\\
so

\\orant\RAC_BACKUP\

server_name=orant
folder_name=RAC_BACKUP

this is the place where we take our RMAN backups

now i copied the RMAN full database backup to my standby server where i created a folder called

RAC_BACKUP and made it sharing.

so now it's location on standby should be

\\localstandby\RAC_BACKUP\

but when i ran RMAN duplicate it doesn't pick up \\localstandby\RAC_BACKUP\

is it that we have to catalog the files as they are on Non-ASM disk ???
using
RMAN>Catalog command

-Thanks again !!

Surachart said...

@sandeep

not sure with windows path.
if you need to catallog specific backup piece(this is ). you can...

CATALOG BACKUPPIECE '\\localstandby\RAC_BACKUP\';

reference:
http://download.oracle.com/docs/cd/B19306_01/backup.102/b14194/rcmsynta011.htm

Mahendran Ponnusamy said...

Hi,

Where did you find error_code in v$asm_operation?

SQL> desc v$asm_operation
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
GROUP_NUMBER NUMBER
OPERATION VARCHAR2(5)
STATE VARCHAR2(4)
POWER NUMBER
ACTUAL NUMBER
SOFAR NUMBER
EST_WORK NUMBER
EST_RATE NUMBER
EST_MINUTES NUMBER

Surachart said...

@Mahendran
On V$ASM_OPERATION view

SQL> desc v$asm_operation
Name Null? Type
----------------------------------------- -------- ----------------------------
GROUP_NUMBER NUMBER
OPERATION CHAR(5)
STATE VARCHAR2(4)
POWER NUMBER
ACTUAL NUMBER
SOFAR NUMBER
EST_WORK NUMBER
EST_RATE NUMBER
EST_MINUTES NUMBER
ERROR_CODE VARCHAR2(44)

You should find "ERROR_CODE"

Please check
> select VIEW_DEFINITION from V$FIXED_VIEW_DEFINITION where view_name='GV$ASM_OPERATION';

Anonymous said...

You really saved me with the resize or temp file. otherwise rebal was stuck.
And so was production.
Thanks!!

L