Friday, April 03, 2009

Why Couldn't My Oracle Cluster Start?


Sometimes, after an Oracle RAC server is rebooted (or for some other reason), Oracle Clusterware just won't start ;)
That doesn't happen often, but when it does...

This post just shows how I approach the problem. Once we know where the problem is, we can find plenty of articles on the Internet about how to fix it ;)

When Oracle Clusterware hasn't started, we may not find anything helpful under ORA_CRS_HOME/log/HOSTNAME/*. So look at the operating system logs (*.info) instead.
# /u01/oracle/product/crs/bin/crsctl check  crs
Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager
# /u01/oracle/product/crs/bin/crs_stat 
CRS-0184: Cannot communicate with the CRS daemon.
Start investigating by checking the init processes ;)
# ps -aef | grep "init\."
root      4124     1  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run
root      4125     1  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root      4126     1  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run
root      4710  4124  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root      5031  4125  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root      5289  4126  0 12:08 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
This shows that some init.cssd startcheck processes are checking whether the cluster can start...

If your CRS is disabled, you will find only these:
# ps -aef | grep "init\."
root      4166     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run
root      4167     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root      4168     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run
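The difference between the two `ps` listings above can be checked mechanically. Below is a minimal sketch, assuming the process command lines look exactly like the ones shown (init.evmd run, init.cssd fatal, init.crsd run, plus init.cssd startcheck children). The function name crs_init_state is hypothetical, not an Oracle tool; it reads `ps` output on stdin and reports which situation it sees.

```shell
#!/bin/sh
# Hypothetical helper: classify `ps -aef` output (read from stdin) by the
# Oracle CRS init wrapper processes it contains.
crs_init_state() {
    wrappers=0   # init.evmd run / init.cssd fatal / init.crsd run
    checks=0     # init.cssd startcheck children (CRS trying to start)
    while IFS= read -r line; do
        case "$line" in
            *"/etc/init.d/init.cssd startcheck"*)
                checks=$((checks + 1)) ;;
            *"/etc/init.d/init.evmd run"* | *"/etc/init.d/init.cssd fatal"* | *"/etc/init.d/init.crsd run"*)
                wrappers=$((wrappers + 1)) ;;
        esac
    done
    if [ "$wrappers" -eq 0 ]; then
        echo "no init.* wrappers: check /etc/inittab"
    elif [ "$checks" -gt 0 ]; then
        echo "starting: $checks startcheck process(es), check /var/log/messages"
    else
        echo "idle: wrappers only, CRS may be disabled"
    fi
}

# Usage on a RAC node (as root):
#   ps -aef | crs_init_state
```

The "idle" verdict is only a hint: it matches the disabled-CRS listing above, but the messages log is still the place to confirm it.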
Once you're sure the cluster has a problem, check the messages log (/var/log/messages on Linux):

Apr  3 12:11:59 oratest01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5031.
Apr  3 12:11:59 oratest01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5289.

Then check the /tmp/crsctl.* files:
# cat /tmp/crsctl.5289
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
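Finding the right /tmp/crsctl.* file by eye gets tedious on a busy log. Here is a minimal sketch that pulls the diagnostics paths out of the "waiting on dependencies" lines and prints each file, assuming the log line format shown above and a Red Hat style /var/log/messages. The function names crs_diag_files and show_crs_diags are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: extract the /tmp/crsctl.<pid> paths from the
# "Cluster Ready Services waiting on dependencies" syslog lines, then show them.
# The default log path /var/log/messages is an assumption (Red Hat style syslog).
crs_diag_files() {
    log=${1:-/var/log/messages}
    grep 'Cluster Ready Services waiting on dependencies' "$log" |
        sed -n 's|.*Diagnostics in \(/tmp/crsctl\.[0-9]*\)\..*|\1|p' |
        sort -u
}

show_crs_diags() {
    for f in $(crs_diag_files "$1"); do
        echo "== $f"
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable or already removed)"
        fi
    done
}

# Usage (as root):
#   show_crs_diags        # scans /var/log/messages
```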

(If your CRS is disabled, you won't find anything about it in /var/log/messages.)
Example: a problem with the OCR file

# cat /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw11
ocrmirrorconfig_loc=/dev/raw/raw12
local_only=FALSE

That shows the OCR is on raw devices ;) so check the rawdevices service:

# /etc/init.d/rawdevices status 

It shows nothing, so start rawdevices:

# /etc/init.d/rawdevices start
Assigning devices: 
 .
 .
 .
           /dev/raw/raw11  -->   /dev/loop1
/dev/raw/raw11: bound to major 7, minor 1
           /dev/raw/raw12  -->   /dev/loop2
/dev/raw/raw12: bound to major 7, minor 2
.
.
.

# /etc/init.d/rawdevices status
.
.
.
/dev/raw/raw11: bound to major 7, minor 1
/dev/raw/raw12: bound to major 7, minor 2
.
.
.
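Before rerunning crsctl, it can help to confirm that every OCR location listed in ocr.loc is actually present. A minimal sketch, assuming the ocrconfig_loc and ocrmirrorconfig_loc keys shown in the ocr.loc example above; the function name check_ocr_devices is hypothetical. It only tests that each path exists, which does not prove a raw device is bound; `/etc/init.d/rawdevices status` remains the check for that.

```shell
#!/bin/sh
# Hypothetical helper: list the OCR locations from ocr.loc and report whether
# each path exists. Existence alone does not prove a raw device is bound.
check_ocr_devices() {
    loc_file=${1:-/etc/oracle/ocr.loc}
    awk -F= '$1 == "ocrconfig_loc" || $1 == "ocrmirrorconfig_loc" { print $2 }' "$loc_file" |
    while IFS= read -r dev; do
        [ -n "$dev" ] || continue
        if [ -e "$dev" ]; then
            echo "OK      $dev"
        else
            # consistent with PROC-26 "No such file or directory" in /tmp/crsctl.*
            echo "MISSING $dev"
        fi
    done
}

# Usage:
#   check_ocr_devices     # reads /etc/oracle/ocr.loc
```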

That should resolve this case ;)

# /u01/oracle/product/crs/bin/crsctl check  crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
>>> What does that mean? I'm trying to show what to do when Oracle Clusterware won't start: what to check, and how to think it through.

- Check the init processes

$ ps -aef | grep "init\."
.
.
.

If you don't find any cluster processes, make sure these entries are in /etc/inittab:
>>>
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1
>>>
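Checking these entries can be scripted. A minimal sketch, assuming the entries keep the respawn form shown above (the h1/h2/h3 ids may differ, so only the respawn action and command are matched); the function name check_crs_inittab is hypothetical. Commented-out lines count as missing, and the function returns non-zero if any entry is missing.

```shell
#!/bin/sh
# Hypothetical helper: confirm /etc/inittab still has the three CRS respawn
# entries. Lines starting with '#' are ignored, so commented-out entries
# are reported as missing. Returns non-zero if any entry is missing.
check_crs_inittab() {
    tab=${1:-/etc/inittab}
    rc=0
    for cmd in "init.evmd run" "init.cssd fatal" "init.crsd run"; do
        if grep -q "^[^#]*:respawn:/etc/init.d/$cmd" "$tab"; then
            echo "present: $cmd"
        else
            echo "missing: $cmd"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   check_crs_inittab || echo "CRS entries missing from /etc/inittab"
```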

- Check for errors in the messages log (/var/log/messages) and in /tmp/crsctl.*

Another case ;) to give you the idea:
# /u01/oracle/product/crs/bin/crsctl check  crs
Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager
# /u01/oracle/product/crs/bin/crs_stat 
CRS-0184: Cannot communicate with the CRS daemon.
# ps -aef | grep "init\."
root      4166     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run
root      4167     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root      4168     1  0 12:33 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run
root      9579  4166  0 12:46 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     10566  4167  0 12:46 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     10585  4168  0 12:46 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
Oracle Clusterware was trying to start. So check, check, and check! Finally, check the messages log:
Apr  3 12:47:32 oratest01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.10566.
Apr  3 12:47:32 oratest01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.10585.
There it is: "logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.10585". Then:
# cat /tmp/crsctl.10585
Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=oratest01-priv))

That means there is a problem with "oratest01-priv" (the interconnect): the hostname or ... So check it and fix it ;)
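A first check for the interconnect name is whether it resolves on the node at all. A minimal sketch, assuming the private name is defined in /etc/hosts rather than DNS; the function name lookup_priv_host is hypothetical. It prints the address listed for the name, or returns non-zero if the name is not there. It does not check whether that address is actually configured on an interface that is up.

```shell
#!/bin/sh
# Hypothetical helper: print the address a hosts file gives for a name
# (e.g. oratest01-priv). Only the hosts file is read; DNS/NIS are not consulted.
lookup_priv_host() {
    name=$1
    hosts=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments (whole-line or trailing)
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; found = 1; exit } }
        END { exit (found ? 0 : 1) }
    ' "$hosts"
}

# Usage:
#   lookup_priv_host oratest01-priv || echo "oratest01-priv not in /etc/hosts"
```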

It's a good habit for a DBA to check for errors in the messages log [when the cluster won't start or after the cluster reboots].

A DBA should know the operating system too... it helps in being a DBA ;)


3 comments:

Jaspreet Singh said...

Thanks a lot, man.
Very good article, keep it up!
I am facing the same kind of error.
I have installed 11gR2 on ASM.
I can see the ASM disks on both nodes after a successful install.
I rebooted the server. I can see the ASM processes on one node and CRS is up, but on node 2 CRS is not coming up. I ran ./ocrcheck on both nodes and it gives the same output. When I try ./crs_start, it gives this error:
u01/oracle/product/crs/bin/crsctl check crs
Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager

# /u01/grid/bin/crs_stat
CRS-0184: Cannot communicate with the CRS daemon.

Please suggest.
Thanks in advance.

Surachart said...

Focus on node 2.

Check HAS (Oracle High Availability Services):

$ /u01/oracle/product/crs/bin/crsctl check has

and check the resource profiles for AUTO_START:
$ /u01/oracle/product/crs/bin/crsctl stat res -p

If there's an error, try stopping and starting HAS:

# /u01/oracle/product/crs/bin/crsctl stop has
# /u01/oracle/product/crs/bin/crsctl start has

and check the logs under the Grid home: $GRID_HOME/log/node_name/

and the /var/log/messages file.

Anonymous said...

Check your heartbeat interface:
ifconfig -a
/etc/hostname.*
/etc/hosts