Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]


		Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
 
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.
PURPOSE
This note is to provide reference to troubleshoot 11gR2 and 12c Grid Infrastructure clusterware startup issues. It applies to issues in both new environments (during root.sh or rootupgrade.sh) and unhealthy existing environments.  To look specifically at root.sh issues, see note 1053970.1 for more information. 
SCOPE
This document is intended for Clusterware/RAC Database Administrators and Oracle support engineers. 
DETAILS
Start up sequence:
In a nutshell, the operating system starts ohasd, ohasd starts agents to start up daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd asm etc), and crsd starts agents that start user resources (database, SCAN, listener etc).
 
For detailed Grid Infrastructure clusterware startup sequence, please refer to note 1053147.1
 
Cluster status
 
To find out cluster and daemon status:
 
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
 
$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                  Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                  OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
 
For 11.2.0.2 and above, there will be two more processes:
 
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1
ora.crf
      1        ONLINE  ONLINE       rac1
For 11.2.0.3 onward in non-Exadata, ora.diskmon will be offline:
 
ora.diskmon
      1        OFFLINE  OFFLINE       rac1
For 12c onward, ora.storage is introduced: 
ora.storage
1 ONLINE ONLINE racnode1 STABLE
 
 
To start an offline daemon - if ora.crsd is OFFLINE:
 
$GRID_HOME/bin/crsctl start res ora.crsd -init
 
 
As ohasd.bin is responsible to start up all other cluserware processes directly or indirectly, it needs to start up properly for the rest of the stack to come up. If ohasd.bin is not up, when checking it's status, CRS-4639 (Could not contact Oracle High Availability Services) will be reported; and if ohasd.bin is already up, CRS-4640 will be reported if another start up attempt is made; if it fails to start, the following will be reported:
 
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
 
 
Automatic ohasd.bin start up depends on the following:
 
1. OS is at appropriate run level:
 
OS need to be at specified run level before CRS will try to start up.
 
To find out at which run level the clusterware needs to come up:
 
cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
 
 
Above example shows CRS suppose to run at run level 3 and 5; please note depend on platform, CRS comes up at different run level.
 
To find out current run level:
 
who -r
 
 
2. "init.ohasd run" is up
 
On Linux/UNIX, as "init.ohasd run" is configured in /etc/inittab, process init (pid 1, /sbin/init on Linux, Solaris and hp-ux, /usr/sbin/init on AIX) will start and respawn "init.ohasd run" if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:
 
ps -ef|grep init.ohasd|grep -v grep
root      2279     1  0 18:14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
Note: Oracle Linux 6 (OL6) or Red Hat Linux 6 (RHEL6) has deprecated inittab, rather, init.ohasd will be configured in upstart in /etc/init, however, the process ""/etc/init.d/init.ohasd run" should still be up.
 
If any rc Snncommand script (located in rcn.d, example S98gcstartup) stuck, init process may not start "/etc/init.d/init.ohasd run"; please engage OS vendor to find out why relevant Snncommand script stuck.
 
Error "[ohasd(<pid>)] CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started." may be reported of init.ohasd fails to start on time. 
 
If SA can not identify the reason why init.ohasd is not starting, the following can be a very short term workaround:
 cd <location-of-init.ohasd>
 nohup ./init.ohasd run &
 
 
 
3. Cluserware auto start is enabled - its enabled by default
 
By default CRS is enabled for auto start upon node reboot, to enable:
 
$GRID_HOME/bin/crsctl enable crs
 
To verify whether its currently enabled or not:
 
$GRID_HOME/bin/crsctl co