Troubleshooting Oracle Clusterware GI CRS startup issues: CRS-4535 Cannot communicate with Cluster Ready Services
Purpose
After rebooting a node, the CRS stack fails to start. This bulletin offers a list of things to check while troubleshooting the cause.
CRSCTL alone is generally of little help in analyzing the problem; it only reports that CRSD cannot be reached, as in the following output:
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4533: Event Manager is online
# crsctl start resource crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Start failed, or completed with errors.
# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.crsd' on 'racnode2'
CRS-4548: Unable to connect to CRSD
CRS-2675: Stop of 'ora.crsd' on 'racnode1' failed
CRS-2679: Attempting to clean 'ora.crsd' on 'racnode1'
CRS-4548: Unable to connect to CRSD
CRS-2678: 'ora.crsd' on 'racnode1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability
CRS-4548: Unable to connect to CRSD
CRS-2675: Stop of 'ora.crsd' on 'racnode2' failed
CRS-2679: Attempting to clean 'ora.crsd' on 'racnode2'
CRS-4548: Unable to connect to CRSD
CRS-2678: 'ora.crsd' on 'racnode2' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
1. Check which processes are currently running on the node from the Grid home. For example, if the Grid home is /u01/grid, you might use the following:
ps -ef | grep grid
Determine which processes did not start. The startup dependencies are listed below; each process on the left requires the processes on the right to be running first:
CRSD --> EVMD and CTSSD
CTSSD --> CSSD
CSSD --> CSSDMONITOR, DISKMON, and GPNPD
GPNPD --> MDNSD and GIPCD
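The dependency check above can be partly automated. The following is a minimal sketch, not an Oracle-supplied tool: the function name missing_daemons is hypothetical, and the daemon binary names (ocssd.bin, octssd.bin, and so on) are assumptions based on a typical Grid home and may differ by release. It reads a process listing and reports the daemons that are absent, lowest dependency level first.

```shell
#!/usr/bin/env bash
# Sketch: report which clusterware daemons are absent from a process listing.
# The binary names below are assumptions; adjust them to match your release.

# Ordered lowest dependency level first, so the first MISSING line is
# usually the best place to start reading trace files.
DAEMONS="mdnsd.bin gipcd.bin gpnpd.bin cssdmonitor diskmon.bin ocssd.bin octssd.bin evmd.bin crsd.bin"

# Reads `ps -ef` style output on stdin and prints one "MISSING <daemon>"
# line for every daemon that does not appear in it.
missing_daemons() {
    local listing daemon
    listing=$(cat)
    for daemon in $DAEMONS; do
        # Ignore the grep processes themselves, then look for the binary
        # as a fixed string under a bin/ directory.
        if ! printf '%s\n' "$listing" | grep -v 'grep' | grep -qF -- "/bin/$daemon"; then
            echo "MISSING $daemon"
        fi
    done
}

# Usage on a live node:
#   ps -ef | missing_daemons
```

Because the list is ordered by dependency, the first daemon reported is a reasonable starting point for the trace file review.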
2. Review the clusterware alert log and the trace files for the processes that did not start. For example, if CRSD, CSSD, and DISKMON did not start, check the trace files at the lowest process level first, which would be DISKMON.
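As a rough aid for locating those files, the sketch below builds the log paths for one daemon. It assumes the 11.2-style layout under the Grid home (log/<hostname>/alert<hostname>.log plus one subdirectory per daemon); later releases may keep these files under the ADR in the Oracle base instead. The function name clusterware_logs is hypothetical.

```shell
# Sketch: print the alert log path and a daemon's log directory,
# assuming the 11.2-style layout under the Grid home.
#   $1 = Grid home, $2 = short host name, $3 = daemon log subdirectory
clusterware_logs() {
    local grid_home=$1 host=$2 subdir=$3
    echo "$grid_home/log/$host/alert$host.log"
    echo "$grid_home/log/$host/$subdir"
}

# Usage on a live node (the Grid home /u01/grid is the example from step 1):
#   tail -50 "$(clusterware_logs /u01/grid "$(hostname -s)" diskmon | head -1)"
```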
3. If you use ASM for storing the Oracle Clusterware files (voting disks/OCR), make sure the ASM instance is started. Use SQL*Plus (from the Grid home) to start the ASM instance if it is not started, and resolve any errors that occur.
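One way to test whether the ASM instance is up is to look for its PMON background process. The sketch below assumes the usual asm_pmon_<SID> process naming; the function name asm_is_running is hypothetical, and the ORACLE_HOME and ORACLE_SID values in the usage comment are placeholders for your environment.

```shell
# Sketch: reads `ps -ef` output on stdin and succeeds (exit status 0)
# if an ASM PMON background process is present.
asm_is_running() {
    grep -v 'grep' | grep -q 'asm_pmon_'
}

# Usage on a live node (values below are assumed, not taken from this note):
#   if ! ps -ef | asm_is_running; then
#       export ORACLE_HOME=/u01/grid ORACLE_SID=+ASM1
#       "$ORACLE_HOME/bin/sqlplus" / as sysasm     # then: SQL> startup
#   fi
```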
4. Make sure the ASM instance or Oracle Clusterware user has access to the disks used to store the Oracle Clusterware files. Configure UDEV if required.
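A quick access test can be run as the ASM or Oracle Clusterware owner. The sketch below only checks that the current user can read and write each path it is given; it does not verify ownership or UDEV rules, and the function name check_disk_access and the device path in the usage comment are hypothetical.

```shell
# Sketch: prints "OK <path>" or "NO ACCESS <path>" for each argument,
# depending on whether the current user can both read and write it.
# Returns 1 if any path is not accessible.
check_disk_access() {
    local path status=0
    for path in "$@"; do
        if [ -r "$path" ] && [ -w "$path" ]; then
            echo "OK $path"
        else
            echo "NO ACCESS $path"
            status=1
        fi
    done
    return $status
}

# Usage, run as the Grid software owner (device name is a placeholder):
#   check_disk_access /dev/your_asm_disk
```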
5. If the disks were stamped with ASMLIB, make sure all the ASMLIB RPMs were installed. If you are missing the oracleasmlib RPM, it will appear that the disks are marked for use with ASM, but there is no library for ASM to interface with. The ASMLIB packages include:
oracleasm-2.6 debug
oracleasm-support-2.1.2
oracleasmlib-2.0.4
oracleasm-2.6.18
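The installed packages can be compared against that list with rpm. The following sketch assumes the three-package split shown above (support tools, library, and a kernel driver package named after the kernel version); on kernels that ship the driver built in, the third check may not apply. The function name missing_asmlib_rpms is hypothetical.

```shell
# Sketch: reads `rpm -qa` output on stdin and prints a "MISSING ..." line
# for each ASMLIB piece that is not found.
missing_asmlib_rpms() {
    local installed
    installed=$(cat)
    printf '%s\n' "$installed" | grep -q '^oracleasm-support-' || echo "MISSING oracleasm-support"
    printf '%s\n' "$installed" | grep -q '^oracleasmlib-'      || echo "MISSING oracleasmlib"
    # The kernel driver package name starts with oracleasm- followed by
    # the kernel version, so match a digit right after the hyphen.
    printf '%s\n' "$installed" | grep -q '^oracleasm-[0-9]'    || echo "MISSING oracleasm kernel driver"
}

# Usage on a live node:
#   rpm -qa | missing_asmlib_rpms
```

No output means all three pieces were found in the package list.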