Oracle ASM ORA-15063 / ORA-15042 - TROUBLESHOOTING STEPS BEFORE OPENING a SR to Oracle Support
APPLIES TO:
Oracle Database - Enterprise Edition
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.
PURPOSE
Self-debugging steps when a diskgroup cannot be mounted due to error ORA-15063:
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "%" is missing
TROUBLESHOOTING STEPS
SECTION A - Getting started
Start by referring to NOTE 452770.1 "TROUBLESHOOTING - ASM disk not found/visible/discovered issues".
First, identify all disks that belong to the affected diskgroup by looking at the last successful mount in alert_+ASM*.log.
Search for a section like the one below:
SQL> ALTER DISKGROUP <DGNAME1> MOUNT /* asm agent *//* {0:0:214} */
NOTE: cache registered group DATA number=1 incarn=0x44bef6bb
NOTE: cache began mount (not first) of group DATA number=1 incarn=0x44bef6bb
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:DATA01P)
NOTE: Assigning number (1,1) to disk (ORCL:DATA02P)
NOTE: Assigning number (1,2) to disk (ORCL:DATA03P)
NOTE: Assigning number (1,3) to disk (ORCL:DATA04P)
NOTE: Assigning number (1,4) to disk (ORCL:DATA05P)
..
NOTE: cache opening disk 0 of grp 1: DATA01P label:DATA01P
NOTE: cache opening disk 1 of grp 1: DATA02P label:DATA02P
..
SUCCESS: DISKGROUP <DGNAME1> was mounted
NOTE: When ASMLIB is not used, the path to each ASM disk is shown in the mount section:
NOTE: cache opening disk 1 of grp 1: REDO3_0001 path:/dev/mpath/3600601600ba12c00d4b784363e69e211
NOTE: cache opening disk 2 of grp 1: REDO3_0002 path:/dev/mpath/3600601600ba12c00d4b784363e69e212
...
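The disk list can be pulled out of the mount section with a small filter. The following is a minimal sketch, assuming the alert log lines have exactly the layout shown in the excerpts above; the helper name list_opened_disks is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# list_opened_disks (hypothetical helper): read an ASM alert log on stdin and
# print "<disk number> <disk name> <label:... or path:...>" for every
# "cache opening disk" line. Assumes the line layout shown in the excerpt.
list_opened_disks() {
    awk '/^NOTE: cache opening disk [0-9]+ of grp [0-9]+:/ { print $5, $9, $10 }'
}
```

For example, `list_opened_disks < alert_+ASM1.log` (the file name is only an example). The alert log records every mount, so review the output against the last successful mount rather than treating it as the current disk list.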
Isolate the device(s) reported as "missing", as NOTE 452770.1 suggests.
Then run the following checks:
A1) Check whether any IO/storage/multipathing errors are reported in the OS logs. If so, investigate and fix them.
This step is mandatory, because ORA-15063/ORA-15042 are usually caused by underlying IO/storage errors.
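A first pass over the OS logs for step A1 could look like the sketch below. The helper name io_error_lines and the pattern list are assumptions made for illustration; the exact messages depend on your OS, HBA driver and multipathing software.

```shell
#!/bin/sh
# io_error_lines (hypothetical helper): filter OS log text on stdin and keep
# lines that match a few common I/O failure phrases. The pattern list is an
# assumption; extend it for your storage and multipath stack.
io_error_lines() {
    grep -i -E 'I/O error|multipath.*fail|path.*(down|offline)|SCSI error'
}
```

On Linux this might be run as `io_error_lines < /var/log/messages` on every node. An empty result does not prove the storage is healthy; it only means none of these phrases were logged.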
A2) Check whether the devices used by the ASM disks are properly presented and configured at OS level.
If "ORA-15075: disk(s) are not visible cluster-wide" is also reported, make sure that all devices are visible cluster-wide.
A3) Check whether all ASM disks have the appropriate permissions (e.g. they should be owned by the grid owner).
If the ownership of any ASM disk has been changed for whatever reason, correct it.
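The ownership check in step A3 can be scripted. This is a minimal sketch; report_wrong_owner is a hypothetical helper, and the expected owner is whatever account owns your Grid Infrastructure installation.

```shell
#!/bin/sh
# report_wrong_owner (hypothetical helper): print every device whose owner
# differs from the expected grid owner given as the first argument.
# -L makes ls follow symlinks, so the underlying device is inspected.
report_wrong_owner() {
    expected=$1
    shift
    for dev in "$@"; do
        owner=$(ls -ldL "$dev" | awk '{ print $3 }')
        [ "$owner" = "$expected" ] || echo "$dev: owned by $owner, expected $expected"
    done
}
```

For example, `report_wrong_owner grid /dev/oracleasm/disks/*` prints nothing when every listed device is owned by grid.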
A4) Check whether and how the "missing" device(s) is reported when querying v$asm_disk.
-----------------------------------------------------------------------------------
If the device(s) is reported with status:
=> "PROVISIONED/CANDIDATE" - this means the header of the ASM disk(s) is damaged.
-> investigate the IO problems behind the corruption (see step A1). Oracle never wipes out its own metadata, and every write is checksummed before it is accepted.
-> check the header status to confirm the damage:
$> kfed read <path_to_your_missing_devices>
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
....
-> try to repair the header and see whether the diskgroup can be mounted:
$> kfed repair <path_to_your_missing_devices>
-> check whether additional corruption is reported by ASM (e.g. ORA-15196) or by your database, since IO/storage problems can affect more than one block.
If any corruption is seen, please open an SR with Oracle Support.
NOTE:
1) When a non-default AU size is used, AUSZ=<au_size> must be specified with each KFED command.
2) "kfed repair" works for 11g ONLY!
=> "UNKNOWN/IGNORED" - this means the ASM disk(s) is not seen at OS level.
-> review steps A1, A2 and A3.
-----------------------------------------------------------------------------------
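When several devices have to be checked, the block type reported by "kfed read" in step A4 can be extracted automatically. A minimal sketch, assuming the kfed output format shown above; header_type is a hypothetical helper name.

```shell
#!/bin/sh
# header_type (hypothetical helper): read `kfed read` output on stdin and
# print the symbolic block type found on the first kfbh.type line,
# e.g. KFBTYP_DISKHEAD or KFBTYP_INVALID.
header_type() {
    awk '!seen && $1 == "kfbh.type:" { print $NF; seen = 1 }'
}
```

Usage would be `kfed read <path_to_your_missing_devices> | header_type`. KFBTYP_INVALID matches the damaged header shown in step A4.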
A5) Check whether asm_diskstring is still set properly.
On Windows, also refer to NOTE 880061.1 "ASM Is Unable To Detect SCSI Disks On Windows".
SECTION B - ASMLIB is used
When ASMLIB is used, follow the steps in section A above and also check the errors reported together with ORA-15063:
B1) ORA-15183 Unable to initialize the ASMLIB in oracle/ORA-15183: ASMLIB initialization error [driver/agent not installed]
Refer: NOTE 340519.1 Cannot Start ASM Ora-15063/ORA-15183
B2) ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]
Check your ASMLIB health.
=> correctness of installed rpm's
=> correctness of symlinks - all nodes should show:
# ls -l /etc/sysconfig/oracleasm
lrwxrwxrwx 1 root root 24 Sep 18 22:10 /etc/sysconfig/oracleasm -> oracleasm-_dev_oracleas
=> correctness of ASMLIB configuration (/etc/sysconfig/oracleasm) - when multipathing is used:
# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm"
# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"
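To compare the scan settings across nodes, the active assignments can be filtered out of the configuration file. A minimal sketch; scan_settings is a hypothetical helper name, and it only looks at the two parameters named in step B2.

```shell
#!/bin/sh
# scan_settings (hypothetical helper): read an ASMLIB configuration file on
# stdin and print only the uncommented ORACLEASM_SCANORDER and
# ORACLEASM_SCANEXCLUDE assignments.
scan_settings() {
    grep -E '^[[:space:]]*ORACLEASM_SCAN(ORDER|EXCLUDE)='
}
```

Running `scan_settings < /etc/sysconfig/oracleasm` on each node should give identical output everywhere.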
B3) Check if ASMLIB disks are listed under /dev/oracleasm/disks
=> devices under /dev/oracleasm/disks/* must be reported as dm devices on all nodes (not as single-path sd* devices). If they are not, correct that (see step B2).
$> ls -al /dev/oracleasm/disks
brw-rw---- 1 grid dba 253, 29 Feb 12 11:44 /dev/oracleasm/disks/DATA01P
brw-rw---- 1 grid dba 253, 35 Feb 12 11:44 /dev/oracleasm/disks/DATA02P
brw-rw---- 1 grid dba 253, 27 Feb 15 16:04 /dev/oracleasm/disks/DATA03P
brw-rw---- 1 grid dba 253, 24 Feb 12 11:44 /dev/oracleasm/disks/DATA04P
brw-rw---- 1 grid dba 253, 25 Feb 12 11:44 /dev/oracleasm/disks/DATA05P
=> If any ASMLIB disk is missing from the above output, first try to rescan the devices as root:
# /etc/init.d/oracleasm scandisks
=> If the ASMLIB disk(s) is still missing from /dev/oracleasm/disks, engage your sysadmin to investigate (see steps A1, A2, A3).
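The dm-device requirement in step B3 can be verified from the major numbers in the "ls" listing. The sketch below is one way to do it, assuming the long-listing layout shown above; non_dm_disks is a hypothetical helper, and the device-mapper major number must be supplied by the caller because it can differ between systems.

```shell
#!/bin/sh
# non_dm_disks (hypothetical helper): read `ls -l` output on stdin and print
# every block device whose major number differs from the device-mapper
# major number passed as the first argument.
non_dm_disks() {
    awk -v major="$1" '/^b/ {
        m = $5
        sub(/,$/, "", m)                 # strip the trailing comma: 253, becomes 253
        if (m + 0 != major + 0) print $NF
    }'
}
```

On Linux the device-mapper major number is listed in /proc/devices, so a possible invocation is:
dm_major=$(awk '$2 == "device-mapper" { print $1 }' /proc/devices)
ls -l /dev/oracleasm/disks/* | non_dm_disks "$dm_major"
Any path printed is not backed by a dm device and should be reviewed as described in step B2.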
B4) Check if ASMLIB disk(s) has the correct ASMLIB stamp and status:
$> kfed read <ASMLIB_device> |grep provstr
kfdhdb.driver.provstr: ORCLDISK<diskname> ; 0x000: length=20
$> kfed read <ASMLIB_device> | egrep 'kfbh.type|kfdhdb.dskname|kfdhdb.hdrsts'
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.dskname: DATA01P ; 0x028: length=14
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
=> If the output is "kfdhdb.driver.provstr: ORCLCLRD" (but kfdhdb.hdrsts= MEMBER and kfbh.type=KFBTYP_DISKHEAD) then your disk was deleted using "oracleasm deletedisk".
=> If kfbh.type = KFBTYP_INVALID -> see step A4) and check if "kfed repair" could fix the problem.
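The stamp check in step B4 can be summarised per device. A minimal sketch, assuming the provstr line has the layout shown above; asmlib_stamp is a hypothetical helper name and only interprets the two values mentioned in this step (ORCLDISK<diskname> and ORCLCLRD).

```shell
#!/bin/sh
# asmlib_stamp (hypothetical helper): read `kfed read` output on stdin and
# summarise the kfdhdb.driver.provstr line.
#   ORCLCLRD            -> "cleared"
#   ORCLDISK<diskname>  -> "label=<diskname>"
#   anything else       -> "unrecognised=<value>"
asmlib_stamp() {
    awk '$1 == "kfdhdb.driver.provstr:" {
        if ($2 == "ORCLCLRD") {
            print "cleared"
        } else if ($2 ~ /^ORCLDISK/) {
            print "label=" substr($2, 9)
        } else {
            print "unrecognised=" $2
        }
    }'
}
```

Usage would be `kfed read <ASMLIB_device> | asmlib_stamp`. A result of "cleared" corresponds to the "oracleasm deletedisk" case described above.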
B5) Refer also to the documents below:
NOTE: 398622.1 ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]
NOTE: 1384504.1 Mount ASM Disk Group Fails : ORA-15186, ORA-15025, ORA-15063
NOTE: 967461.1 "Multipath: error getting device" seen in OS log causes ASM/ASMlib to shutdown by itself
NOTE: 1526920.1 ORA-15186 ORA-15063 on node 2
SECTION C - Additional notes to review
If the above checks are done but the error persists, also review the notes below, depending on your configuration/situation:
NOTE: 577526.1 ORA-15063 ASM Discovered An Insufficient Number Of Disks For Diskgroup using NetApp Storage
NOTE: 784776.1 ORA-15063 When Mounting a Diskgroup After Storage Cloning ( BCV / Split Mirror / SRDF / HDS / Flash Copy )
NOTE: 555918.1 ORA-15038 On Diskgroup Mount After Node Eviction
NOTE: 1484723.1 ASM Candidate Raw Device Is Not Presented As A RAC Cluster Wide Shared character Devices On Unix.
NOTE: 1534211.1 ORA-15017 and ORA-15063 errors for unused diskgroups in 11.2
NOTE: 1487443.1 Mounting Diskgroup Fails With ORA-15063 and V$ASM_DISK Shows PROVISIONED
NOTE: 742832.1 AIX:After changing Multipathing drivers from RDAC to MPIO ASM discovered an insufficient number of disks
NOTE: 1276913.1 Unable to discover or use raw devices for ASM in HP-UX Itanium in 11.2.0.2 ( ORA-15063 )
SECTION D - Information to collect before opening an SR
If you are not able to fix the problem on your own, collect the information below and raise an SR with Oracle Support.
D1) alert_+ASM*.log (from all nodes if RAC)
D2) script#1 from NOTE 470211.1 How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1 & 11.2?
D3) KFED reports
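#!/bin/sh
# Dump the disk header (DH) and the backup disk header (BK) of every ASM disk.
# Replace <your_path_to_asm_disks> with a pattern that expands to full device
# paths, e.g. /dev/oracleasm/disks/*
rm -f /tmp/kfed_DH.out /tmp/kfed_BK.out
for i in $(ls <your_path_to_asm_disks>)
do
  echo "$i" >> /tmp/kfed_DH.out
  kfed read "$i" >> /tmp/kfed_DH.out
  echo "$i" >> /tmp/kfed_BK.out
  kfed read "$i" aun=1 blkn=254 >> /tmp/kfed_BK.out
done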
Save the script as kfed.sh and run it as the GRID/ASM owner. Upload /tmp/kfed_DH.out and /tmp/kfed_BK.out.
! Pay attention to non-default AU sizes: if a non-default AU size is used, you must specify it (see note 1485597.1 "ASM tools used by Support : KFOD, KFED, AMDU").
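With a non-default AU size, the block number used for the backup header read changes along with the AU size. The sketch below rests on two assumptions that are not stated in this note: ASM metadata blocks are 4096 bytes, and the backup header is the second-to-last block of allocation unit 1 (which matches blkn=254 for the default 1 MB AU). Confirm the values with note 1485597.1 or Oracle Support before relying on them; backup_blkn is a hypothetical helper name.

```shell
#!/bin/sh
# backup_blkn (hypothetical helper): print the block number of the
# second-to-last metadata block in an allocation unit.
#   $1 = AU size in bytes
#   $2 = metadata block size in bytes (assumed default: 4096)
backup_blkn() {
    au_size=$1
    block_size=${2:-4096}
    echo $(( au_size / block_size - 2 ))
}
```

For a 4 MB AU the read in the script would then look like `kfed read "$i" ausz=4194304 aun=1 blkn=$(backup_blkn 4194304)`.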
D4) ASMLIB information
NOTE : 869526.1 Collecting The Required Information For Support To Troubleshot ASM/ASMLIB Issues.
D5) List of your ASM devices
$> ls -al <path_to_ASM_devices>
D6) OS logs (from all nodes if this is RAC configuration)
SECTION E - Disk is reported as MISSING after a failed disk addition
If you are facing ORA-15063 after a failed disk addition, collect the information below and raise an SR with Oracle Support.
E1) alert_+ASM*.log (from all nodes if RAC)
E2) script#1 from NOTE 470211.1 How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1 & 11.2?
E3) KFED reports
#!/bin/sh
# Dump the ASM metadata blocks requested by Support, one output file per block type.
# Replace <your_path_to_asm_disks> with a pattern that expands to full device
# paths, e.g. /dev/oracleasm/disks/*
rm -f /tmp/kfed_*.out
for i in $(ls <your_path_to_asm_disks>)
do
  echo "$i" >> /tmp/kfed_DH.out
  kfed read "$i" >> /tmp/kfed_DH.out
  echo "$i" >> /tmp/kfed_BK.out
  kfed read "$i" aun=1 blkn=254 >> /tmp/kfed_BK.out
  echo "$i" >> /tmp/kfed_PST.out
  kfed read "$i" aun=1 blkn=2 >> /tmp/kfed_PST.out
  echo "$i" >> /tmp/kfed_FS.out
  kfed read "$i" blkn=1 >> /tmp/kfed_FS.out
  echo "$i" >> /tmp/kfed_FD.out
  kfed read "$i" aun=2 blkn=1 >> /tmp/kfed_FD.out
  echo "$i" >> /tmp/kfed_DD.out
  # With a large number of disks more than one block may be needed;
  # Oracle Support may ask for these later.
  kfed read "$i" aun=2 blkn=0 >> /tmp/kfed_DD.out
done
Save the script as kfed.sh and run it as the GRID/ASM owner. Upload /tmp/kfed_*.out.
! Pay attention to non-default AU sizes: if a non-default AU size is used, you must specify it (see note 1485597.1 "ASM tools used by Support : KFOD, KFED, AMDU").
E4) AMDU output
amdu -diskstring '<ASM_DISKSTRING>' -dump '<DISKGROUP_NAME>' -noimage
amdu -diskstring '<ASM_DISKSTRING>' -print <DISKGROUP_NAME>.F2.V0.C2 > DG.amdu
# F2.V0.C2 extracts information for up to 16 disks only. If the diskgroup has more disks, a larger extract is needed.