Oracle Database Block Corruption in ASM
We got block corruption in the database and found the following alert log entries from that time.
We need to find out why the block corruption occurred. What is the culprit: OS, storage, or DB?
Mon Aug 25 19:48:37 2014
WARNING: cache read a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=6
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc
WARNING: cache read (retry) a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
ERROR: cache failed to read group=1(DATA) fn=281 indblk=16 from disk(s): 8(ASM_DATA12)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _user11370_+asm1 (11370) initiating offline of disk 8.3491799612 (ASM_DATA12) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 8/0xd020a23c, mask = 0x6a, op = clear
Mon Aug 25 19:48:41 2014
GMON updating disk modes for group 1 at 52 for pid 41, osid 11370
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Mon Aug 25 19:48:42 2014
NOTE: cache dismounting (not clean) group 1/0x4AA052EA (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11956, image: oracle@DB01 (B000)
WARNING: Offline for disk ASM_DATA12 in mode 0x7f failed.
Mon Aug 25 19:48:42 2014
NOTE: halting all I/Os to diskgroup 1 (DATA)
Mon Aug 25 19:48:43 2014
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=182.3456 last written ABA 182.3456
Mon Aug 25 19:48:44 2014
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc (incident=144329):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "ASM_DATA12" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc
Mon Aug 25 19:48:45 2014
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
Mon Aug 25 19:48:45 2014
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
1416 GCS resources traversed, 0 cancelled
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
Dirty Detach Reconfiguration complete
Mon Aug 25 19:48:46 2014
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x4AA052EA (DATA)
SQL> alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
Mon Aug 25 19:48:52 2014
Dumping diagnostic data in directory=[cdmp_20140825194852], requested by (instance=1, osid=11370), summary=[incident=144329].
Mon Aug 25 19:48:53 2014
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc
Mon Aug 25 19:48:53 2014
Sweep [inc][144329]: completed
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
Mon Aug 25 19:48:58 2014
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
Mon Aug 25 19:48:58 2014
Sweep [inc2][144329]: completed
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "DATA" is being dismounted
Mon Aug 25 19:49:01 2014
NOTE: ASM client PROD_1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_17812.trc
NOTE: cache deleting context for group DATA 1/0x4aa052ea
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "" is being dismounted
Mon Aug 25 19:49:07 2014
NOTE: AMDU dump of disk group DATA created at /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329
Mon Aug 25 19:49:10 2014
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "" is being dismounted
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
ORA-15130: diskgroup "" is being dismounted
Mon Aug 25 19:49:15 2014
GMON dismounting group 1 at 53 for pid 43, osid 11956
Mon Aug 25 19:49:15 2014
NOTE: Disk ASM_DATA01 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA02 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA03 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA07 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA08 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA09 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA19 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA11 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA12 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA13 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA14 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA15 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA16 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA17 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA18 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA21 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA22 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA23 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA24 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA10 in mode 0x7f marked for de-assignment
NOTE: Disk ASM_DATA20 in mode 0x7f marked for de-assignment
SUCCESS: diskgroup DATA was dismounted
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA
Mon Aug 25 19:49:16 2014
NOTE: diskgroup resource ora.DATA.dg is offline
Mon Aug 25 19:51:34 2014
SQL> ALTER DISKGROUP DATA MOUNT /* asm agent *//* {1:51111:41484} */
NOTE: cache registered group DATA number=1 incarn=0x4aa09b74
NOTE: cache began mount (not first) of group DATA number=1 incarn=0x4aa09b74
NOTE: Assigning number (1,0) to disk (ORCL:ASM_DATA01)
NOTE: Assigning number (1,1) to disk (ORCL:ASM_DATA02)
NOTE: Assigning number (1,2) to disk (ORCL:ASM_DATA03)
NOTE: Assigning number (1,3) to disk (ORCL:ASM_DATA07)
NOTE: Assigning number (1,4) to disk (ORCL:ASM_DATA08)
NOTE: Assigning number (1,5) to disk (ORCL:ASM_DATA09)
NOTE: Assigning number (1,19) to disk (ORCL:ASM_DATA10)
NOTE: Assigning number (1,7) to disk (ORCL:ASM_DATA11)
NOTE: Assigning number (1,8) to disk (ORCL:ASM_DATA12)
NOTE: Assigning number (1,9) to disk (ORCL:ASM_DATA13)
NOTE: Assigning number (1,10) to disk (ORCL:ASM_DATA14)
NOTE: Assigning number (1,11) to disk (ORCL:ASM_DATA15)
NOTE: Assigning number (1,12) to disk (ORCL:ASM_DATA16)
NOTE: Assigning number (1,13) to disk (ORCL:ASM_DATA17)
NOTE: Assigning number (1,14) to disk (ORCL:ASM_DATA18)
NOTE: Assigning number (1,6) to disk (ORCL:ASM_DATA19)
NOTE: Assigning number (1,20) to disk (ORCL:ASM_DATA20)
NOTE: Assigning number (1,15) to disk (ORCL:ASM_DATA21)
NOTE: Assigning number (1,16) to disk (ORCL:ASM_DATA22)
NOTE: Assigning number (1,17) to disk (ORCL:ASM_DATA23)
NOTE: Assigning number (1,18) to disk (ORCL:ASM_DATA24)
Mon Aug 25 19:51:34 2014
GMON querying group 1 at 55 for pid 27, osid 8831
NOTE: cache opening disk 0 of grp 1: ASM_DATA01 label:ASM_DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.7050972
NOTE: cache opening disk 1 of grp 1: ASM_DATA02 label:ASM_DATA02
NOTE: cache opening disk 2 of grp 1: ASM_DATA03 label:ASM_DATA03
NOTE: cache opening disk 3 of grp 1: ASM_DATA07 label:ASM_DATA07
NOTE: cache opening disk 4 of grp 1: ASM_DATA08 label:ASM_DATA08
NOTE: cache opening disk 5 of grp 1: ASM_DATA09 label:ASM_DATA09
NOTE: cache opening disk 6 of grp 1: ASM_DATA19 label:ASM_DATA19
NOTE: cache opening disk 7 of grp 1: ASM_DATA11 label:ASM_DATA11
NOTE: cache opening disk 8 of grp 1: ASM_DATA12 label:ASM_DATA12
NOTE: cache opening disk 9 of grp 1: ASM_DATA13 label:ASM_DATA13
NOTE: cache opening disk 10 of grp 1: ASM_DATA14 label:ASM_DATA14
NOTE: cache opening disk 11 of grp 1: ASM_DATA15 label:ASM_DATA15
NOTE: cache opening disk 12 of grp 1: ASM_DATA16 label:ASM_DATA16
NOTE: cache opening disk 13 of grp 1: ASM_DATA17 label:ASM_DATA17
NOTE: cache opening disk 14 of grp 1: ASM_DATA18 label:ASM_DATA18
NOTE: cache opening disk 15 of grp 1: ASM_DATA21 label:ASM_DATA21
NOTE: cache opening disk 16 of grp 1: ASM_DATA22 label:ASM_DATA22
NOTE: cache opening disk 17 of grp 1: ASM_DATA23 label:ASM_DATA23
NOTE: cache opening disk 18 of grp 1: ASM_DATA24 label:ASM_DATA24
NOTE: cache opening disk 19 of grp 1: ASM_DATA10 label:ASM_DATA10
NOTE: cache opening disk 20 of grp 1: ASM_DATA20 label:ASM_DATA20
NOTE: cache mounting (not first) external redundancy group 1/0x4AA09B74 (DATA)
Mon Aug 25 19:51:35 2014
kjbdomatt send to inst 2
Mon Aug 25 19:51:35 2014
NOTE: attached to recovery domain 1
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Aug 25 19:51:35 2014
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 182.3456
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.11665696 ABA 183.3457
NOTE: cache mounting group 1/0x4AA09B74 (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x4aa09b74
Mon Aug 25 19:51:35 2014
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
SUCCESS: ALTER DISKGROUP DATA MOUNT /* asm agent *//* {1:51111:41484} */
Mon Aug 25 19:51:48 2014
NOTE: client PROD_1:PROD registered, osid 21529, mbr 0x1
Mon Aug 25 19:55:53 2014
NOTE: ASM client PROD_1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_21529.trc
Mon Aug 25 19:57:42 2014
NOTE: client PROD_1:PROD registered, osid 2546, mbr 0x1
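When reading a log like the one above, it helps to pull out the exact coordinates of the bad block before opening an SR. The sketch below is a minimal illustration built only on the message formats visible in this log. It extracts group, file number (fn), disk, AU and block from the "cache read a corrupt block" warnings, and the five arguments of ORA-15196. The log reports indblk=16, and the ORA-15196 argument 2147483664 is 0x80000010, i.e. 16 with the top bit set. Reading that bit as an "indirect block" marker is an assumption, not something the log states. The regexes are tailored to these lines and may not match other ASM versions.

```python
import re

# Matches both "cache read a corrupt block" and "cache read (retry) a corrupt block"
# warnings, in the ASM alert log format shown above.
CORRUPT_RE = re.compile(
    r"WARNING: cache read (?:\(retry\) )?a corrupt block: "
    r"group=(?P<group>\d+)\((?P<dg>\w+)\) fn=(?P<fn>\d+) indblk=(?P<indblk>\d+) "
    r"disk=(?P<disk>\d+) \((?P<label>\w+)\) incarn=(?P<incarn>\d+) "
    r"au=(?P<au>\d+) blk=(?P<blk>\d+) count=(?P<count>\d+)"
)

# Matches the five bracketed arguments of ORA-15196.
ORA15196_RE = re.compile(
    r"ORA-15196: invalid ASM block header \[(?P<src>[^\]]+)\] \[(?P<check>[^\]]+)\] "
    r"\[(?P<fn>\d+)\] \[(?P<blk>\d+)\] \[(?P<detail>[^\]]+)\]"
)

INT_FIELDS = ("group", "fn", "indblk", "disk", "incarn", "au", "blk", "count")


def parse_corrupt_reads(log_text):
    """Return one dict per corrupt-block warning; numeric fields become ints."""
    reads = []
    for m in CORRUPT_RE.finditer(log_text):
        fields = m.groupdict()
        for key in INT_FIELDS:
            fields[key] = int(fields[key])
        reads.append(fields)
    return reads


def parse_ora15196(log_text):
    """Return distinct ORA-15196 argument tuples, with the block argument in hex."""
    distinct = []
    for m in ORA15196_RE.finditer(log_text):
        entry = (m["check"], int(m["fn"]), hex(int(m["blk"])), m["detail"])
        if entry not in distinct:
            distinct.append(entry)
    return distinct


# Two lines copied from the alert log above.
sample = (
    "WARNING: cache read a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 "
    "(ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=6\n"
    "ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] "
    "[2147483664] [0 != 1]\n"
)

reads = parse_corrupt_reads(sample)
print(reads[0]["label"], reads[0]["au"], reads[0]["blk"])  # ASM_DATA12 28481 16
print(parse_ora15196(sample))  # [('endian_kfbh', 281, '0x80000010', '0 != 1')]
```

Deduplicating the ORA-15196 tuples makes it clear that every repeated error in the log refers to the same single block on disk 8 (ASM_DATA12).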
Unfortunately, such analysis and recommendations cannot be accomplished within the space of a community forum. Determining the root cause of an ASM metadata corruption requires collecting a large amount of data for analysis. This will need to be addressed in a formal service request to Oracle Support, to see whether we are able to determine the root cause of the corruption of your metadata.
Please open a service request with Oracle Support to conduct root cause analysis of an ASM disk header metadata corruption. When you open the service request, you will want to have already collected all of the following, ready to upload to the service request for analysis.
1. Please upload the text version of the ASM alert.log (if RAC, from all ASM instances in the cluster). This file should be in the trace directory under the diagnostic destination for your Grid Infrastructure installation and should be named alert_+ASM.log; if RAC, the instance number is appended to +ASM (for example alert_+ASM1.log).
2. Assuming that you are on Oracle 11.2 or higher, you will also want to collect the following as the root user, even if this is a single-instance configuration:
Please review CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide (Doc ID 330358.1). From this document, collect the diagcollection script output as the root user. This script will produce a minimum of 3 zip files per Grid Infrastructure home, named crsDATA.zip, ocrDATA.zip and osDATA.zip. There may be other files as well, such as coreDATA.zip.
Please compress all of the zip files produced into a single zip file per Grid Infrastructure home.
3. Assuming that your ASM instance is presently up and running, collect the following:
Please collect and upload the output of scripts 1, 2 and 3 from:
How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1, 11.2 and 12.1? (Doc ID 470211.1)
4. Please use syntax similar to the following to get an AMDU dump of the impacted diskgroup so that we can determine the extent of the metadata corruption.
If you are not on 11.1 or higher, refer to the following document to download the executable needed to run the command:
Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)
Once you have the executable, run the following command. Note the directory created by the command, and zip up its contents for upload to Oracle via the service request.
NOTE: This command is specific to your environment and the affected diskgroup, so it will not work on other systems as-is. It can be adapted by changing the -diskstring value to the diskstring in use and the diskgroup name at the end of the line to your diskgroup's name.
$ amdu -diskstring '/dev/oracleasm/disks/*' -dump 'DATA'
We will also need the output of the following command. Note that this too is specific to your particular situation. Do not be alarmed: the command only takes a full binary copy of the first 50 MB of the disk and writes it to a file to be uploaded for our analysis.
dd if=/dev/oracleasm/disks/ASM_DATA12 of=/tmp/DATA12.dd bs=1048576 count=50
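The dd command above copies bs × count = 1,048,576 × 50 = 52,428,800 bytes, which covers the disk header and the first allocation units. The corrupt block reported in the log (au=28481 blk=16 on ASM_DATA12) lies far beyond that range. The sketch below computes that block's byte offset and builds argument lists for the AMDU command above and for a dd that would copy just that one AU. It assumes a 1 MiB allocation unit (the 11.2 default) and 4 KiB metadata blocks; confirm the real AU size first (for example ALLOCATION_UNIT_SIZE in V$ASM_DISKGROUP). `au_dd_cmd` and the output file name are hypothetical, and it is up to Support whether such an extra copy is useful.

```python
import shlex

MIB = 1024 * 1024


def head_copy_bytes(bs=MIB, count=50):
    """Bytes copied by 'dd bs=<bs> count=<count>' (the 50 MB header copy above)."""
    return bs * count


def block_offset(au, blk, au_size=MIB, block_size=4096):
    """Byte offset of metadata block <blk> inside allocation unit <au>.

    ASSUMPTION: 1 MiB AU and 4 KiB metadata blocks; verify the real AU size first.
    """
    if blk * block_size >= au_size:
        raise ValueError("block number does not fit inside one allocation unit")
    return au * au_size + blk * block_size


def amdu_cmd(diskstring, diskgroup):
    """Argument list equivalent to the AMDU dump command shown above."""
    return ["amdu", "-diskstring", diskstring, "-dump", diskgroup]


def au_dd_cmd(device, out_file, au, au_size=MIB):
    """Hypothetical helper: dd arguments copying exactly one AU (skip counts bs-sized blocks)."""
    return ["dd", f"if={device}", f"of={out_file}", f"bs={au_size}", f"skip={au}", "count=1"]


if __name__ == "__main__":
    print(head_copy_bytes())        # 52428800
    print(block_offset(28481, 16))  # 29864558592
    print(shlex.join(amdu_cmd("/dev/oracleasm/disks/*", "DATA")))
    print(shlex.join(au_dd_cmd("/dev/oracleasm/disks/ASM_DATA12",
                               "/tmp/DATA12_au28481.dd", 28481)))
```

Building argument lists (rather than one shell string) keeps the wildcard in the diskstring quoted, which is why the original command wraps '/dev/oracleasm/disks/*' in single quotes: AMDU, not the shell, should expand it.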
Please collect the information requested in the following document (which I created) and provide it:
Collecting The Required Information For Support To Validate & Troubleshooting ASM Diskgroup Corruptions. (Doc ID 1675152.1)
When ASM corruption occurs, you need to collect an AMDU dump and a backup of the first 50 MB of the disk to find the root cause of the ASM disk corruption.
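Once the AMDU output directory and the dd image exist, they have to be uploaded together. The sketch below bundles files and whole directories into a single zip with Python's standard `zipfile` module; the same function can also combine the diagcollection zip files into one archive per Grid Infrastructure home, as requested in step 2. The directory name `amdu_output`, the file `report.txt`, and `sr_upload.zip` in the demo are placeholders, not names AMDU or Oracle Support require.

```python
import os
import tempfile
import zipfile


def bundle_for_sr(out_zip, paths):
    """Write every file (recursing into directories) from `paths` into one zip.

    Archive names are relative to each path's parent directory, so a directory
    such as the AMDU output keeps its own name inside the zip.
    """
    written = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            path = os.path.abspath(path)
            base = os.path.dirname(path)
            if os.path.isdir(path):
                for root, _dirs, files in os.walk(path):
                    for name in sorted(files):
                        full = os.path.join(root, name)
                        arc = os.path.relpath(full, base)
                        zf.write(full, arc)
                        written.append(arc)
            else:
                arc = os.path.basename(path)
                zf.write(path, arc)
                written.append(arc)
    return written


def demo():
    """Self-contained usage example with throwaway placeholder files."""
    with tempfile.TemporaryDirectory() as tmp:
        amdu_dir = os.path.join(tmp, "amdu_output")  # placeholder directory name
        os.makedirs(amdu_dir)
        with open(os.path.join(amdu_dir, "report.txt"), "w") as f:
            f.write("placeholder")
        dd_image = os.path.join(tmp, "DATA12.dd")  # stands in for /tmp/DATA12.dd
        with open(dd_image, "wb") as f:
            f.write(bytes(4096))
        out_zip = os.path.join(tmp, "sr_upload.zip")
        bundle_for_sr(out_zip, [amdu_dir, dd_image])
        with zipfile.ZipFile(out_zip) as zf:
            return sorted(zf.namelist())
```

ZIP_DEFLATED is used because a dd image of a partly empty disk region usually compresses well, which keeps the upload small.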
Possible causes of ASM corruption include:
• Disks formatted at the OS level while they were in use by ASM
• Disks assigned to a file system while in use by ASM
• I/O errors (stale writes)
• Use of third-party software
But to check what has happened on the disk, we need the AMDU dump and the disk backup. Without this information, it is not possible to find the cause of the issue.
Once you have collected this information, you can recreate the diskgroup and restore the database from backup.
In your current scenario, you didn't recreate the diskgroup; you restored and recovered the database from backup.
Now you need to run "alter diskgroup <dg_name> check all norepair" to check the disk for corruption. As long as ASM does not touch that corrupted block, it will not throw the ORA-15196 error again. But if the corruption still exists, ASM will crash when check all norepair touches the corrupted block.
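The ORA-15196 in the log reports [endian_kfbh] [0 != 1]: the endianness field of the ASM block header was read as 0 where 1 was expected. As a rough first look at a block pulled from a dd or AMDU image, the sketch below checks that field. It assumes, based on the kfed field layout, that kfbh.endian is the first byte of each 4 KiB ASM metadata block and that 1 means little-endian (as on Linux x86-64). An all-zero block is consistent with, but does not prove, the block having been overwritten (for example by one of the causes listed above). This is a heuristic, not a substitute for Support's AMDU/kfed analysis; `classify_block` and `read_block` are hypothetical helpers.

```python
BLOCK_SIZE = 4096  # ASSUMPTION: 4 KiB ASM metadata block


def classify_block(block, expected_endian=1):
    """Rough triage of one ASM metadata block read from a disk image.

    ASSUMPTION: byte 0 holds kfbh.endian (1 = little-endian, e.g. Linux x86-64).
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} bytes, got {len(block)}")
    if not any(block):
        return "all-zero block (no header present)"
    if block[0] != expected_endian:
        return f"bad endian byte: {block[0]} != {expected_endian}"
    return "endian byte OK (header may still be damaged elsewhere)"


def read_block(image_path, offset, size=BLOCK_SIZE):
    """Read `size` bytes at byte `offset` from a disk image such as /tmp/DATA12.dd."""
    with open(image_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Note that the 50 MB copy requested above starts at the beginning of the disk, so the byte offset of a block inside that image equals its offset on the disk only for blocks within those first 50 MB.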
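After running the check, the ASM alert log is where a recurring ORA-15196 would show up. The sketch below takes the lines after the last occurrence of a marker string and counts every ORA-nnnnn code in them. It assumes the CHECK statement is echoed into the ASM alert log, as the DISMOUNT and MOUNT statements are in the log above ("SQL> alter diskgroup ..."); if it is not, pick another marker such as a timestamp. The sample lines are illustrative only, not real output from a check run.

```python
import re
from collections import Counter

ORA_RE = re.compile(r"\bORA-(\d{5})\b")


def lines_since(log_lines, marker):
    """Return the lines after the last line containing `marker` (case-insensitive)."""
    last = -1
    for i, line in enumerate(log_lines):
        if marker.lower() in line.lower():
            last = i
    return log_lines[last + 1:] if last >= 0 else []


def ora_counts(log_lines):
    """Count each ORA-nnnnn code appearing in the given alert log lines."""
    return Counter("ORA-" + code for line in log_lines for code in ORA_RE.findall(line))


# Illustrative lines only; the first one assumes the statement is echoed.
sample = [
    "SQL> alter diskgroup DATA check all norepair",
    "ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]",
    'ORA-15130: diskgroup "DATA" is being dismounted',
]

after_check = lines_since(sample, "check all norepair")
print(ora_counts(after_check))
```

If ORA-15196 appears in the lines after the check, the corrupted block is still present on disk.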