Oracle Database Block Corruption in ASM

Posted by PDSERVICE on Apr 09, 2020

We hit block corruption in the database and found the following entries in the ASM alert log from that time.

We need to find out why the corruption occurred. Which layer is the culprit: OS, storage, or database?

Mon Aug 25 19:48:37 2014

WARNING: cache read a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=6

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]

NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc

WARNING: cache read (retry) a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]

ERROR: cache failed to read group=1(DATA) fn=281 indblk=16 from disk(s): 8(ASM_DATA12)

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]

NOTE: cache initiating offline of disk 8 group DATA

NOTE: process _user11370_+asm1 (11370) initiating offline of disk 8.3491799612 (ASM_DATA12) with mask 0x7e in group 1

NOTE: initiating PST update: grp = 1, dsk = 8/0xd020a23c, mask = 0x6a, op = clear

Mon Aug 25 19:48:41 2014

GMON updating disk modes for group 1 at 52 for pid 41, osid 11370

ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.

ERROR: too many offline disks in PST (grp 1)

Mon Aug 25 19:48:42 2014

NOTE: cache dismounting (not clean) group 1/0x4AA052EA (DATA)

NOTE: messaging CKPT to quiesce pins Unix process pid: 11956, image: oracle@DB01 (B000)

WARNING: Offline for disk ASM_DATA12 in mode 0x7f failed.

Mon Aug 25 19:48:42 2014

NOTE: halting all I/Os to diskgroup 1 (DATA)

Mon Aug 25 19:48:43 2014

NOTE: LGWR doing non-clean dismount of group 1 (DATA)

NOTE: LGWR sync ABA=182.3456 last written ABA 182.3456

Mon Aug 25 19:48:44 2014

kjbdomdet send to inst 2

detach from dom 1, sending detach message to inst 2

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc (incident=144329):

ORA-15335: ASM metadata corruption detected in disk group 'DATA'

ORA-15130: diskgroup "DATA" is being dismounted

ORA-15066: offlining disk "ASM_DATA12" in group "DATA" may result in a data loss

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]

Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc

Mon Aug 25 19:48:45 2014

List of instances:

1 2

Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)

Global Resource Directory partially frozen for dirty detach

* dirty detach - domain 1 invalid = TRUE

Mon Aug 25 19:48:45 2014

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

1416 GCS resources traversed, 0 cancelled

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

Dirty Detach Reconfiguration complete

Mon Aug 25 19:48:46 2014

WARNING: dirty detached from domain 1

NOTE: cache dismounted group 1/0x4AA052EA (DATA)

SQL> alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

Mon Aug 25 19:48:52 2014

Dumping diagnostic data in directory=[cdmp_20140825194852], requested by (instance=1, osid=11370), summary=[incident=144329].

Mon Aug 25 19:48:53 2014

System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc

Mon Aug 25 19:48:53 2014

Sweep [inc][144329]: completed

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

Mon Aug 25 19:48:58 2014

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

Mon Aug 25 19:48:58 2014

Sweep [inc2][144329]: completed

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "DATA" is being dismounted

Mon Aug 25 19:49:01 2014

NOTE: ASM client PROD_1:PROD disconnected unexpectedly.

NOTE: check client alert log.

NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_17812.trc

NOTE: cache deleting context for group DATA 1/0x4aa052ea

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "" is being dismounted

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "" is being dismounted

Mon Aug 25 19:49:07 2014

NOTE: AMDU dump of disk group DATA created at /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329

Mon Aug 25 19:49:10 2014

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "" is being dismounted

ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)

ERROR: ORA-15130 thrown in RBAL for group number 1

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:

ORA-15130: diskgroup "" is being dismounted

Mon Aug 25 19:49:15 2014

GMON dismounting group 1 at 53 for pid 43, osid 11956

Mon Aug 25 19:49:15 2014

NOTE: Disk ASM_DATA01 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA02 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA03 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA07 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA08 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA09 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA19 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA11 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA12 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA13 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA14 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA15 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA16 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA17 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA18 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA21 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA22 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA23 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA24 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA10 in mode 0x7f marked for de-assignment

NOTE: Disk ASM_DATA20 in mode 0x7f marked for de-assignment

SUCCESS: diskgroup DATA was dismounted

SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */

SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA

Mon Aug 25 19:49:16 2014

NOTE: diskgroup resource ora.DATA.dg is offline

Mon Aug 25 19:51:34 2014

SQL> ALTER DISKGROUP DATA MOUNT /* asm agent *//* {1:51111:41484} */

NOTE: cache registered group DATA number=1 incarn=0x4aa09b74

NOTE: cache began mount (not first) of group DATA number=1 incarn=0x4aa09b74

NOTE: Assigning number (1,0) to disk (ORCL:ASM_DATA01)

NOTE: Assigning number (1,1) to disk (ORCL:ASM_DATA02)

NOTE: Assigning number (1,2) to disk (ORCL:ASM_DATA03)

NOTE: Assigning number (1,3) to disk (ORCL:ASM_DATA07)

NOTE: Assigning number (1,4) to disk (ORCL:ASM_DATA08)

NOTE: Assigning number (1,5) to disk (ORCL:ASM_DATA09)

NOTE: Assigning number (1,19) to disk (ORCL:ASM_DATA10)

NOTE: Assigning number (1,7) to disk (ORCL:ASM_DATA11)

NOTE: Assigning number (1,8) to disk (ORCL:ASM_DATA12)

NOTE: Assigning number (1,9) to disk (ORCL:ASM_DATA13)

NOTE: Assigning number (1,10) to disk (ORCL:ASM_DATA14)

NOTE: Assigning number (1,11) to disk (ORCL:ASM_DATA15)

NOTE: Assigning number (1,12) to disk (ORCL:ASM_DATA16)

NOTE: Assigning number (1,13) to disk (ORCL:ASM_DATA17)

NOTE: Assigning number (1,14) to disk (ORCL:ASM_DATA18)

NOTE: Assigning number (1,6) to disk (ORCL:ASM_DATA19)

NOTE: Assigning number (1,20) to disk (ORCL:ASM_DATA20)

NOTE: Assigning number (1,15) to disk (ORCL:ASM_DATA21)

NOTE: Assigning number (1,16) to disk (ORCL:ASM_DATA22)

NOTE: Assigning number (1,17) to disk (ORCL:ASM_DATA23)

NOTE: Assigning number (1,18) to disk (ORCL:ASM_DATA24)

Mon Aug 25 19:51:34 2014

GMON querying group 1 at 55 for pid 27, osid 8831

NOTE: cache opening disk 0 of grp 1: ASM_DATA01 label:ASM_DATA01

NOTE: F1X0 found on disk 0 au 2 fcn 0.7050972

NOTE: cache opening disk 1 of grp 1: ASM_DATA02 label:ASM_DATA02

NOTE: cache opening disk 2 of grp 1: ASM_DATA03 label:ASM_DATA03

NOTE: cache opening disk 3 of grp 1: ASM_DATA07 label:ASM_DATA07

NOTE: cache opening disk 4 of grp 1: ASM_DATA08 label:ASM_DATA08

NOTE: cache opening disk 5 of grp 1: ASM_DATA09 label:ASM_DATA09

NOTE: cache opening disk 6 of grp 1: ASM_DATA19 label:ASM_DATA19

NOTE: cache opening disk 7 of grp 1: ASM_DATA11 label:ASM_DATA11

NOTE: cache opening disk 8 of grp 1: ASM_DATA12 label:ASM_DATA12

NOTE: cache opening disk 9 of grp 1: ASM_DATA13 label:ASM_DATA13

NOTE: cache opening disk 10 of grp 1: ASM_DATA14 label:ASM_DATA14

NOTE: cache opening disk 11 of grp 1: ASM_DATA15 label:ASM_DATA15

NOTE: cache opening disk 12 of grp 1: ASM_DATA16 label:ASM_DATA16

NOTE: cache opening disk 13 of grp 1: ASM_DATA17 label:ASM_DATA17

NOTE: cache opening disk 14 of grp 1: ASM_DATA18 label:ASM_DATA18

NOTE: cache opening disk 15 of grp 1: ASM_DATA21 label:ASM_DATA21

NOTE: cache opening disk 16 of grp 1: ASM_DATA22 label:ASM_DATA22

NOTE: cache opening disk 17 of grp 1: ASM_DATA23 label:ASM_DATA23

NOTE: cache opening disk 18 of grp 1: ASM_DATA24 label:ASM_DATA24

NOTE: cache opening disk 19 of grp 1: ASM_DATA10 label:ASM_DATA10

NOTE: cache opening disk 20 of grp 1: ASM_DATA20 label:ASM_DATA20

NOTE: cache mounting (not first) external redundancy group 1/0x4AA09B74 (DATA)

Mon Aug 25 19:51:35 2014

kjbdomatt send to inst 2

Mon Aug 25 19:51:35 2014

NOTE: attached to recovery domain 1

NOTE: redo buffer size is 256 blocks (1053184 bytes)

Mon Aug 25 19:51:35 2014

NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)

NOTE: LGWR found thread 1 closed at ABA 182.3456

NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)

NOTE: LGWR opening thread 1 at fcn 0.11665696 ABA 183.3457

NOTE: cache mounting group 1/0x4AA09B74 (DATA) succeeded

NOTE: cache ending mount (success) of group DATA number=1 incarn=0x4aa09b74

Mon Aug 25 19:51:35 2014

NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1

SUCCESS: diskgroup DATA was mounted

SUCCESS: ALTER DISKGROUP DATA MOUNT /* asm agent *//* {1:51111:41484} */

Mon Aug 25 19:51:48 2014

NOTE: client PROD_1:PROD registered, osid 21529, mbr 0x1

Mon Aug 25 19:55:53 2014

NOTE: ASM client PROD_1:PROD disconnected unexpectedly.

NOTE: check client alert log.

NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_21529.trc

Mon Aug 25 19:57:42 2014

NOTE: client PROD_1:PROD registered, osid 2546, mbr 0x1
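
The fields that matter for the root cause question sit in the first warning of the log: disk group, ASM file number, disk, allocation unit and block. As a rough sketch (this is not an Oracle tool), those warnings could be pulled out of an alert log with a regular expression. The pattern is an assumption based only on the two warning lines shown above, and the name find_corrupt_reads is hypothetical.

```python
import re

# Assumed layout, taken from the "cache read a corrupt block" lines in the log above.
CORRUPT_RE = re.compile(
    r"WARNING: cache read (?:\(retry\) )?a corrupt block: "
    r"group=(?P<group_num>\d+)\((?P<group_name>\w+)\) "
    r"fn=(?P<fn>\d+) indblk=(?P<indblk>\d+) "
    r"disk=(?P<disk_num>\d+) \((?P<disk_name>\w+)\) "
    r"incarn=(?P<incarn>\d+) au=(?P<au>\d+) blk=(?P<blk>\d+) count=(?P<count>\d+)"
)

NUMERIC_FIELDS = ("group_num", "fn", "indblk", "disk_num", "incarn", "au", "blk", "count")


def find_corrupt_reads(log_text):
    """Return one dict per corrupt-block warning found in the alert log text."""
    hits = []
    for match in CORRUPT_RE.finditer(log_text):
        record = match.groupdict()
        for key in NUMERIC_FIELDS:
            record[key] = int(record[key])  # numeric fields become integers
        hits.append(record)
    return hits


if __name__ == "__main__":
    line = (
        "WARNING: cache read a corrupt block: group=1(DATA) fn=281 indblk=16 "
        "disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=6"
    )
    for hit in find_corrupt_reads(line):
        print(hit["group_name"], hit["disk_name"], hit["au"], hit["blk"])
```

Run against the alert logs of all ASM instances, a list like this shows at a glance whether the bad reads cluster on one disk or are spread across several.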

Unfortunately, this kind of analysis and recommendation cannot be done within a community forum. Determining the root cause of an ASM metadata corruption requires collecting a large amount of data for analysis, so it needs to be handled in a formal service request with Oracle Support.

Please open a service request with Oracle Support for root cause analysis of an ASM disk header metadata corruption. Before you open it, collect all of the following so that it can be uploaded to the service request for analysis.

1. Please upload the text version of the ASM alert.log; if RAC, upload it from all ASM instances in the cluster. This file should be in the diagnostic destination trace directory for your Grid Infrastructure installation and should be named alert_+ASM.log, or, if RAC, the +ASM will be appended with the instance number.

2. Assuming that you are on Oracle 11.2 or higher, also collect the following as the root user, even if this is a single-instance configuration.

Please review CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide (Doc ID 330358.1). From this document, collect the diagcollection script output as the root user. This script will produce a minimum of 3 zip files per Grid Infrastructure home named crsDATA.zip, ocrDATA.zip and osDATA.zip. There may be other files as well, such as coreDATA.zip.

Please compress all of the zip files produced into a single zip file per Grid Infrastructure home.

3. Assuming that your ASM instance is presently up and running, collect the following.

Please collect and upload the output of scripts 1, 2 and 3 from

How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1, 11.2 and 12.1? (Doc ID 470211.1)

4. Please use syntax similar to the following to get an AMDU dump of the impacted diskgroup, so that we can determine the extent of the metadata corruption.

If you are not on 11.1 or higher, refer to the next document to download the executable needed to run the command:

Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)

Once you have the executable, run the next command. Note the directory created by the command, and zip up the contents of that directory for upload to Oracle via the service request.

NOTE: This command is specific to your environment and affected diskgroup, so it will not work as-is on other systems. It can be adapted by changing the diskstring entry to the diskstring in use and changing the diskgroup name at the end of the line to your diskgroup name.

$ amdu -diskstring '/dev/oracleasm/disks/*' -dump 'DATA'

We will also need the output of the next command. Note that this too is specific to your particular situation. Do not be alarmed: the command only takes a full binary copy of the first 50 MB of the disk and creates a file to be uploaded for our analysis.

dd if=/dev/oracleasm/disks/ASM_DATA12 of=/tmp/DATA12.dd bs=1048576 count=50
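
For readers unsure what the dd line does: it reads 50 blocks of 1048576 bytes (52,428,800 bytes in total) from the start of the disk and writes them to a regular file. It never writes to the disk. The Python sketch below mirrors that behaviour under the same parameters. The function name copy_head is hypothetical, and the sketch is only an illustration; the dd command above is what the reply asks for.

```python
def copy_head(src_path, dst_path, block_size=1048576, count=50):
    """Copy up to `count` blocks of `block_size` bytes from the start of src_path.

    The source is opened read-only; only dst_path is written.
    Returns the number of bytes actually copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for _ in range(count):
            chunk = src.read(block_size)
            if not chunk:  # source is shorter than count * block_size
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Paths taken from the dd command in the post; adjust to your environment.
    n = copy_head("/dev/oracleasm/disks/ASM_DATA12", "/tmp/DATA12.dd")
    print(f"copied {n} bytes")
```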

Please collect the information requested in the next document (I created it) and provide it:

Collecting The Required Information For Support To Validate & Troubleshooting ASM Diskgroup Corruptions. (Doc ID 1675152.1)
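
Step 2 asks for the diagcollection archives to be bundled into one zip file per Grid Infrastructure home. A minimal sketch of that bundling step follows, assuming all the archives sit in one directory. The function name bundle_zips and the example paths are hypothetical, and nothing here depends on the exact file names that diagcollection produces.

```python
import zipfile
from pathlib import Path


def bundle_zips(source_dir, bundle_path):
    """Store every *.zip found in source_dir inside one outer zip file.

    Returns the list of file names that were added.
    """
    source = Path(source_dir)
    bundle = Path(bundle_path).resolve()
    added = []
    # ZIP_STORED: the inner archives are already compressed.
    with zipfile.ZipFile(bundle, "w", compression=zipfile.ZIP_STORED) as outer:
        for item in sorted(source.glob("*.zip")):
            if item.resolve() == bundle:
                continue  # never add the bundle to itself
            outer.write(item, arcname=item.name)
            added.append(item.name)
    return added


if __name__ == "__main__":
    # Hypothetical locations; use the directory where diagcollection wrote its output.
    print(bundle_zips("/tmp/diagcollection_out", "/tmp/gi_home1_diag.zip"))
```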

When an ASM corruption occurs, you need to collect an AMDU dump and a backup of the first 50 MB of the disk in order to find the root cause of the ASM disk corruption.

Possible causes of ASM corruption include:

• Disks formatted at the OS level while it was used by ASM

• Disks assigned to a file system while used by ASM

• IO errors (stale writes)

• Usage of 3rd party software

But to check what has happened on the disk, we need the AMDU dump and the disk backup. Without this information it is not possible to find the cause of the issue.

Once you have collected this information, you can recreate the diskgroup and restore the database from backup.

In your current scenario, you did not recreate the diskgroup; you restored and recovered the database from backup.

Now you need to run "alter diskgroup <dg_name> check all norepair" to check the disks for corruption. As long as ASM does not touch the corrupted block, it will not raise ORA-15196 again. But if the disk corruption still exists, ASM will crash when check all norepair touches that corrupted block.
