You are here

You are here:

PDSERVICE's blog

Oracle - REQUEST TO FIX ASM METADATA CORRUPT BLOCK

Oracle - REQUEST TO FIX ASM METADATA CORRUPT BLOCK

Posted by PDSERVICE on Nov 09, 2016 In

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: [email protected]

DETAILED PROBLEM DESCRIPTION

============================

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486] when trying to mount the diskgroup.

@ EXPLAIN WHY YOU BELIEVE THIS TO BE A PRODUCT DEFECT

@ ===================================================

@ AMDU shows one corrupt block. Requesting to fix asm metadata corrupt block.

@ ---------------------------- SCANNING DISK N0001

@ -----------------------------

@ Disk N0001: '/dev/oracleasm/disks/DISK1'

@ AMDU-00209: Corrupt block found: Disk N0001 AU [33] block [91] type [0]

@ AMDU-00201: Disk N0001: '/dev/oracleasm/disks/DISK1'

@ Allocated AU's: 93584

@ Free AU's: 954991

@ AU's read for dump: 165

@ Block images saved: 30383

@ Map lines written: 165

@ Heartbeats seen: 0

@ Corrupt metadata blocks: 1

@ Corrupt AT blocks: 0

DIAGNOSTIC ANALYSIS

===================

AMDU, dd , alert.logs, trace files are uploaded.

@ Due to flood all system was shutdown from main switch and storage cache was

@ dirty. To online the disks dirty cache on the storage was cleared. Customer

@ does not have os backups so can not restore. The only option is to mount the

@ diskgroup. ASM instance crashes with the error ORA-00600: internal error

@ code, arguments: [kfrValAcd30], [DATA], [1], [43], [8486], [44], [8486].

@ Customer cannot run anything on it now. AMDU shows only one corrupt block.

@ kfed output shows valid asm header.

Fri Dec 28 07:19:06 2012

NOTE: attached to recovery domain 1

NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)

NOTE: starting recovery of thread=2 ckpt=47.24 group=1 (DATA)

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11865.trc

(incident=32228):

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486], [], [], [], [], []

Incident details in:

/u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_32228/+ASM1_ora_11865_i322

28.trc

Abort recovery for domain 1

NOTE: crash recovery signalled OER-600

ERROR: ORA-600 signalled during mount of diskgroup DATA

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486], [], [], [], [], []

ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent */

.

.

.

// From the OS log, there are some multipathd errors before ASM start.

.

.

Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg0: add path (uevent)

Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg0: failed to store path

info

Dec 28 06:32:34 srv-ural multipathd: uevent trigger error

Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg1: add path (uevent)

Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg1: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg2: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg2: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg3: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg3: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg4: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg4: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg5: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg5: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg6: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg6: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg7: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg7: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg8: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg8: failed to store path

info

Dec 28 06:32:35 srv-ural multipathd: uevent trigger error

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg9: add path (uevent)

Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg9: failed to store path

info

node 2 - message

.

Dec 24 21:02:23 srv-aral CLSD: The clock on host srv-aral has been updated by

the Cluster Time Synchronization Service to be synchronous with the mean

cluster time.

Dec 24 21:51:35 srv-aral CLSD: The clock on host srv-aral has been updated by

the Cluster Time Synchronization Service to be synchronous with the mean

cluster time.

Dec 24 22:21:41 srv-aral kernel: qla2xxx 0000:06:00.0: LOOP DOWN detected (2

6 0).

Dec 24 22:21:42 srv-aral kernel: qla2xxx 0000:06:00.1: LOOP DOWN detected (2

6 0).

Dec 27 18:56:08 srv-aral syslogd 1.4.1: restart.

Dec 27 18:56:08 srv-aral kernel: klogd 1.4.1, log source = /proc/kmsg

started.

*** RVARDHIN 12/31/12 01:57 am ***

requested ct to set the below values

.

From

.

blacklist {

# wwid 26353900f02796769

devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"

devnode "^hd[a-z][[0-9]*]"

# devnode "^cciss!c[0-9]d[0-9]*"

# device {

# vendor "HP"

# product "OPEN-*"

# }

}

.

.

To,

.

blacklist {

# wwid 26353900f02796769

devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"

devnode "^hd[a-z][[0-9]*]"

# devnode "^cciss!c[0-9]d[0-9]*"

# device {

# vendor "HP"

# product "OPEN-*"

# wwid "*"

devnode "^asm/*" # Here <<<.

devnode "ofsctl" # Here <<<

.

# }

}

ct changed the multipatch.conf file. issue still exist

.

/u01/app/11.2.0/grid/bin/crsctl start cluster -all

.

CRS-2672: Attempting to start 'ora.asm' on 'srv-aral'

CRS-2672: Attempting to start 'ora.cssd' on 'srv-ural'

CRS-2672: Attempting to start 'ora.diskmon' on 'srv-ural'

CRS-2676: Start of 'ora.diskmon' on 'srv-ural' succeeded

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486], [], [], [], [], []

CRS-2674: Start of 'ora.asm' on 'srv-aral' failed

CRS-2679: Attempting to clean 'ora.asm' on 'srv-aral'

CRS-2681: Clean of 'ora.asm' on 'srv-aral' succeeded

CRS-2676: Start of 'ora.cssd' on 'srv-ural' succeeded

CRS-2672: Attempting to start 'ora.ctssd' on 'srv-ural'

CRS-2676: Start of 'ora.ctssd' on 'srv-ural' succeeded

CRS-2672: Attempting to start 'ora.asm' on 'srv-ural'

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486], [], [], [], [], []

CRS-2674: Start of 'ora.asm' on 'srv-ural' failed

CRS-2679: Attempting to clean 'ora.asm' on 'srv-ural'

CRS-2681: Clean of 'ora.asm' on 'srv-ural' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'srv-ural'

CRS-2677: Stop of 'ora.ctssd' on 'srv-ural' succeeded

CRS-2673: Attempting to stop 'ora.diskmon' on 'srv-ural'

CRS-2677: Stop of 'ora.diskmon' on 'srv-ural' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'srv-ural'

CRS-2677: Stop of 'ora.cssd' on 'srv-ural' succeeded

- AMDU report shows only one Block as corrupted

158 ---------------------------- SCANNING DISK N0001

-----------------------------

159 Disk N0001: '/dev/oracleasm/disks/DISK1'

160 AMDU-00209: Corrupt block found: Disk N0001 AU [33] block [91] type [0]

161 AMDU-00201: Disk N0001: '/dev/oracleasm/disks/DISK1'

162 Allocated AU's: 93584

163 Free AU's: 954991

164 AU's read for dump: 165

165 Block images saved: 30383

166 Map lines written: 165

167 Heartbeats seen: 0

168 Corrupt metadata blocks: 1

169 Corrupt AT blocks: 0

- ORA 600 kfrValAcd30 is hit since ckpt ABA and current block ABA is not

matching as expected.

ASM1 alert log shows:

16747 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1],

[43], [8486], [44], [8486], [], [], [], [], []

- It doesn't look like the ABA is wrong here. It is having expected value as

per ASM alert logs. (Still have to see why the assert is triggered)

ASM1 alert log shows:

16546 Thu Dec 20 18:14:17 2012

16547 NOTE: attached to recovery domain 1

16548 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA) <===

ABA value reported in ORA 600

16549 NOTE: advancing ckpt for thread=1 ckpt=43.8486

16550 NOTE: cache recovered group 1 to fcn 0.814009

16551 Thu Dec 20 18:14:17 2012

16552 NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)

16553 NOTE: LGWR found thread 1 closed at ABA 43.8485

16554 NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)

16555 NOTE: LGWR opening thread 1 at fcn 0.814009 ABA 44.8486 <=== ABA value

reported in ORA 600

16556 NOTE: cache mounting group 1/0x0C886DF4 (DATA) succeeded

16557 NOTE: cache ending mount (success) of group DATA number=1

incarn=0x0c886df4

- Though ABA is right as per alert logs, ASM throws assert. We need to check

why so.

.

SQL> alter diskgroup all mount;

alter diskgroup all mount

*

ERROR at line 1:

ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],

[8486], [44], [8486], [], [], [], [], []

performed below activity by customer, getting below errors

.

-bash-3.2$ sqlplus / as sysasm

.

SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 2 13:38:10 2013

.

Copyright (c) 1982, 2009, Oracle. All rights reserved.

.

.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

SQL> alter diskgroup all mount;

alter diskgroup all mount

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15130: diskgroup "DATA" is being dismounted

ORA-15066: offlining disk "DATA_0000" may result in a data loss

.

.

SQL>

1: Please explain what do you mean by remap here.

2: Lost write means the data is valid but old. ASM uses set of methods (like

magic, checksum, expected values of fields, etc) to validate whether it is

.

corrupted or not. With lost write, it is valid data but stale data. ASM

need not identify it as corrupted.

regarding remap please see:

11gR1 ASM New Feature (Doc ID 429098.1)

remap - You can remap and recover bad blocks on an ASM disk in normal or high

redundancy that have been reported by storage management tools such as disk

scrubbers. ASM reads from the good copy of an ASM mirror and rewrites these

blocks to an alternate location on disk.

.

What is the redundancy on this disk group?

.

Customer requirement outlined here is to mount the disk group and then

recover what data can be recovered at this time. Prabakaran is investigating

if a diagnostic patch can be made to mount the disk group without hitting the

assert.

.

points to note here:

The customer needs to review their backup policy

Looks like ct had external,I have conveyed the same to ct

.

[gridstgrac1 ~]$ kfed read disk1_dd.out | more

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt: 1 ; 0x003: 0x01

|

|

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL>>>>

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname: DATA_0000 ; 0x028: length=9

kfdhdb.grpname: DATA ; 0x048: length=4

kfdhdb.fgname: DATA_0000 ; 0x068: length=9

.

- As per ckpt record, ABA is 43.8486

16 kfbh.fcn.base: 0 ; 0x010: 0x00000000

17 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

29 kfracdc.ckpt.seq: 43 ; 0x018: 0x0000002b

30 kfracdc.ckpt.blk: 8486 ; 0x01c: 0x00002126

- While mounting a DG, ASM hit below assert:

805 Errors in file

/ade/b/1029623278/oracle/log/diag/asm/+asm/two0one1asm/trace/two0one1asm_ora_1

1194.trc (incident=4939):

806 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA],

[1], [43], [8486], [44], [8486], [], [], [], [], []

7771 KSEORAASSERTNM6(FALSE,

7772 OERINM("kfrValAcd30"),

7773 kseidn(*thdCtx->grpName_kfrthdCtx),

7774 ksenrg(thdCtx->thread_kfrthdCtx),

7775

ksenrg(thdCtx->currAba_kfrthdCtx.seq_kfraba),

7776

ksenrg(thdCtx->currAba_kfrthdCtx.blk_kfraba),

7777 ksenrg(acdb->aba_kfracdbamd_kfraba),

7778 ksenrg(acdb->aba_kfracdb.blk_kfraba));

- In the previous mount, DG ckpt was 43.8486 which matches with thread

Context

16546 Thu Dec 20 18:14:17 2012

16547 NOTE: attached to recovery domain 1

16548 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)

16549 NOTE: advancing ckpt for thread=1 ckpt=43.8486

16550 NOTE: cache recovered group 1 to fcn 0.814009

.

- Following the recovery, LGWR opened the thread with ABA 44.8486. Followin

it, ASM2 was crashed. Thread 2 recovery for DATA is performed.

16551 Thu Dec 20 18:14:17 2012

16552 NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)

16553 NOTE: LGWR found thread 1 closed at ABA 43.8485

16554 NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)

16555 NOTE: LGWR opening thread 1 at fcn 0.814009 ABA 44.8486

16556 NOTE: cache mounting group 1/0x0C886DF4 (DATA) succeeded

16557 NOTE: cache ending mount (success) of group DATA number=1

incarn=0x0c886df4

16574 Thu Dec 20 18:17:24 2012

16575 Reconfiguration started (old inc 4, new inc 6)

16576 List of instances:

16577 1 (myinst: 1)

16574 Thu Dec 20 18:17:24 2012

16575 Reconfiguration started (old inc 4, new inc 6)

16598 NOTE: SMON starting instance recovery for group DATA domain 1

(mounted)

16599 NOTE: F1X0 found on disk 0 au 2 fcn 0.0

16600 NOTE: starting recovery of thread=2 ckpt=45.10601 group=1 (DATA)

16601 NOTE: SMON waiting for thread 2 recovery enqueue

16602 NOTE: SMON about to begin recovery lock claims for diskgroup 1 (DATA)

16603 NOTE: advancing ckpt for thread=2 ckpt=45.10601 <=== recovery is

done

16604 NOTE: SMON did instance recovery for group DATA domain 1

- ASM alert log does not show any activity for next 8 days. In the next mount

(on Dec 28), kfrValAcd30 was triggered.

16621 Reconfiguration complete

16622 Fri Dec 28 07:18:51 2012

16623 NOTE: No asm libraries found in the system

16624 Starting ORACLE instance (normal)

16716 Fri Dec 28 07:18:59 2012

16717 SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */

16718 NOTE: Diskgroup used for Voting files is:

16719 DATA

16740 Fri Dec 28 07:19:06 2012

16741 * allocate domain 1, invalid = TRUE

16742 Fri Dec 28 07:19:06 2012

16743 NOTE: attached to recovery domain 1

16744 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)

16745 NOTE: starting recovery of thread=2 ckpt=47.24 group=1 (DATA)

16746 Errors in file

/u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11865.trc

(incident=32228):

16747 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA],

[1], [43], [8486], [44], [8486], [], [], [], [], []

16748 Incident details in:

/u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_32228/+ASM1_ora_11865_i322

28.trc

.

- Excerpt from F3 dump is given below

There is no recovery with ABA of

43.8486 where ckpt is made

10 kfbh.hard: 130 ; 0x001: 0x82

11 kfbh.type: 7 ; 0x002: KFBTYP_ACDC

12 kfbh.datfmt: 1 ; 0x003: 0x01

29 kfracdc.ckpt.seq: 43 ; 0x018: 0x0000002b

30 kfracdc.ckpt.blk: 8486 ; 0x01c: 0x00002126

.

3542863 kfbh.block.blk: 8486 ; 0x004: T=0 NUMB=0x2126

3542864 kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3

3542870 kfracdb.aba.seq: 43 ; 0x000: 0x0000002b

3542871 kfracdb.aba.blk: 8485 ; 0x004: 0x00002125

.

3542942 kfbh.block.blk: 8487 ; 0x004: T=0 NUMB=0x2127

3542943 kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3

3542949 kfracdb.aba.seq: 44 ; 0x000: 0x0000002c

3542950 kfracdb.aba.blk: 8486 ; 0x004: 0x00002126

.

.