Parnassusdata Software Database Recovery Team
Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com
——–
CRS 10.2.0.3
ASM 10.2.0.3
RDBMS 10.2.0.2
I have a customer with a 2-node cluster, and ASM cannot mount the diskgroup.
Alert.log ASM1
————–
Thu Feb 7 21:52:31 2008
NOTE: starting recovery of thread=1 ckpt=56.6552 group=1
ORA-15196: invalid ASM block header [kfr.c:6086] [endian_kfbh] [3] [6558] [0 != 1]
NOTE: cache initiating offline of disk 0 group 1
WARNING: offlining disk 0.3916356260 (DATA_0000) with mask 0x3
NOTE: PST update: grp = 1, dsk = 0, mode = 0x6
…
Thu Feb 7 21:52:31 2008
Errors in file /opt/oracle/admin/+ASM/bdump/+asm1_b000_17753.trc:
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Alert.log ASM2
————–
Thu Feb 7 21:52:31 2008
Errors in file /opt/oracle/admin/+ASM/udump/+asm2_ora_4385.trc:
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +DATA/dwdev/parameterfile/spfiledwdev.ora
ORA-17503: ksfdopn:2 Failed to open file +DATA/dwdev/parameterfile/spfiledwdev.ora
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Patch 5554692 was applied to prevent the problem from recurring.
.
The customer does not have a valid backup.
.
DISK Header
———–
dd if=/dev/sdc1 bs=8192 count=1 | od -c
0000000 001 202 001 001 \0 \0 \0 \0 \0 \0 \0 200 367 354 327 \
0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 O R C L D I S K D A T A \0 \0 \0 \0
0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000100 \0 \0 020 \n \0 \0 001 003 D A T A _ 0 0 0
0000120 0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000140 \0 \0 \0 \0 \0 \0 \0 \0 D A T A \0 \0 \0 \0
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000200 \0 \0 \0 \0 \0 \0 \0 \0 D A T A _ 0 0 0
0000220 0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000240 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
.
.
dd if=/dev/sdi1 bs=8192 count=1 | od -c
1+0 records in
1+0 records out
0000000 001 202 001 001 \0 \0 \0 \0 \0 \0 \0 200 367 354 327 \
0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 O R C L D I S K D A T A \0 \0 \0 \0
0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000100 \0 \0 020 \n \0 \0 001 003 D A T A _ 0 0 0
0000120 0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000140 \0 \0 \0 \0 \0 \0 \0 \0 D A T A \0 \0 \0 \0
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000200 \0 \0 \0 \0 \0 \0 \0 \0 D A T A _ 0 0 0
0000220 0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \
“alert_ASM1.log-new2.txt”:
.
last time DATA was mounted successfully:
.
Thu Feb 7 19:15:34 2008
NOTE: cache mounting group 2/0x554FBB0C (DATA) succeeded
SUCCESS: diskgroup DATA was mounted
.
…
.
Reconfiguration complete
Thu Feb 7 19:44:23 2008
Starting background process ASMB
ASMB started with pid=24, OS id=2278
Thu Feb 7 19:44:41 2008
WARNING: cache failed to read fn=453 indblk=0 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453] [2147483648] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /opt/oracle/admin/+ASM/udump/+asm1_ora_4636.trc
NOTE: cache initiating offline of disk 0 group 2
WARNING: offlining disk 0.3916384773 (DATA_0000) with mask 0x3
NOTE: PST update: grp = 2, dsk = 0, mode = 0x6
Thu Feb 7 19:44:42 2008
ERROR: too many offline disks in PST (grp 2)
.
.
alert_ASM2.log:
.
Thu Feb 7 19:50:04 2008
NOTE: cache mounting group 2/0x457D4099 (DATA) succeeded
SUCCESS: diskgroup DATA was mounted
.
.
Reconfiguration complete
Thu Feb 7 20:17:04 2008
Starting background process ASMB
ASMB started with pid=21, OS id=13439
Thu Feb 7 20:17:15 2008
WARNING: cache failed to read fn=453 indblk=0 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453] [2147483648] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /opt/oracle/admin/+ASM/udump/+asm2_ora_14194.trc
NOTE: cache initiating offline of disk 0 group 2
WARNING: offlining disk 0.3937251428 (DATA_0000) with mask 0x3
NOTE: PST update: grp = 2, dsk = 0, mode = 0x6
Thu Feb 7 20:17:18 2008
ERROR: too many offline disks in PST (grp 2)
Thu Feb 7 20:17:18 2008
NOTE: halting all I/Os to diskgroup DATA
NOTE: active pin found: 0x0x659fe548
Thu Feb 7 20:17:18 2008
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
.
.
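Both alert logs show the same sequence: the diskgroup mounts, ASMB starts
roughly half an hour later, and the first read of file 453 fails within
seconds. The intervals can be checked with a minimal Python sketch. It assumes
the alert-log timestamp layout shown above and an English/C locale; the helper
name elapsed is illustrative only.

```python
from datetime import datetime

# Alert-log timestamp layout, e.g. "Thu Feb 7 19:15:34 2008".
# %a and %b are locale-dependent; an English/C locale is assumed here.
ALERT_TS = "%a %b %d %H:%M:%S %Y"

def elapsed(start: str, end: str):
    """Return the time between two alert-log timestamps as a timedelta."""
    return datetime.strptime(end, ALERT_TS) - datetime.strptime(start, ALERT_TS)

# Timestamps copied from the alert-log excerpts above:
# successful mount of DATA, then the first ORA-15196 on fn=453.
asm1 = elapsed("Thu Feb 7 19:15:34 2008", "Thu Feb 7 19:44:41 2008")
asm2 = elapsed("Thu Feb 7 19:50:04 2008", "Thu Feb 7 20:17:15 2008")

print("ASM1: mount to first ORA-15196:", asm1)
print("ASM2: mount to first ORA-15196:", asm2)
```

On both nodes the failure follows the ASMB start by less than 20 seconds, which
suggests the bad block is hit as soon as a client asks ASM to identify file 453.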
1). The customer is using external redundancy. With external redundancy, the
customer should check with their storage vendor to see whether the storage
layer can restore the disk. From the Oracle perspective, ASM keeps no mirror
copy of the data under external redundancy, so the only option is to restore
the diskgroup.
2). Please upload the bdump/udump directories of all ASM instances.
3). Please provide the kfed output for disk 0 (DATA_0000).
4). Please provide the first 80 MB of DATA_0000:
dd if=<target disk> of=<file> bs=4096 count=20480
.
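The dd arguments in request 4 can be sanity-checked with a few lines of Python.
This is only an arithmetic sketch; the function name dd_count is made up for
illustration and "80 MB" is taken to mean 80 x 1024 x 1024 bytes.

```python
def dd_count(total_bytes: int, block_size: int) -> int:
    """Number of dd blocks needed to cover total_bytes, rounded up."""
    return -(-total_bytes // block_size)

MIB = 1024 * 1024

# Request 4 asks for the first 80 MB with bs=4096.
count = dd_count(80 * MIB, 4096)
print("count =", count)
```

With bs=4096 this gives count=20480, matching the command above.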
[oracle@drac1 bin]$ kfed read /dev/databases |more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 1557654775 ; 0x00c: 0x5cd7ecf7
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDATA ; 0x000: length=12
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0000 ; 0x068: length=9
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32900398 ; 0x0a8: HOUR=0xe DAYS=0x9 MNTH=0x1 YEAR=0x7d8
kfdhdb.crestmp.lo: 224448512 ; 0x0ac: USEC=0x0 MSEC=0x34 SECS=0x16 MINS=0x3
kfdhdb.mntstmp.hi: 32901363 ; 0x0b0: HOUR=0x13 DAYS=0x7 MNTH=0x2 YEAR=0x7d8
kfdhdb.mntstmp.lo: 3359888384 ; 0x0b4: USEC=0x0 MSEC=0xf5 SECS=0x4 MINS=0x32
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 5722043 ; 0x0c4: 0x00574fbb
kfdhdb.pmcnt: 52 ; 0x0c8: 0x00000034
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32900398 ; 0x0e4: HOUR=0xe DAYS=0x9 MNTH=0x1 YEAR=0x7d8
kfdhdb.grpstmp.lo: 223643648 ; 0x0e8: USEC=0x0 MSEC=0x122 SECS=0x15 MINS=0x3
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
.
.
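The crestmp/mntstmp/grpstmp words in the kfed output above pack a date and time
into two 32-bit integers. The sketch below splits them back into fields. The
bit layout is inferred from the kfed annotations in this case (it reproduces
every HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS value shown), not taken from
Oracle documentation, and the function name decode_kfed_stamp is hypothetical.

```python
def decode_kfed_stamp(hi: int, lo: int) -> dict:
    """Split kfed *.hi / *.lo timestamp words into their fields.

    Assumed layout (inferred from the kfed annotations above):
      hi: year in bits 14+, month in 10-13, day in 5-9, hour in 0-4
      lo: minutes in bits 26-31, seconds in 20-25, msec in 10-19, usec in 0-9
    """
    return {
        "year":  hi >> 14,
        "month": (hi >> 10) & 0xF,
        "day":   (hi >> 5) & 0x1F,
        "hour":  hi & 0x1F,
        "min":   (lo >> 26) & 0x3F,
        "sec":   (lo >> 20) & 0x3F,
        "msec":  (lo >> 10) & 0x3FF,
        "usec":  lo & 0x3FF,
    }

# Values copied from the kfed output above.
created = decode_kfed_stamp(32900398, 224448512)    # kfdhdb.crestmp
mounted = decode_kfed_stamp(32901363, 3359888384)   # kfdhdb.mntstmp

print("created:", created)
print("mounted:", mounted)
```

The mount stamp decodes to 2008-02-07 19:50:04, the same second at which the
ASM2 alert log reports "SUCCESS: diskgroup DATA was mounted", so the disk header
was last updated by the final successful mount.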
[oracle@drac1 bin]$ kfed read /dev/raw/raw1
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 4290772992 ; 0x004: T=1 NUMB=0x7fc00000
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 44045 ; 0x010: 0x0000ac0d
kfbh.fcn.wrap: 4096 ; 0x014: 0x00001000
kfbh.spare1: 51147 ; 0x018: 0x0000c7cb
kfbh.spare2: 2054913149 ; 0x01c: 0x7a7b7c7d
[oracle@drac1 bin]$ kfed read /dev/raw/raw2
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 34 ; 0x001: 0x22
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 4290772992 ; 0x004: T=1 NUMB=0x7fc00000
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 59301 ; 0x010: 0x0000e7a5
kfbh.fcn.wrap: 512 ; 0x014: 0x00000200
kfbh.spare1: 409189 ; 0x018: 0x00063e65
kfbh.spare2: 2054913149 ; 0x01c: 0x7a7b7c7d
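The kfbh fields that kfed prints occupy the first 32 bytes of the block, so the
same values can be read straight from a dd image. The sketch below unpacks that
header and applies the one check that ORA-15196 reports here (endian_kfbh, found
0, expected 1). Field order and offsets follow the kfed listings above.
Little-endian byte order is assumed because the od dump stores kfbh.check as
f7 ec d7 5c. The names parse_kfbh and looks_valid are illustrative, and the
check is a simplified stand-in for ASM's own validation.

```python
import struct

# Offsets 0x000..0x01c as listed by kfed: four 1-byte fields, seven 4-byte fields.
KFBH = struct.Struct("<BBBBIIIIIII")
NAMES = ("endian", "hard", "type", "datfmt", "block_blk", "block_obj",
         "check", "fcn_base", "fcn_wrap", "spare1", "spare2")

def parse_kfbh(block: bytes) -> dict:
    """Unpack the 32-byte kfbh header at the start of an ASM metadata block."""
    return dict(zip(NAMES, KFBH.unpack_from(block)))

def looks_valid(header: dict) -> bool:
    """Simplified check: endian byte is 1 and the block type is not 0 (INVALID)."""
    return header["endian"] == 1 and header["type"] != 0

# First 32 bytes of /dev/sdc1, transcribed from the od -c dump earlier in the case.
good = bytes.fromhex("01820101" "00000000" "00000080" "f7ecd75c") + bytes(16)
# First four bytes as kfed reports them for /dev/raw/raw1; the rest is zero-filled.
bad = bytes([0x00, 0x82, 0x00, 0x00]) + bytes(28)

print(parse_kfbh(good), looks_valid(parse_kfbh(good)))
print(looks_valid(parse_kfbh(bad)))
```

For the good header this reproduces kfbh.block.obj 2147483648 and kfbh.check
1557654775 from the kfed output of the disk header.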
alert_+ASM1.log:
.
Thu Feb 7 19:44:23 2008
Starting background process ASMB
ASMB started with pid=24, OS id=2278
Thu Feb 7 19:44:41 2008
WARNING: cache failed to read fn=453 indblk=0 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453] [2147483648] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /opt/oracle/admin/+ASM/udump/+asm1_ora_4636.trc
NOTE: cache initiating offline of disk 0 group 2
WARNING: offlining disk 0.3916384773 (DATA_0000) with mask 0x3
NOTE: PST update: grp = 2, dsk = 0, mode = 0x6
Thu Feb 7 19:44:42 2008
ERROR: too many offline disks in PST (grp 2)
.
asm1_ora_4636.trc:
.
*** SERVICE NAME:() 2008-02-07 19:44:41.691
*** SESSION ID:(100.102) 2008-02-07 19:44:41.690
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 162 ; 0x001: 0xa2
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 114084224 ; 0x004: T=0 NUMB=0x6ccc980
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 83951616 ; 0x00c: 0x05010000
kfbh.fcn.base: 26700 ; 0x010: 0x0000684c
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
----- Abridged Call Stack Trace -----
kfcReadBlk()+1181
kfcReadBuffer()+917
kfcGet0()+14974
kfcGet1Priv()+161
kfcGetPriv()+178
kffSendMap()+2050
kffIdentify()+1552
kfnsFileIdentify()+595
kfnDispatch()+86
opiodr()+984
ttcpip()+1235
opitsk()+1298
Cannot find symbol
Cannot find symbol
opiino()+1028
opiodr()+984
opidrv()+547
sou2o()+114
opimai_real()+163
main()+116
Cannot find symbol
<0x3b49e1c3fb>
It seems that there have been two distinct block corruptions since the
diskgroup was last mounted.
.
[32BIT oracle@orcl bdump]$ grep '15196' alert_+ASM1.log | sort | uniq
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfr.c:6086] [endian_kfbh] [3] [6558] [0 != 1]
.
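A line-based grep can miss part of an ORA-15196 message when the alert log wraps
it across lines. The sketch below collects the distinct signatures from the
whole text instead. The reading of the brackets as [source:line] [field] [file]
[block] [found != expected] is an assumption, based on the alert log pairing
"fn=453" with the third bracket. The function name distinct_15196 is
hypothetical.

```python
import re

# \s* between brackets lets a message match even when it wraps onto the next line.
ORA_15196 = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[([^\]:]+):(\d+)\]\s*\[(\w+)\]\s*\[(\d+)\]\s*\[(\d+)\]"
    r"\s*\[(\d+)\s*!=\s*(\d+)\]"
)

def distinct_15196(log_text: str) -> set:
    """Return distinct (source, line, field, file, block, found, expected) tuples."""
    return set(ORA_15196.findall(log_text))

# Sample built from the alert-log excerpts in this case, including the wrapping.
sample = """\
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453]
[2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [453]
[2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfr.c:6086] [endian_kfbh] [3] [6558] [0
!= 1]
"""

signatures = distinct_15196(sample)
for sig in sorted(signatures):
    print(sig)
```

On the sample this yields two signatures: file 453 read through kfc.c, and file
3 block 6558 hit in kfr.c during recovery.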
Corruption of this type usually requires restoring the diskgroup when the
redundancy is external.
You may want to check whether DUL can help in this case.