Email: service@parnassusdata.com 7 x 24 online support!
ORA-00600 [kfcema02] causes the diskgroup to fail to mount.
If you cannot recover the data by yourself, ask ParnassusData, the professional Oracle database recovery team, for help.
A storage failure destroyed a disk yesterday.
After the storage was recovered, we got this disk error.
The RAC could not be brought up because the diskgroup would not mount.
SQL> ALTER DISKGROUP ALL MOUNT
...
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
alert_+ASM3.log
===================
SQL> ALTER DISKGROUP ALL MOUNT
NOTE: cache registered group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: cache began mount (not first) of group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: cache registered group DG_ORA number=2 incarn=0x873827c3
NOTE: cache began mount (first) of group DG_ORA number=2 incarn=0x873827c3
WARNING::ASMLIB library not found. See trace file for details.
NOTE: Assigning number (1,0) to disk (/dev/raw/raw10)
NOTE: Assigning number (2,6) to disk (/dev/raw/raw9)
NOTE: Assigning number (2,5) to disk (/dev/raw/raw8)
NOTE: Assigning number (2,4) to disk (/dev/raw/raw7)
NOTE: Assigning number (2,3) to disk (/dev/raw/raw6)
NOTE: Assigning number (2,2) to disk (/dev/raw/raw5)
NOTE: Assigning number (2,1) to disk (/dev/raw/raw4)
NOTE: Assigning number (2,0) to disk (/dev/raw/raw3)
kfdp_query(ASMREDO1): 3
kfdp_queryBg(): 3
NOTE: cache opening disk 0 of grp 1: ASMREDO1_0000 path:/dev/raw/raw10
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache mounting (not first) group 1/0x0A4827C2 (ASMREDO1)
kjbdomatt send to node 0
NOTE: attached to recovery domain 1
NOTE: LGWR attempting to mount thread 2 for diskgroup 1
NOTE: LGWR mounted thread 2 for disk group 1
NOTE: opening chunk 2 at fcn 0.146305 ABA
NOTE: seq=6 blk=5782
NOTE: cache mounting group 1/0x0A4827C2 (ASMREDO1) succeeded
NOTE: cache ending mount (success) of group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: start heartbeating (grp 2)
kfdp_query(DG_ORA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)
* allocate domain 2, invalid = TRUE
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: ALTER DISKGROUP ALL MOUNT
NOTE: cache dismounting group 2/0x873827C3 (DG_ORA)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to node 0
detach from dom 2, sending detach message to node 0
Please provide the following:
-- AMDU output
Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)
-- Kfed read output of all the disks that are part of the diskgroup you are unable to mount.
- Let us use kfed to read the devices.
Building and using the kfed utility
------------------------------------------------
* For releases 10.2.0.X and up execute:
1) Change to the rdbms/lib directory:
% cd $ORACLE_HOME/rdbms/lib
2) Generate the executable:
10.2.0.XX:
% make -f ins_rdbms.mk ikfed
Using kfed:
Reading a file:
% kfed read <device or file>
Example:
% kfed read /dev/rdsk/emcpower10a
- Please run kfed read on the disks and provide me with the output.
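For convenience, a minimal shell sketch to collect all the header dumps in one pass could look like the following; it assumes the candidate disks are the raw devices listed below (/dev/raw/raw3 through /dev/raw/raw10) and that kfed was built as shown above:
# minimal sketch: dump the header of each candidate raw device to /tmp
for d in /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5 /dev/raw/raw6 \
         /dev/raw/raw7 /dev/raw/raw8 /dev/raw/raw9 /dev/raw/raw10
do
    kfed read $d > /tmp/kfed_$(basename $d).txt
done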
/dev/raw/raw1: bound to major 8, minor 16
/dev/raw/raw2: bound to major 8, minor 32
/dev/raw/raw3: bound to major 8, minor 48
/dev/raw/raw4: bound to major 8, minor 64
/dev/raw/raw5: bound to major 8, minor 80
/dev/raw/raw6: bound to major 8, minor 96
/dev/raw/raw7: bound to major 8, minor 112
/dev/raw/raw8: bound to major 8, minor 128
/dev/raw/raw9: bound to major 8, minor 144
/dev/raw/raw10: bound to major 8, minor 160
<<< from the above disks -- do they all belong to the diskgroup?
kfed read /dev/raw/raw4
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 2061250939 ; 0x00c: 0x7adc317b
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0001 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0001 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw5
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483650 ; 0x008: TYPE=0x8 NUMB=0x2
kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 2 ; 0x024: 0x0002
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0002 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0002 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw6
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483651 ; 0x008: TYPE=0x8 NUMB=0x3
kfbh.check: 2061320572 ; 0x00c: 0x7add417c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 3 ; 0x024: 0x0003
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0003 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0003 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw7
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483652 ; 0x008: TYPE=0x8 NUMB=0x4
kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 4 ; 0x024: 0x0004
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0004 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0004 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw8
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483653 ; 0x008: TYPE=0x8 NUMB=0x5
kfbh.check: 2061320572 ; 0x00c: 0x7add417c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 5 ; 0x024: 0x0005
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0005 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0005 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw9
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check: 2059439481 ; 0x00c: 0x7ac08d79
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 6 ; 0x024: 0x0006
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0006 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0006 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfed read /dev/raw/raw10
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 4131885754 ; 0x00c: 0xf64792ba
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: ASMREDO1_0000 ; 0x028: length=13
kfdhdb.grpname: ASMREDO1 ; 0x048: length=8
kfdhdb.fgname: ASMREDO1_0000 ; 0x068: length=13
kfdhdb.capname: ; 0x088: length=0
I have the kfed read outputs only from
/dev/raw/raw4
/dev/raw/raw5
/dev/raw/raw6
/dev/raw/raw7
/dev/raw/raw8
/dev/raw/raw9
/dev/raw/raw10
From the AMDU output -- we see :
----------------------------- DISK REPORT N0009 ------------------------------
Disk Path: /dev/raw/raw2
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 2048 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
----------------------------- DISK REPORT N0010 ------------------------------
Disk Path: /dev/raw/raw1
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 2048 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
Do the above 2 disks belong to the diskgroup that you are trying to mount?
We encountered a storage failure that destroyed a disk yesterday.
After we recovered from the disk error, we found our RAC could not be brought up because the diskgroup would not mount.
The error ORA-00600 shows up in the alert log file.
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Could you please share the generated trace and incident files with us?
cd /opt/oracle/db/diag/asm/+asm/+ASM3/trace
grep '2012-09-22 22' *trc | awk -F: '{print $1}' | uniq
and the incident file named below:
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Please also let us know the exact process you used to recover from the disk error.
Research:
==========
kfdp_query(ASMREDO1): 3
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
*** 2012-09-17 22:03:22.816
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
*** 2012-09-17 22:03:26.954
kfdp_query(DG_ORA): 5
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
2012-09-17 22:03:27.250989 : Start recovery for domain=2, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6272
WARNING:Oracle process running out of OS kernel I/O resources
WARNING:Oracle process running out of OS kernel I/O resources
Incident 5754 created, dump file: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Abort recovery for domain 2, flags = 0x4
kjb_abort_recovery: abort recovery for domain 2 @ inc 4
kjb_abort_recovery: domain flags=0x0, valid=0
kfdp_dismount(): 6
----- Abridged Call Stack Trace -----
File_name :: +ASM3_ora_13438.trc
Could you please share the already requested incident file with us?
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
I am looking for the file below; please share it.
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Research:
==========
NOTE: start heartbeating (grp 2)
kfdp_query(DG_ORA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)
* allocate domain 2, invalid = TRUE
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: ALTER DISKGROUP ALL MOUNT
kfdp_query(ASMREDO1): 3
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
*** 2012-09-17 22:03:22.816
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
*** 2012-09-17 22:03:26.954
kfdp_query(DG_ORA): 5
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
2012-09-17 22:03:27.250989 : Start recovery for domain=2, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6272
WARNING:Oracle process running out of OS kernel I/O resources
WARNING:Oracle process running out of OS kernel I/O resources
Incident 5754 created, dump file: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Abort recovery for domain 2, flags = 0x4
kjb_abort_recovery: abort recovery for domain 2 inc 4
kjb_abort_recovery: domain flags=0x0, valid=0
kfdp_dismount(): 6
----- Abridged Call Stack Trace -----
File_name :: +ASM3_ora_13438.trc
========= Dump for incident 5754 (ORA 600 [kfcema02]) ========
----- Beginning of Customized Incident Dump(s) -----
CE: (0x0x617be648) group=2 (DG_ORA) obj=4 (disk) blk=2115
hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1
copy #0: disk=4 au=910336
BH: (0x0x6178e798) bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61409000
kfbh_kfcbh.fcn_kfbh = 0.165046340 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
-------------------------------------------------------------------------------
----- Invocation Context Dump -----
Address: 0x2b5faa6e8498
Phase: 3
flags: 0x18E0001
Incident ID: 5754
Error Descriptor: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []
Error class: 0
Problem Key # of args: 1
Number of actions: 8
----- Incident Context Dump -----
Address: 0x7fff6c8a42b8
Incident ID: 5754
Problem Key: ORA 600 [kfcema02]
Error: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: dbgeEndDDEInvocation [diag_dde]
[03]: kfcema [ASM]<-- Signaling
[04]: kfrPass2 [ASM]
[05]: kfrcrv [ASM]
[06]: kfcMountPriv [ASM]
[07]: kfcMount [ASM]
[08]: kfgInitCache [ASM]
[09]: kfgFinalizeMount [ASM]
[10]: kfgscFinalize [ASM]
[11]: kfgForEachKfgsc [ASM]
[12]: kfgsoFinalize [ASM]
[13]: kfgFinalize [ASM]
[14]: kfxdrvMount [ASM]
[15]: kfxdrvEntry [ASM]
[16]: opiexe []
[17]: opiosq0 []
[18]: kpooprx []
[19]: kpoal8 []
[20]: opiodr []
[21]: ttcpip []
[22]: opitsk []
[23]: opiino []
[24]: opiodr []
[25]: opidrv []
[26]: sou2o []
[27]: opimai_real []
[28]: ssthrdmain []
[29]: main []
[30]: __libc_start_main []
[31]: _start []
MD [00]: 'SID'='115.3' (0x3)
MD [01]: 'ProcId'='19.1' (0x3)
MD [02]: 'PQ'='(50331648, 1347894201)' (0x7)
MD [03]: 'Client ProcId'='oraclemos5200db3 (TNS V1-V3).13438_47689880133216' (0x0)
Impact 0:
Impact 1:
Impact 2:
Impact 3:
Derived Impact:
File_name :: +ASM3_ora_13438_i5754.trc
1. Execute
.
.
kfed read /dev/raw/raw7 aunum=910336 blknum=2115 text=/tmp/kfed_raw7_910336_2115.txt
kfed read /dev/raw/raw7 text=/tmp/kfed_raw7.txt
.
.
2. Get the 'File 1 Block 1' location for the diskgroup as follows (a combined sketch of steps 2-4 appears after this list):
.
a. For each disk in the diskgroup execute:
.
kfed read <DSK> | grep f1b1
.
3. You may get a non-zero value for 'kfdhdb.f1b1locn', for example:
.
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
.
4. For that disk execute (replace <AUNUM> with the value from the previous step):
.
kfed read <DSK> aunum=<AUNUM> text=kfed_<DSK>_<AUNUM>.txt
kfed read <DSK> text=kfed_<DSK>_w_f1b1.txt
.
.
.
5. Set the event below in the ASM pfile and try to mount the diskgroup DG_ORA manually at the ASM level,
reproducing the problem on the instance:
.
event = "15199 trace name context forever, level 0x8007"
Then start the ASM instance using that pfile:
startup nomount pfile=<pfile name>;
Then try to mount each diskgroup manually, one by one, including DG_ORA:
SQL> alter diskgroup <diskgroup_name> mount;
and collect the traces from bdump/udump. The event will dump the redo until
we get the error.
.
6. For each disk in the diskgroup take a backup of the first 50 MB (replace <disk_name>):
.
dd if=<disk_name> of=/tmp/<disk_name>.dd bs=1048576 count=50
.
Later, compress those files and upload them to the bug.
.
7. At the end please upload:
.
a. Complete alert for all the ASM instances
b. traces produced when event was set
c. metadata dumps (files /tmp/kfed*)
d. OS logs (/var/adm/messages*) for each node, covering the latest timestamps of those mount attempts
e. dd dumps
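As mentioned in step 2, a combined, hedged sketch of steps 2-4 could look like the block below; the disk list and the output file names are assumptions for illustration, not part of the original action plan:
# locate 'File 1 Block 1' on each DG_ORA member disk and dump the corresponding AU
for d in /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5 /dev/raw/raw6 \
         /dev/raw/raw7 /dev/raw/raw8 /dev/raw/raw9
do
    # steps 2-3: read the disk header and pick up kfdhdb.f1b1locn
    au=$(kfed read $d | awk '/kfdhdb.f1b1locn/ {print $2}')
    echo "$d f1b1locn=$au"
    # step 4: if the location is non-zero, dump that AU plus the header
    if [ -n "$au" ] && [ "$au" != "0" ]; then
        kfed read $d aunum=$au text=/tmp/kfed_$(basename $d)_${au}.txt
        kfed read $d text=/tmp/kfed_$(basename $d)_w_f1b1.txt
    fi
done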
----------------------------- DISK REPORT N0008 ------------------------------
Disk Path: /dev/raw/raw3
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 1047552 megabytes
Group Name: DG_ORA
Disk Name: DG_ORA_0000
Failure Group Name: DG_ORA_0000
Disk Number: 0
Header Status: 3
Disk Creation Time: 2012/03/01 15:31:59.955000
Last Mount Time: 2012/04/07 15:40:22.454000
Compatibility Version: 0x0a100000(10010000)
Disk Sector Size: 512 bytes
Disk size in AUs: 1047552 AUs
Group Redundancy: 1
Metadata Block Size: 4096 bytes
AU Size: 1048576 bytes
Stride: 113792 AUs
Group Creation Time: 2012/03/01 15:31:59.829000
File 1 Block 1 location: AU 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
File_name :: report.txt
It is available for diskgroup DG_ORA on disk /dev/raw/raw3.
Please execute that command on the same device.
Yes, we need it only for disk raw3, because this mount issue concerns the DG_ORA diskgroup and there is at least one 'File 1 Block 1' location per diskgroup; that is why you see two locations coming from two different diskgroups.
Hence, for steps 2-4, perform the action plan below, then the rest of the steps.
2. Get the 'File 1 Block 1' location for the diskgroup as follows:
.
a. For each disk in the diskgroup execute:
.
kfed read /dev/raw/raw3 | grep f1b1
.
3. You may get a non-zero value for 'kfdhdb.f1b1locn', for example:
.
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
.
4. For that disk execute (replace <AUNUM> with the value from the previous step):
.
kfed read /dev/raw/raw3 aunum=2 text=kfed_raw3_2.txt
kfed read /dev/raw/raw3 text=kfed_raw3_w_f1b1.txt
It seems the file asmlog.part01.rar is broken; could you please upload it again?
1. Clarification of current patch level status:
=> from uploaded OPatch lsinventory output:
Oracle Home : /opt/oracle/db/product/11g/db_1
Installed Top-level Products (2):
Oracle Database 11g 11.1.0.6.0
Oracle Database 11g Patch Set 1 11.1.0.7.0
Interim patches (3) :
Patch 9549042 : applied on Thu Mar 01 09:25:24 WIT 2012
Patch 7272646 : applied on Thu Mar 01 08:48:05 WIT 2012
Patch 12419384 : applied on Thu Mar 01 08:42:47 WIT 2012
=> DATABASE PSU 11.1.0.7.8 (INCLUDES CPUJUL2011)
=> Note:
1) The previously mentioned bug 6163771 is already fixed in patch set 11.1.0.7.
@@ PATCHSET REQUEST #70719 CREATED IN BUG 6712856 FOR FIX IN 11.1.0.7.0
2) I cannot find any information in this SR about whether the ASM and DB share the same Oracle Home. This needs to be clarified.
2. From +ASM3_ora_13438_i5754.trc:
*** ACTION NAME:() 2012-09-17 22:03:27.432
Dump continued from file: /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
========= Dump for incident 5754 (ORA 600 [kfcema02]) ========
----- Beginning of Customized Incident Dump(s) -----
CE: (0x0x617be648) group=2 (DG_ORA) obj=4 (disk) blk=2115
hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1
copy #0: disk=4 au=910336
BH: (0x0x6178e798) bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61409000
kfbh_kfcbh.fcn_kfbh = 0.165046340 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
...
*** 2012-09-17 22:03:27.553
----- Current SQL Statement for this session (sql_id=2pa6sbf4762ga) -----
ALTER DISKGROUP ALL MOUNT
----- Call Stack Trace -----
Function List:
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- PGOSF52_ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocatio <- nImpl
<- dbgeEndDDEInvocatio <- kfcema <- kfrPass2 <- kfrcrv <- kfcMountPriv
<- kfcMount <- kfgInitCache <- kfgFinalizeMount <- 2241 <- kfgscFinalize
<- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry
<- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr
<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
<- sou2o <- opimai_real <- ssthrdmain <- main <- libc_start_main
@@ Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE
1. The prior engineer sent you an action plan with the goal of patching the diskgroup.
Could we ask you for the results of this action plan, please? Were you able to mount the diskgroup afterwards?
+++ For Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE, there is no patch at all. It also happens on 11gR2.
2. If the problem remains, verify that the affected diskgroup is dismounted on all nodes/ASM instances.
After that, please try to mount it on ASM instance 1 only (manually in SQL*Plus).
What is the result? Do you still get the same ORA-600 error as before?
Please re-upload the most current alert file from ASM instance 1, together with the trace files that will be written.
++ No patch can be applied; do you still want us to do this?
3. If the internal error remains and patching the diskgroup failed, then we have to rebuild the diskgroup.
Do you have a full backup of the data within the affected diskgroup? Please clarify.
+++ sorry, no backup
Unfortunately this is a misunderstanding.
The action plan provided by the prior engineer was to patch the bad blocks on the affected disks,
not to apply a software patch. Even if we could apply a patch, it would most likely only prevent new
occurrences; it probably would not repair the current situation (if the diskgroup is corrupted).
Please note that if we cannot repair the diskgroup, you will have to rebuild the diskgroup and then
restore and recover the lost data. Accordingly, you should have at least some kind of worst-case backup.
Your issue was transferred to me. My name is Pallavi and I will be helping you with your issue. I am currently reviewing/researching the situation and will update the SR / call you as soon as I have additional information. Thank you for your patience.
We can try to patch the diskgroup. If this doesn't work, you will have to recreate the diskgroup and restore data from a valid backup.
!!!!!! VERY IMPORTANT: Be sure you have a valid backup of data pertaining to ora_data diskgroup. !!!!!!
-----------------------------------------------------------------
You need to build kfed and set up amdu for further use.
1) kfed is a tool that can read and write ASM metadata. To build kfed, connect as the owner of the Oracle software and execute:
$ cd $ORACLE_ASMHOME/rdbms/lib
$ make -f ins_rdbms.mk ikfed
2) AMDU was released with 11g and is a tool used to get the location of ASM metadata across the disks.
Like many other tools released with 11g, it can be used in 10g environments. Note 553639.1 is the placeholder for the different platforms; the note also includes configuration instructions.
* Transfer amdu and facp to a working directory and include that directory in LD_LIBRARY_PATH, PATH and other relevant variables.
There is no guarantee that the patching will work; it all depends on the state of the disks we are trying to patch. We will only know when we try.
As the ASM software owner, execute facp:
$ ./facp 'diskstring' 'DISKGROUP NAME' ALL
eg:
$./facp '/dev/vg00/rraw*' 'DATAHP' ALL
Run this only ONCE, and then please update the SR with all the files it has generated.
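For illustration only, the environment setup and the single facp run might look like the sketch below; the working directory /tmp/amdu is an assumption, not part of the original instructions:
# sketch, assuming amdu/facp (and kfed) were unpacked into /tmp/amdu per Note 553639.1
export PATH=/tmp/amdu:$PATH
export LD_LIBRARY_PATH=/tmp/amdu:$LD_LIBRARY_PATH
cd /tmp/amdu
# run facp exactly once against DG_ORA over the raw devices
./facp '/dev/raw/raw*' 'DG_ORA' ALL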
Did you execute the facp command as requested? If not, please do so and share the generated files with us.
As the ASM software owner, execute facp:
$ ./facp 'diskstring' '<DISKGROUP NAME>' ALL
$ ./facp '/dev/raw/raw*' 'DG_ORA' ALL
Then share the related files, named as below:
facp_report
facp_dump_1
facp_dump_2
facp_dump_3
facp_restore
facp_patch_1 (one per node that uses the dg)
facp_adjust
facp_check
facp_patch
Note: Run this only ONCE.
We are still waiting for these files.
Execute the commands below and share the generated logfile with us:
script /tmp/facp.log
# Run the following to lower all checkpoints by 10 blocks:
$ ./facp_adjust -10
# Then run facp_check.
$ ./facp_check
exit
Share the file named as /tmp/facp.log
Try adjusting by a value lower than 10 using the command below:
./facp_adjust -<integer>
Then validate:
$ ./facp_check
If facp_check reports "Valid Checkpoint" for all threads, that is the indication
to proceed with the real patching, which means updating the ACD records
on the disks with the records from the facp_patch_* files.
To continue with this step, facp_check should have returned "Valid Checkpoint" for all threads.
Then execute the command below to patch the ACD:
./facp_patch
Then try to mount the diskgroup manually:
SQL> alter diskgroup dg_ora mount;
If the mount again fails with the same error, go back to the facp_adjust step with a new argument for facp_adjust and continue until the diskgroup is mounted (see the sketch below).
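As a sketch only of the manual cycle just described (run as the ASM software owner; the adjustment value is an example, not a recommendation):
# one pass of the adjust/check/patch/mount cycle
./facp_adjust -10        # lower all checkpoints by 10 blocks
./facp_check             # proceed only if ALL threads report "Valid Checkpoint"
./facp_patch             # write the adjusted ACD checkpoint records to the disks
sqlplus / as sysdba <<EOF
alter diskgroup dg_ora mount;
EOF
# if the mount still fails with ORA-600 [kfcema02], repeat from facp_adjust
# with a different value until the diskgroup mounts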
Instruction: Following note: How to fix error ORA-600 [KFCEMA02] (Doc ID 728884.1). As per the above note, Ct tried to patch the ACD, but is still not able to mount the diskgroup:
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -9
3 patch files written
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1533): Valid Checkpoint
Thread 2 (189,5018): Valid Checkpoint
Thread 3 (182,5371): Valid Checkpoint
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
--- Executing amdu to check for heartbeat ---
Patching Thread 1
kfracdc.ckpt.seq: 348
kfracdc.ckpt.blk: 1533
Patching Thread 2
kfracdc.ckpt.seq: 189
kfracdc.ckpt.blk: 5018
Patching Thread 3
kfracdc.ckpt.seq: 182
kfracdc.ckpt.blk: 5371
Save files ./facp_* to document what was patched
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> export ORACLE_SID=+ASM1
Refer to the SQL*Plus User's Guide and Reference for more information.
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Wed Sep 19 15:12:37 2012
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2158992 bytes
Variable Size 256605808 bytes
ASM Cache 25165824 bytes
SQL> alter diskgroup DG_ORA mount;
alter diskgroup DG_ORA mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165054516], [], [], [], [], [], [], [], [], []
SQL> host
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> exit
It seems Ct needs to recreate this diskgroup and restore data from backup. If they do not have a backup, we need to log a bug to involve the development team further.
Activity Instruction
Created: 18-Sep-2012 03:28:29 PM GMT+00:00 Instruction Type: Severity 1 : End of Shift Note
Instruction: Currently we are trying to find out whether the diskgroup can be patched/repaired. Aritra Kundu has sent out an action plan for this. We are still waiting for the related customer feedback. If the diskgroup cannot be repaired, we will have to rebuild it.
Activity Instruction
Created: 18-Sep-2012 08:08:23 AM GMT+00:00 Instruction Type: Severity 1 : End of Shift Note
Instruction: It seems Ct is on PSU 8, and the related known defect is already RFIed into 11.1.0.7 (BUG 6712856 - RFI BACKPORT OF BUG 6163771 FOR INCLUSION IN 11.1.0.7.0). Waiting for Ct to share the requested information; after that we need to raise a defect with the development team and page BDE immediately to involve them.
2. TECHNICAL & BUSINESS IMPACT
Probably diskgroup corruption.
If we try to mount the affected diskgroup, it fails during the diskgroup recovery with an internal error:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Currently we are trying to find out whether the diskgroup can be patched/repaired.
Aritra Kundu has sent out an action plan for this. We are still waiting for the related customer feedback.
As per the above note, Ct tried to patch the ACD, but is still not able to mount the diskgroup:
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -9
3 patch files written
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1533): Valid Checkpoint
Thread 2 (189,5018): Valid Checkpoint
Thread 3 (182,5371): Valid Checkpoint
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
--- Executing amdu to check for heartbeat ---
Patching Thread 1
kfracdc.ckpt.seq: 348
kfracdc.ckpt.blk: 1533
Patching Thread 2
kfracdc.ckpt.seq: 189
kfracdc.ckpt.blk: 5018
Patching Thread 3
kfracdc.ckpt.seq: 182
kfracdc.ckpt.blk: 5371
Save files ./facp_* to document what was patched
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> export ORACLE_SID=+ASM1
Refer to the SQL*Plus User's Guide and Reference for more information.
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Wed Sep 19 15:12:37 2012
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2158992 bytes
Variable Size 256605808 bytes
ASM Cache 25165824 bytes
SQL> alter diskgroup DG_ORA mount;
alter diskgroup DG_ORA mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165054516], [],
[], [], [], [], [], [], [], []
SQL> host
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> exit
It seems Ct needs to recreate this diskgroup and restore data from backup.
If they do not have a backup, we need to log a bug to involve the development team further.
Again, the latest uploaded file 'facplog' is not readable on our side; it cannot be decompressed.
Please make sure that the uploaded compressed files are readable and can be decompressed before uploading them.
That way we can all save time.
To get the correct current status of the patching action, please upload the following:
1. Re-upload the file 'facplog'.
2. Upload the most current ASM alert file from all instances.
3. Upload the trace file that was written during the last occurrence of ORA-600 [kfcema02].
1. From the ASM alert file (inst. 1):
=> latest occurrence:
Wed Sep 19 15:33:40 2012
SQL> alter diskgroup DG_ORA mount
...
NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
NOTE: cache mounting (first) group 2/0x95BC2DFD (DG_ORA)
...
Wed Sep 19 15:33:45 2012
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery for thread 1 at
NOTE: seq=348 blk=1542
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery for thread 2 at
NOTE: seq=189 blk=5027
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
NOTE: starting recovery for thread 3 at
NOTE: seq=182 blk=5380
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM1/trace/+ASM1_ora_2519.trc (incident=9775):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: alter diskgroup DG_ORA mount
...
2. from +ASM1_ora_2519.trc:
2012-09-19 15:25:25.156129 : Start recovery for domain=2, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=348.1537 group=2
NOTE: starting recovery of thread=2 ckpt=189.5022 group=2
NOTE: starting recovery of thread=3 ckpt=182.5375 group=2
...
*** 2012-09-19 15:25:25.172
kfrHtAdd: obj=0x1 blk=0x6e6 op=133 fcn:0.165051322 -> 0.165051323
kfrHtAdd: bcd: obj=1 blk=1766 from:0.165051322 to:0.165051323
...
=> recovery is running...
...
*** 2012-09-19 15:25:25.206
kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057973 -> 0.165057974
*** 2012-09-19 15:25:25.206
kfrHtAdd: obj=0x1 blk=0x6e9 op=133 fcn:0.165057974 -> 0.165057975
*** 2012-09-19 15:25:25.206
kfrHtAdd: obj=0x80000006 blk=0x60f op=65 fcn:0.165057967 -> 0.165057975
*** 2012-09-19 15:25:25.206
kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057974 -> 0.165057975
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400
WARNING:Oracle process running out of OS kernel I/O resources
WARNING:Oracle process running out of OS kernel I/O resources
*** 2012-09-19 15:25:25.212
kfrRcvSetRem: obj=0x1 blk=0x6e7 [set] = 284
block needed no recovery:
CE: (0x0x617be2b0) group=2 (DG_ORA) obj=1 blk=1767
hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
flags_kfcpba=0x18 copies=1 blockIndex=231 AUindex=0 AUcount=0
copy #0: disk=3 au=762686
BH: (0x0x6178e360) bnum=5 type=FILEDIR state=rcv chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0x0x61404000
kfbh_kfcbh.fcn_kfbh = 0.165054713 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
...
=> from here it seems that the recovery was interrupted due to an I/O kernel limitation:
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400
WARNING:Oracle process running out of OS kernel I/O resources
WARNING:Oracle process running out of OS kernel I/O resources
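(As an aside, not part of the original SR analysis: on Linux the kernel-wide asynchronous I/O limits referenced in these warnings can be inspected, and if necessary raised, with sysctl; the value shown is only an example.)
# check the kernel-wide AIO request ceiling and the number currently in use
sysctl fs.aio-max-nr fs.aio-nr
# example only: raise the ceiling (persist the change in /etc/sysctl.conf)
sysctl -w fs.aio-max-nr=1048576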
=== Follow up ===
3. From the block patching actions:
=> regarding the patched blocks, we are here:
SQL> host
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -0
3 patch files written
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1542): WRONG SEQ NUMBER
Thread 2 (189,5027): WRONG SEQ NUMBER
Thread 3 (182,5380): Valid Checkpoint
DO NOT PATCH WITH THE CURRENT PATCH FILES
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
--- Executing amdu to check for heartbeat ---
Patching Thread 1
kfracdc.ckpt.seq: 348
kfracdc.ckpt.blk: 1542
Patching Thread 2
kfracdc.ckpt.seq: 189
kfracdc.ckpt.blk: 5027
Patching Thread 3
kfracdc.ckpt.seq: 182
kfracdc.ckpt.blk: 5380
Save files ./facp_* to document what was patched
=> not sure why the 'facp_adjust' command was used with zero (-0)?
SQL> alter diskgroup DG_ORA mount;
alter diskgroup DG_ORA mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [],[], [], [], [], [], [], [], []
4. Patch level status of the instance:
Oracle Database 11g Patch Set 1 11.1.0.7.0
There are 2 products installed in this Oracle Home.
Interim patches (3) :
Patch 9549042 : applied on Thu Mar 01 09:25:24 WIT 2012
Patch 7272646 : applied on Thu Mar 01 08:48:05 WIT 2012
Patch 12419384 : applied on Thu Mar 01 08:42:47 WIT 2012
=> PSU 11.1.0.7.8
=> so we are on 11.1.0.7.8 here
Please see our latest analysis below.
Currently I can see two problems when we try to mount the corrupted diskgroup:
we get the known ORA-600, but also an error about an I/O kernel limitation during the block recovery.
I would like to avoid the I/O kernel limitation error during the recovery. Perhaps after that the recovery can
complete and resolve the situation, instead of patching blocks manually.
We know about the following bugs associated with I/O kernel limitation errors (from note 868590.1):
"...
For 11gR1
The fix for unpublished Bug 6687381 is included in patch set 11.1.0.7
The fix for Bug 7523755 is available as overlay patch on Patch Set Update 11.1.0.7.10 ,
... apply patch set 11.1.0.7 and Patch 13343461 on top of that., then Apply fix for Bug 7523755...
"
Accordingly, I would suggest the following actions now:
Since you are currently on 11.1.0.7.8, you first need to apply PSU 11.1.0.7.10 in all Oracle Homes (ASM & DB).
Afterwards, apply the fix for Bug 7523755.
Finally, after the patches are applied, restart the instance and try to mount the diskgroup again.
Verify whether the block recovery can now complete or whether it still fails with the same ORA-600.
At least the I/O errors should no longer be reported.
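Purely as an illustrative sketch of that patch-then-retry sequence (the staging directories are placeholders; the exact steps come from each patch's own README):
# 1) stop the databases and ASM on the node, then as the software owner:
cd /staging/psu_11.1.0.7.10 && $ORACLE_HOME/OPatch/opatch apply
cd /staging/patch_7523755   && $ORACLE_HOME/OPatch/opatch apply
$ORACLE_HOME/OPatch/opatch lsinventory   # confirm both patches are listed
# 2) restart the ASM instance and retry the mount
sqlplus / as sysdba <<EOF
startup nomount
alter diskgroup DG_ORA mount;
EOF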