ORA-15196 Oracle ASM CASE STUDY: UNDERSTANDING ERROR ORA-15196
This document explains error ORA-15196: it details each argument, suggests how to diagnose the error, and closes with a case study based on a real problem reported by a customer.
Error Description
ORA-15196 is reported after a validation of an ASM metadata block has failed. The error will be reported in the following format:
ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]
Where the arguments indicate:
Argument Meaning
- 1st Function and line number in the code, where the exception is raised
- 2nd Field failing the validation
- 3rd ASM object number stored in the block
- 4th ASM block number stored in the block
- 5th Value associated with field referenced by argument 2
- 6th Expected value for field referenced by argument 2
Example:
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
Function and line number in the code, where the exception is raised = kfc.c:7997
Field failing the validation = endian_kfbh
ASM object number stored in the block = 1
ASM block number stored in the block = 93
Value associated with field referenced by argument #2 = 211
Expected value for field referenced by argument #2 = 0
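The breakdown above can be reproduced mechanically. The following is a minimal sketch, assuming the message always follows the format shown in this document; the function name and the dictionary keys are illustrative and not part of any Oracle tool.

```python
import re

# Pattern follows the format shown above:
# ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]
ORA_15196 = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[(?P<location>[^\]]+)\] "
    r"\[(?P<field>[^\]]+)\] "
    r"\[(?P<object_number>\d+)\] "
    r"\[(?P<block_number>\d+)\] "
    r"\[(?P<found>\d+) != (?P<expected>\d+)\]"
)


def parse_ora_15196(line):
    """Return the six arguments as a dict, or None if the line does not match."""
    match = ORA_15196.search(line)
    if match is None:
        return None
    args = match.groupdict()
    # The last four arguments are numeric in every example in this document.
    for key in ("object_number", "block_number", "found", "expected"):
        args[key] = int(args[key])
    return args


example = ("ORA-15196: invalid ASM block header "
           "[kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]")
print(parse_ora_15196(example))
```

Lines that do not contain the error return None, so the function can be applied to every line of an alert.log.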
Arguments description
- Function and line number in the code, where the exception is raised
In general, this argument is the same in most cases, because the exception is always raised by the same routine.
#define kfbValid(data, len, type, bl) \
kfbValidPriv(data, len, type, bl, __FILE__, __LINE__)
- Field failing the validation
The ASM metadata is composed of many different structures, such as the file directory, disk directory, active change directory (ACD), etc., which are organized in files (ASM file numbers between 1 and 255). Each file is made of extents, and each extent is made of ASM blocks (4096 bytes). Each block has a generic block header (kfbh), and any of its fields can be validated.
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check: 4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
A short description of each of the fields referenced above (file kf3.h):
kfbh.endian endianness of writer, big or little endian
kfbh.hard H.A.R.D. magic # and block size
kfbh.type metadata block type (type of ASM metadata)
kfbh.datfmt metadata block data format
kfbh.block.blk block location of this block (block number)
kfbh.block.obj block location of this block (object number)
kfbh.check check value to verify consistency
kfbh.fcn.base change number of last change (base)
kfbh.fcn.wrap change number of last change (wrap)
kfbh.spare1 zero pad out to 32 bytes
kfbh.spare2 zero pad out to 32 bytes
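The header layout can be illustrated by decoding the first 32 bytes of a block. This is only a sketch: the offsets and widths come from the kfed dump shown in this document, the Python field names are illustrative, and the mapping of kfbh.endian values to byte order (1 for little endian, anything else for big endian) is an assumption that should be checked against kfed output on the platform in question.

```python
import struct

# Offsets and widths are taken from the kfed dump in this document
# (0x000 endian ... 0x01c spare2). The names are illustrative.
FIELDS = ("endian", "hard", "type", "datfmt", "block_blk", "block_obj",
          "check", "fcn_base", "fcn_wrap", "spare1", "spare2")


def decode_kfbh(raw):
    """Decode the first 32 bytes of an ASM metadata block into a dict."""
    if len(raw) < 32:
        raise ValueError("need at least 32 bytes")
    # ASSUMPTION: kfbh.endian == 1 means little endian, otherwise big endian.
    order = "<" if raw[0] == 1 else ">"
    # Four 1-byte fields followed by seven 4-byte unsigned integers = 32 bytes.
    values = struct.unpack(order + "4B7I", raw[:32])
    return dict(zip(FIELDS, values))


# Synthetic header built from the values of the dump above.
sample = struct.pack(">4B7I", 0, 0x82, 4, 1, 0x50, 1, 0xFE72FA82, 0, 0, 0, 0)
header = decode_kfbh(sample)
print(header["type"], header["block_blk"], header["block_obj"], hex(header["check"]))
```

kfed remains the proper tool for reading real blocks; the sketch only shows how the 32 bytes map onto the fields listed above.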
The fields reported with this error across different SRs are:
endian_kfbh
obj_kfbl
hard_kfbh
type_kfbh
datfmt_kfbh
check_kfbh
- ASM object number stored in the block
Every ASM metadata block belongs to a specific file associated with a specific ASM structure. That is why ASM file numbers between 1 and 255 are used to identify the files storing those structures. The value in this field is the ASM file number.
ASM File Number ASM Metadata
1 File Directory
2 Disk Directory
3 Active Change Directory (ACD)
4 Continuous Operations Directory (COD)
5 Template Directory
6 Alias Directory
9 Attributes Directory
12 Staleness Directory
For other ASM metadata structures like PST, ATB, DISK HEADER, this field will have a static value 2147483648 (0x80000000)
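The table above translates directly into a lookup. The sketch below uses only the file numbers listed in this document; the function name and the fallback text are illustrative.

```python
# File numbers and names exactly as listed in the table above.
ASM_METADATA_FILES = {
    1: "File Directory",
    2: "Disk Directory",
    3: "Active Change Directory (ACD)",
    4: "Continuous Operations Directory (COD)",
    5: "Template Directory",
    6: "Alias Directory",
    9: "Attributes Directory",
    12: "Staleness Directory",
}

# Static value used by PST, ATB and disk header blocks.
NOT_A_FILE = 0x80000000  # 2147483648


def describe_object(object_number):
    """Translate the 3rd argument of ORA-15196 into a metadata structure name."""
    if object_number == NOT_A_FILE:
        return "PST, ATB or disk header (no ASM file number)"
    return ASM_METADATA_FILES.get(object_number, "not listed in the table above")


print(describe_object(1))
print(describe_object(2147483648))
```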
- ASM block number stored in the block
An ASM file allocates extents, each of which is associated with an Allocation Unit. An extent is made of multiple ASM metadata blocks of 4096 bytes; with the default Allocation Unit size of 1MB, there are 256 blocks in each extent/AU.
The value stored in this field is the block number relative to a particular file. In this example, the block number is 93, so the block is stored in the first extent of the file. That extent will be allocated on a specific Allocation Unit on one of the disks in the diskgroup.
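The relation between a relative block number and its extent is plain integer division. A minimal sketch, assuming one extent per Allocation Unit and the default sizes mentioned above (the function name is illustrative):

```python
BLOCK_SIZE = 4096          # ASM metadata block size in bytes
AU_SIZE = 1024 * 1024      # default Allocation Unit size (1MB)


def locate_block(block_number, au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Return (extent, block_within_extent), both counted from 0."""
    blocks_per_extent = au_size // block_size  # 256 with the defaults
    return divmod(block_number, blocks_per_extent)


print(locate_block(93))   # (0, 93): first extent
print(locate_block(256))  # (1, 0): first block of the second extent
```

The extent number only identifies the position within the file; the extent map is still needed to find which Allocation Unit on which disk holds that extent.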
- Value associated with field referenced by argument #2
This is the value found in the block for the field referenced in argument #2.
- Expected value for field referenced by argument 2
This is the expected value for the field referenced by argument #2.
With the description of all the arguments for error ORA-15196, it should be possible to better understand the message:
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
In the previous example, the field failing the validation is endian_kfbh; the block belongs to file 1 (File Directory) and is relative block 93; the value found for endian_kfbh was 211, while the correct value should have been 0.
Diagnostics
Up to 10gR2, there are some bugs (patch included) related to this error.
5554692 | Related to indirect extent allocation. Please read the bug description in webiv, because not all cases of ORA-15196 are this particular bug. |
6027802 | This was closed as not a bug, but was related to some IO issues caused by EMC Powerpath. Same type of data mismatch has been observed on other PP installations |
6453944 | ORA-15196 with ASM disks larger than 2TB using ASMLIB |
Most occurrences of this error are associated with data changed outside of ASM. This includes:
- Disks formatted at the OS level while in use by ASM
- Disks assigned to a file system while in use by ASM
- IO errors (stale writes)
- Usage of third-party software
Once this error is reported, the diskgroup needs to be recreated. There are situations where the diskgroup cannot be mounted, and others where any reference to the metadata (recursive or non-recursive) will signal the error and dismount the diskgroup.
Data Collection
In order to understand the extent of the problem and produce a correct diagnosis, it is essential to obtain the following data:
- Alert.log and trace file associated with the error
- First 300MB of the disk affected by the error
In the alert.log, review the line before the report of error ORA-15196:
WARNING: cache failed to read fn=1 blk=80 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
The line before the ORA-15196 report indicates the disk storing the block: from disk(s): 0.
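When many alert.log files have to be reviewed, the warning line can be picked apart with a short script. This is a sketch based only on the single line format shown above; how several disk numbers would be separated is not shown in this document, so the handling of spaces and commas is an assumption.

```python
import re

# Format taken from the alert.log line shown above.
CACHE_WARNING = re.compile(
    r"WARNING: cache failed to read fn=(?P<file>\d+) blk=(?P<block>\d+) "
    r"from disk\(s\): (?P<disks>[\d ,]+)"
)


def parse_cache_warning(line):
    """Return file number, block number and disk list, or None if no match."""
    match = CACHE_WARNING.search(line)
    if match is None:
        return None
    # ASSUMPTION: multiple disks, if present, are separated by spaces or commas.
    disks = [int(d) for d in re.findall(r"\d+", match.group("disks"))]
    return {
        "file": int(match.group("file")),
        "block": int(match.group("block")),
        "disks": disks,
    }


print(parse_cache_warning(
    "WARNING: cache failed to read fn=1 blk=80 from disk(s): 0"))
```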
To get the first 300MB:
$ dd if=<device path> of=/tmp/disk.dd bs=1048576 count=300
It may be necessary to provide a partial copy of other disks in the diskgroup.
- Output from AMDU if available
AMDU will be explained in more detail in a different note (TBD).
This tool is one of the new features introduced with 11g. It reads the ASM disks and extracts information into different files. Those files contain a mapping of the ASM metadata and an image of the content of the disks; it is also possible to extract files from the diskgroup.
AMDU can extract the information even if the diskgroup is dismounted.
The mapping file is very important for the diagnosis of error ORA-15196. It has the specific location of each extent of each ASM metadata file.
Note 553639.1 is the placeholder for the AMDU binaries for some of the platforms.
Data Review
- Always review the blocks around the affected block. If more than one block has incorrect data (zeros), and they belong to different ASM structures (file directory, disk directory, etc.), the problem was most likely caused outside of ASM: disk reformatted, assigned to another volume manager, etc.
Use kfed to extract the content of the blocks.
- Review the trace file generated by the error.
The trace file always prints a dump of the ASM metadata block in memory, and also a short call stack. The block output is the same as that generated by kfed, which is readable by the user.
*** SERVICE NAME:() 2008-01-23 11:57:23.892
*** SESSION ID:(39.74) 2008-01-23 11:57:23.892
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check: 4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
/* data remove on purpose */
After the OSM metadata block dump, the short call stack is printed:
----- Abridged Call Stack Trace -----
kfcReadBlk()+1276 kfcLoad()+2148 kffbScanNext()+252 kffbTableCb()+700 kfgTableCb()+1252 kffilTableCb()+240 qerfxFetch()+896 qersoFetch()+720 qerjotFetch()+184 opifch2()+8092 kpoal8()+4196 opiodr()+1548 ttcpip()+1284 opitsk()+1432 opiino()+1128 opiodr()+1548 opidrv()+896 sou2o()+80 opimai_real()+124 main()+152
- Compare the data in the trace file with the data extracted from disk using kfed.
Comparing the block dumped in the trace file with the block on disk makes it possible to identify the exact cause of the check validation failure. Every case will be different, but if the data stored on disk is zeros, always remember to validate the adjacent blocks. If more blocks report invalid data (zeros), this is an indication that the disk has been formatted outside ASM.
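The comparison between the trace file and the kfed output can be done field by field. The following sketch assumes both texts use the "name: value ; offset: ..." layout shown in this document; the function names are illustrative, and only the kfbh header fields are compared.

```python
import re

# Matches header lines such as "kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50".
KFBH_LINE = re.compile(r"(kfbh\.[\w.]+):\s*(\d+)\s*;")


def parse_dump(text):
    """Collect the decimal value of every kfbh field found in the text."""
    return {name: int(value) for name, value in KFBH_LINE.findall(text)}


def diff_headers(trace_text, disk_text):
    """Return {field: (trace_value, disk_value)} for fields that differ."""
    trace, disk = parse_dump(trace_text), parse_dump(disk_text)
    return {
        name: (trace.get(name), disk.get(name))
        for name in sorted(set(trace) | set(disk))
        if trace.get(name) != disk.get(name)
    }


trace_dump = """kfbh.endian: 0 ; 0x000: 0x00
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
"""
disk_dump = """kfbh.endian: 212 ; 0x000: 0xd4
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
"""
print(diff_headers(trace_dump, disk_dump))
```

An empty result means the block in memory and the block on disk carry the same header, which points at bad data on disk rather than a problem reading it.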
Example 1:
This is an example of a block with invalid data. The type of the block is KFBTYP_INVALID, which is reported when an incorrect type is stored.
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 34 ; 0x001: 0x22
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 4290772992 ; 0x004: T=1 NUMB=0x7fc00000
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 13879 ; 0x010: 0x00003637
kfbh.fcn.wrap: 512 ; 0x014: 0x00000200
kfbh.spare1: 978943 ; 0x018: 0x000eefff
kfbh.spare2: 2054913149 ; 0x01c: 0x7a7b7c7d
Example 2:
The entire block is filled with 0xd4.
disk:0 au:2 block:253 file:1 physical extent:0 block:253
kfed read ausz=1048576 blksz=4096 aunum=2 blknum=253 dev=/dev/rdsk/c2t50060E8000C41384d2s6
kfbh.endian: 212 ; 0x000: 0xd4
kfbh.hard: 212 ; 0x001: 0xd4
kfbh.type: 212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 212 ; 0x003: 0xd4
kfbh.block.blk: 3570717908 ; 0x004: T=1 NUMB=0x54d4d4d4
kfbh.block.obj: 3570717908 ; 0x008: TYPE=0xd NUMB=0x4d4d4
kfbh.check: 3570717908 ; 0x00c: 0xd4d4d4d4
kfbh.fcn.base: 3570717908 ; 0x010: 0xd4d4d4d4
kfbh.fcn.wrap: 3570717908 ; 0x014: 0xd4d4d4d4
kfbh.spare1: 3570717908 ; 0x018: 0xd4d4d4d4
kfbh.spare2: 3570717908 ; 0x01c: 0xd4d4d4d4
kfbtTraverseBlock: Invalid OSM block type 212
0000: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
0020: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
0040: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
0060: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
0080: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
00a0: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
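Blocks like the one in Example 2, or blocks containing only zeros, can be located in a disk image without inspecting each one by hand. This is a minimal sketch that works on bytes already read into memory (for instance from the file produced by the dd command earlier); the function name is illustrative, and for a 300MB image the whole file is held in memory.

```python
def constant_fill_blocks(image, block_size=4096):
    """Return (block_index, fill_byte) for blocks made of one repeated byte."""
    hits = []
    # Only complete blocks are examined; a trailing partial block is ignored.
    for start in range(0, len(image) - block_size + 1, block_size):
        block = image[start:start + block_size]
        if block.count(block[0]) == block_size:
            hits.append((start // block_size, block[0]))
    return hits


# Synthetic image: one zeroed block, one 0xd4 block, one block with mixed data.
image = bytes(4096) + b"\xd4" * 4096 + bytes(range(256)) * 16
for index, fill in constant_fill_blocks(image):
    print("block %d is filled with 0x%02x" % (index, fill))
```

The block index is relative to the start of the image, so with the default sizes block index n corresponds to Allocation Unit n // 256 and block n % 256 within it. Unused Allocation Units may legitimately hold such data, so hits are only meaningful for AUs that the metadata says are allocated.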
CASE STUDY
The diskgroup, which held a copy of a database, had not been used for some months. For business reasons, that database needed to be used again. Mounting the diskgroup was possible, but when the database was mounted and the ASM metadata had to be read, error ORA-15196 was signaled and the diskgroup was dismounted.
The diskgroup was configured using external redundancy with a single disk and using the default Allocation Unit size of 1MB.
Data Collected
- The messages in the alert.log:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
- The ASM block dumped in the trace file.
*** SESSION ID:(108.5) 2008-02-06 10:05:31.054
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 7 ; 0x002: KFBTYP_ACDC
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 10752 ; 0x004: T=0 NUMB=0x2a00
kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3
kfbh.check: 1103194877 ; 0x00c: 0x41c16afd
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
- AMDU output together with the first 300MB of the disk were collected.
Data Review
- The error:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
The error provides the following information:
o The field failing the validation is obj_kfbl
o The block belongs to file 1 (fn=1). File 1 is the File Directory.
o The block is block 256 (blk=256)
o The value for obj_kfbl found was 3 but the expected value should be 1.
File extents, allocation units and blocks in ASM are numbered starting at 0, and the block size is 4096 bytes. With the default AU size (1MB), there are 256 blocks per AU, numbered 0 to 255, so block 256 is the first block of the second extent (extent 1).
Although the diskgroup was mounted, any query referencing x$kffxp trying to get the extent mapping for file 1 failed. As a result, it was not possible to identify the AU used by block 256 from file 1 (the affected block).
- Using AMDU
One of the files generated by AMDU is the mapping file (*.map). That file contains the location on disk for every extent of the files stored in the diskgroup. The only record for file 1 was:
N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152
This line indicates that, for File 1 (F00000001), the first extent is stored in Allocation Unit 2 (A00000002) of disk 0 (D0000).
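A record of the mapping file can be split on its one-letter prefixes. The sketch below is based only on the single record shown above; it interprets D, A and F as described in the text and leaves the remaining columns uninterpreted, since their meaning is not given here.

```python
import re


def parse_map_line(line):
    """Split a map record into {prefix_letter: integer_value}."""
    return {
        token[0]: int(token[1:])
        for token in line.split()
        if re.fullmatch(r"[A-Z]\d+", token)
    }


record = parse_map_line(
    "N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152"
)
# Only the three columns explained in the text are given a meaning here.
print("disk", record["D"], "AU", record["A"], "file", record["F"])
```

Filtering the records of a whole map file on record["F"] == 1 would list every extent AMDU managed to map for the file directory.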
There was no other entry for file 1 in the mapping file, but AMDU was generating a core dump. It was discovered that AMDU was trying to read Allocation Unit 50.
One of the cool things about AMDU is the possibility of dumping the content of a complete extent of a particular file, redirecting the output into a text file.
$ amdu -diskstring '<path of device>' -dump '<diskgroup name>' -print 'DG.F1.X1.B0.C256'
The previous command will dump 256 blocks of File 1 Extent 1 starting at block 0.
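The argument given to -print packs several values into one string. A small sketch of how it decomposes, assuming the layout used in the command above (diskgroup, then F for file, X for extent, B for starting block and C for block count); the function name is illustrative.

```python
import re

# Layout inferred from the example 'DG.F1.X1.B0.C256' and its description.
PRINT_SPEC = re.compile(
    r"(?P<diskgroup>[^.]+)\.F(?P<file>\d+)\.X(?P<extent>\d+)"
    r"\.B(?P<block>\d+)\.C(?P<count>\d+)"
)


def parse_print_spec(spec):
    """Return the parts of a print specification as a dict."""
    match = PRINT_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError("unrecognized print specification: %r" % spec)
    parts = match.groupdict()
    for key in ("file", "extent", "block", "count"):
        parts[key] = int(parts[key])
    return parts


print(parse_print_spec("DG.F1.X1.B0.C256"))
```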
The results of the last command were:
************************** PRINTING XYZ.F1.X1.B0.C2 **************************
-------------------------------- BLOCK 1 OF 2 --------------------------------
…………………………………………………………………
disk:0 au:50 block:0 file:1 physical extent:1 block:0
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd
At this point the conclusions were:
- The ASM metadata shows that Allocation Unit 50 from disk 0 belongs to File 1.
-------------------------------- BLOCK 1 OF 2 --------------------------------
…………………………………………………………………
disk:0 au:50 block:0 file:1 physical extent:1 block:0
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd
- If the block belongs to file 1, the value of the kfbh.block.obj field should have been 1 and the value of kfbh.type should have been KFBTYP_FILEDIR. But that was not the case:
The error ORA-15196:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
- The content dumped into the trace file was the same found on disk. The check validation failed because the data stored in the block was not part of the correct ASM metadata, in this case file directory.
The next step was to validate all the blocks in the same Allocation Unit. Those blocks belong to the same ASM metadata (KFBTYP_FILEDIR). One Allocation Unit is used exclusively by one unique file.
Example for block 1 from AU 50:
disk:0 au:50 block:1 file:1 physical extent:1 block:1
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=1 dev=/emea/bde/home/users/jfiguer2/disk.dd
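Checking every block of an Allocation Unit can be scripted against the dd image. The sketch below reads only two header fields per block, using the offsets from the kfed dumps in this document (type at 0x002, object number at 0x008) and the type values seen there (4 for KFBTYP_FILEDIR, 7 for KFBTYP_ACDC). The byte-order rule is an assumption, and the function name is illustrative.

```python
import struct


def check_au(image, au_number, expected_obj, expected_type,
             au_size=1048576, block_size=4096):
    """List (block, type, obj) for blocks in the AU whose header does not match."""
    mismatches = []
    base = au_number * au_size
    for block in range(au_size // block_size):
        offset = base + block * block_size
        header = image[offset:offset + 32]
        if len(header) < 32:
            break  # the image ends before this block
        # ASSUMPTION: kfbh.endian == 1 means little endian, otherwise big endian.
        order = "<" if header[0] == 1 else ">"
        block_type = header[2]
        obj = struct.unpack_from(order + "I", header, 8)[0]
        if block_type != expected_type or obj != expected_obj:
            mismatches.append((block, block_type, obj))
    return mismatches


def make_block(block_type, obj, block_size=4096):
    """Build a synthetic big-endian block with only the header filled in."""
    header = struct.pack(">4B7I", 0, 0x82, block_type, 1, 0, obj, 0, 0, 0, 0, 0)
    return header + bytes(block_size - len(header))


# Tiny synthetic image with 2 blocks per AU: AU 0 is empty, AU 1 holds an
# ACD block (type 7, obj 3) followed by a file directory block (type 4, obj 1).
image = bytes(8192) + make_block(7, 3) + make_block(4, 1)
print(check_au(image, 1, expected_obj=1, expected_type=4, au_size=8192))
```

For the case described here, the call would be check_au(image, 50, expected_obj=1, expected_type=4) on the content of disk.dd.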
The solution
No backup was available for the database stored on the diskgroup, so it was necessary to keep the diskgroup mounted. This required patching the ASM metadata, replacing the content of the first block of Allocation Unit 50 with valid data.
It was not possible to rebuild the real data for the block 0, so it was replaced with block
- Additional patching was required, in order to adjust other fields in the block. Once the block was successfully patched, the diskgroup was mounted and queries on internal views did not dismount the diskgroup.
Opening the database reported errors when trying to identify one data file. The extent mapping for this file was stored in the patched block. Luckily that file was not relevant for the database. After setting the file offline, the database opened without errors.
Because it was not possible to guarantee the integrity of the diskgroup, it was recommended to take a backup of the database and rebuild the diskgroup.