Email: service@parnassusdata.com 7 x 24 online support!
ORA-600 [3020] "Stuck Recovery"
If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.
Parnassusdata Software Database Recovery Team
Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com
Format: ORA-600 [3020] [a] [b] [c] [d] [e]
VERSIONS:
version 6.0 and above DESCRIPTION:
This is called a 'STUCK RECOVERY'.
There is an inconsistency between the information stored in the redo and the information stored in a database block being recovered.
ARGUMENTS:
For Oracle 9.2 and earlier: Arg [a] Block DBA
Arg [b] Redo Thread Arg [c] Redo RBA Seq
Arg [d] Redo RBA Block No Arg [e] Redo RBA Offset.
For Oracle 10.1
Arg [a] Absolute file number of the datafile. Arg [b] Block number
Arg [c] Block DBA
FUNCTIONALITY:
kernel cache recovery parallel
IMPACT:
INSTANCE FAILURE during recovery.
SUGGESTIONS:
There have been cases of receiving this error when RECOVER has been issued, but either some datafiles were not restored to disk, or the restoration has not finished.
Therefore, ensure that the entire backup has been restored and that the restore has finished PRIOR to issuing a RECOVER database command.
If problems continue, consider restoring from a backup and doing a point-in-time recovery to a time PRIOR to the one implied by
the ORA-600[3020] error.
Example:
SQL> recover database until time 'YYYY-MON-DD:HH:MI:SS'; This error can also be caused by a lost update.
During normal operations, block updates/writes are being performed to a number of files including database datafiles, redo log files, archived redo log files etc.
This error can be reported if any of these updates are lost for some reason.
Therefore, thoroughly check your operating system and disk hardware.
In the case of a lost update, restore an old copy of the datafile and attempt to recover and roll forward again.
If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.
Known Issues:
Related Articles
Note:1265884.1 Resolving ORA-752 or ORA-600 [3020] During Standby Recovery
Known Bugs
You can restrict the list below to issues likely to affect one of the following versions by clicking the relevant button:
|
ORA-1172 OR ORA-600[3020] Quick Support Debugging Guide
Given that this error could be due to a lost update to either the datafile and/or the redo files, one thing to do would be to get dumps of both.
Refer to the following notes for information on how to do this :
Note:1031381.6 How to Dump Redo Log File Information
Note:45852.1 Taking BLOCKDUMPS on Oracle8 - The ALTER SYSTEM DUMP command **INTERNAL ONLY**
It is especially useful to focus on the particular datafile block implied by the ORA-600 [3020]. Dump all redo for that block, starting with the log sequence before the restored datafile,
up to the point of failure.
Blockdumps of the datafile should be taken at various stages of the recovery process - for example right after doing the restore; and then again after each redo log file has been applied; and just before the SCN (or point in time) that the ORA-600 was reported; and just after redo
for the given SCN has been applied; and so on.
The idea being that you may narrow down the point at which something went wrong.
ORA-600 [3020] [a] [b] [c] [d] [e]
Versions: 7.0.X - 8.0.5 Source: knl/kcrp.c
===========================================================================
Meaning:
Recovering database and REDO entry has an INC/SEQUENCE number greater than that on the database block.
In Oracle8 where the block structure is different it still means the same basic thing - the redo record we have has an SCN / SEQ which does not match the database block we are wanting to apply it to.
This is called 'STUCK RECOVERY'.
---------------------------------------------------------------------------
Argument Description:
a. Block DBA
b. Redo Thread
c. Redo RBA Seq
d. Redo RBA Block No
e. Redo RBA Offset.
---------------------------------------------------------------------------
Diagnosis:
There are many possible causes for this most resulting from either invalid sets of commands or media corruption.
- Has customer restored a backup, open the DB, closed the DB and then tried to recover without re-loading the backup ??
** If they say no GET THE ALERT LOG and prove it - it's easy to waste a lot of time when this was the real cause.
- If the problem was a lost update, restore of an OLDER copy of the datafile and a recovery may work.
- The quick option here is to restore and recover UP TO an SCN
just before the problem. Customer will lose some data as this is an incomplete recovery so you need to know the priority:
a) TIME or b) Minimal Data Loss.
- Check the tracefile for the 3020 report. It is possible to signal OERI(3020) if the datafile block is corrupt.
Eg: OERI(3020) with Inc=0 Seq=1 reported for the disk block is possibly a zeroed out data-block on the datafile and NOT a redo issue.
- Is parallel server being used ?
If so another thread may have the required changes and they haven't been read for some reason. Check for OS and DLM errors. Try to make sure only ONE instance attempts any recovery by shutting down other instances.
- Are hot backups being used ??
Check that the backups are occuring correctly between BEGIN and END backup commands.
- Up to Oracle 8i you can try to skip the error using the hidden
parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY
Be aware that blocks will be marked corrupt if this is used so make sure the error is not on a dictionary object !!
- From 9i you can try to skip the error using the 'ALLOW .. CORRUPTION' clause of the RECOVER DATABASE command.
(Note that in 11g onwards you may need to set DB_LOST_WRITE_PROTECT=NONE for the "ALLOW 1 CORRUPTION" clause to work)
- For logging a bug you need:
(a) Where an error is reported, get any trace files produced and relevant redo log dumps if necessary.
Document completely the circumstances leading
up to the error including configuration; type of backup (manual, RMAN, incremental, etc.);
the exact commands used to create the backups and
the exact commands used to do the restore and recovery.
(b) Provide a reproducible test case or dial-in information to development.
(c) Where relevant, determine if generic or port-specific issue.
Articles:
Parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY
---------------------------------------------------------------------------
Example OERI:3020 dump in Oracle8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*** 1999.07.02.01.02.58.000
RECOVERY OF THREAD 1 STUCK AT BLOCK 14099 OF FILE 6
REDO RECORD - Thread:1 RBA: 0x0045ee.00009c8b.0010 LEN: 0x00e8 VLD: 0x01 SCN scn: 0x0000.0951d868 07/02/99 00:57:19
CHANGE #2 TYP:2 CLS: 1 AFN:6 DBA:0x01803713 SCN:0x0000.09519e84 SEQ: 1 OP:10.4
buffer tsn: 5 rdba: 0x01803713 (6/14099) scn:0x0000.0951b4d4 seq:0x01 flg:0x00 tail:0xb4d40601
frmt:0x02 chkval:0x0000 type:0x06=trans data
*** 1999.07.02.01.02.58.000
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments:
[3020], [25179923], [1], [17902], [40075], [16], [], []
Breaking this up shows the following SCN information: Redo SCN: 0x0000.0951d868
SCN expected on block: 0x0000.09519e84 SCN on Buffer: 0x0000.0951b4d4
In this case ithe actual SCN marked in the block
in the buffer cache is _later_ than the expected SCN, but _before_
the SCN level for the redo change vector. Normally, the SCN in the CHANGE line must match exactly the one on the block (in the buffer cache);
and redo application brings that block to the (later) SCN/SEQ
on the redo record. One possible explanation is that the system
saw a stale copy of the datafile block when the redo was generated, so that the SCN in the CHANGE line is the wrong one. That would indicate a possible lost update to the datafile.
More commonly, the ORA-600 [3020] error indicates that the SCN on the block is BEHIND the SCN on the redo we want to apply,
so there is a GAP. I.e., the REDO is ahead of the block.
However, in this example there is still a problem even though the block initially appears to be AHEAD of the REDO (normally OK).
Why? The SCN on the block is BELOW the most recent commit SCN.
If we applied the current redo record then the SCN on the block would advance to the more recent commit SCN so if this block is truely
ahead of this redo record it must have an SCN >= the most recent commit SCN. It hasn't, so something is wrong - most likely a lost datafile write which occurred between two items of redo causing
two redo records using the same block SCN to base their change on.
Known issues caused by 3rd party provider
1. Lost IO / Corruption caused by EMC. From JET SR: 3-1260172021. EMC bug ID: emc230687
ID: emc230687
Domain: EMC1 HP-UX 11v1
Solution Class: 3.X Compatibility
ORA-600 [3020] during recovery caused bu LOST IO due to EMC bug ID: emc230687.
No errors raised within the I/O stack at the host level nor from a Timefinder perspective, API ECA debug data void of any anomalies Timefinder w/Oracle best practices process is being adhered to ( recoverable business solution process )
This also caused some corruption errors like:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [29], [2121334], [6108] kdbchk: xaction header lock count mismatch
No errors raised with the Symm as well. Corruption issue resolved by applying fix 44177, see the following Primus article for more i ETA emc204393
2. Lost IO by EMC. EMC solution # is emc251398
Fixed by the latest microcode version 5773.163.113 applied on the Symmetrix DMX (no changes on V-MAX cabins). EMC solution # is emc2
==
Ensure that this note comes out on top in Metalink when searched ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 3020 3020 3020 3020 3020 3020 3020 3020 3020 3020
3020 3020 3020 3020 3020 3020 3020 3020 3020 3020