Thursday, February 26, 2009

Recovery Scenarios using RMAN

Point in Time Recovery
run {
allocate channel d1 type disk;
set until time
"to_date('2007-03-31:20:00:00','yyyy-mm-dd:hh24:mi:ss')";
restore database;
recover database;
alter database open resetlogs;
}

Recovery from missing or corrupted datafile(s):
This scenario deals with a situation where a datafile has gone missing, or is corrupted beyond repair. For concreteness, we look at a case where a datafile is missing. Below is a transcript of an SQL Plus session that attempts to open a database with a missing
datafile (typed commands in bold, lines in italics are my comments, all other lines are feedback from SQL Plus):

--open SQL Plus from the command line without logging on to database
C:\>sqlplus /nolog
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Jan 25 14:52:41 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
--Connect to the idle Oracle process as a privileged user and start up instance
SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 131555128 bytes
Fixed Size 454456 bytes
Variable Size 88080384 bytes
Database Buffers 41943040 bytes
Redo Buffers 1077248 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF'
SQL>

The error message tells us that file# 4 is missing. Note that although the startup command has failed, the database is in the mount state. Thus, the database control file, which is also the RMAN repository can be accessed by the instance and by RMAN. We now recover the missing file using RMAN. The transcript of the recovery session is reproduced below (bold lines are typed commands, comments in italics, the rest is feedback from RMAN):

--logon to RMAN
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore missing datafile
RMAN> restore datafile 4;
Starting restore at 26/JAN/05
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=14 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=15 devtype=DISK
channel ORA_DISK_1: starting datafile backupset
restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
channel ORA_DISK_1: restored backup piece 1 piece handle=D:\BACKUPQGB0UEC_1_1.BAK tag=TAG20050124T152708 params=NULL
channel ORA_DISK_1: restore complete
Finished restore at 26/JAN/05
--recover restored datafile - RMAN applies all logs automatically
RMAN> recover datafile 4;
Starting recover at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 4 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_4.ARC
archive log thread 1 sequence 5 is already on disk as file C:\ORACLE_ARCHIVE\ORCL_5.ARC
archive log thread 1 sequence 6 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_6.ARC
archive log thread 1 sequence 7 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_7.ARC
archive log filename=E:\ORACLE_ARCHIVE\ORCL_4.ARC thread=1 sequence=4
archive log filename=C:\ORACLE_ARCHIVE\ORCL_5.ARC thread=1 sequence=5
media recovery complete
Finished recover at 26/JAN/05
--open database for general use
RMAN> alter database open;
database opened
RMAN>

In the above scenario, the database is already in the mount state before the RMAN session is initiated. If the database is not mounted, you should issue a "startup mount" command before attempting to restore the missing datafile. The database must be mounted before any datafile recovery can be done.
If the database is already open when datafile corruption is detected, you can recover the datafile without shutting down the database. The only additional step is to take the relevant tablespace offline before starting recovery. In this case you
would perform recovery at the tablespace level. The commands are:

C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--offline affected tablespace
RMAN> sql 'alter tablespace USERS offline immediate';
using target database controlfile instead of recovery catalog
sql statement: alter tablespace USERS offline immediate
--recover offlined tablespace
RMAN> recover tablespace USERS;
Starting recover at 26/JAN/05
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=14 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=12 devtype=DISK
starting media recovery
media recovery complete
Finished recover at 26/JAN/05
--online recovered tablespace
RMAN> sql 'alter tablespace USERS online';
sql statement: alter tablespace USERS online
RMAN>

Recovery from block corruption:
It is possible to recover corrupted blocks using RMAN backups. This is a somewhat exotic scenario, but it can be useful in certain circumstances, as illustrated by the following example. Here's the situation: a user connected to SQLPlus gets a data block corruption error when she queries a table. Here's a part of the session transcript:

SQL> connect testuser/testpassword
Connected.
SQL> select count(*) from test_table;
select count(*) from test_table
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4,
block # 2015)
ORA-01110: data file 4:
'D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF'

Since we know the file and block number, we can perform block level recovery using RMAN. This is best illustrated by example:
C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore AND recover specific block
RMAN> blockrecover datafile 4 block 2015;
Starting blockrecover at 26/JAN/05
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=19 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=20 devtype=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00004
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=E:\BACKUPQGB0UEC_1_1.BAK
tag=TAG20050124T152708 params=NULL
channel ORA_DISK_1: block restore complete
starting media recovery
media recovery complete
Finished blockrecover at 26/JAN/05
RMAN>

Now our user should be able to query the table from her SQLPlus session. Here's her session transcript after block recovery.
SQL> select count(*) from test_table;
COUNT(*)
----------
217001
SQL>

A couple of important points regarding block recovery:
1. Block recovery can only be done using RMAN.
2. The entire database can be open while performing block recovery.
3. Check all database files for corruption. This is important - there could be other corrupted blocks.
Verification of database files can be done using RMAN or the dbverify utility. To verify using RMAN simply do a complete database backup with default settings.
If RMAN detects block corruption, it will exit with an error message pointing out the guilty file/block Recovery from missing or corrupted redo log group
Case 1: A multiplexed copy of the missing log is available.
If a redo log is missing, it should be restored from a multiplexed copy, if possible. This is the only way to recover without any losses. Here's an example, where I attempt to startup from SQLPlus when a redo log is missing:

SQL> startup
ORACLE instance started.
Total System Global Area 131555128 bytes
Fixed Size 454456 bytes
Variable Size 88080384 bytes
Database Buffers 41943040 bytes
Redo Buffers 1077248 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1:
'D:\ORACLE_DATA\LOGS\ORCL\REDO03A.LOG'
SQL>

To fix this we simply copy REDO03A.LOG from its multiplexed location on E: to the above location on E:. After copying the file, we issue an "alter database open" from the above SQLPlus session:

SQL> alter database open;
Database altered.
SQL>
That's it - the database is open for use.

Case 2: All members of a log group lost.
In this case an incomplete recovery is the best we can do. We will lose all transactions from the missing log and all subsequent logs. We illustrate using the same example as above. The error message indicates that members of log group 3 are missing. We don't have a copy of this file, so we know that an incomplete recovery is required. The first step is to determine how much can be recovered. In order to do this, we query the V$LOG view (when in the mount state) to find the system change number (SCN) that we can recover to (Reminder: the SCN is a monotonically increasing number that is incremented whenever a commit is
issued):

--The database should be in the mount state for v$log access
SQL> select first_change# from v$log where group#=3 ;
FIRST_CHANGE#
-------------
370255
SQL>
The FIRST_CHANGE# is the first SCN stamped in the missing log. This implies that the last SCN stamped in the previous log is 370254 (FIRST_CHANGE#-1). This is the highest SCN that we can recover to. In order to do the recovery we must first restore ALL datafiles to this SCN, followed by recovery (also up to this SCN). This is an incomplete recovery, so we must open the database resetlogs after we're done. Here's a transcript of the recovery session (typed commands in bold, comments in italics, all other lines are RMAN feedback):

C:\>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--Restore ENTIRE database to determined SCN
RMAN> restore database until scn 370254;
Starting restore at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to D:\ORACLE_DATA\DATAFILES\ORCL\SYSTEM01.DBF
restoring datafile 00004 to D:\ORACLE_DATA\DATAFILES\ORCL\USERS01.DBF
channel ORA_DISK_2: starting datafile backupset
restore
channel ORA_DISK_2: specifying datafile(s) to restore from backup set
restoring datafile 00002 to D:\ORACLE_DATA\DATAFILES\ORCL\UNDOTBS01.DBF
restoring datafile 00003 to D:\ORACLE_DATA\DATAFILES\ORCL\TOOLS01.DBF
channel ORA_DISK_2: restored backup piece 1
piece handle=E:\BACKUPGB14IB_1_1.BAK
tag=TAG20050124T171139 params=NUL
channel ORA_DISK_2: restore complete
channel ORA_DISK_1: restored backup piece 1
piece handle=E:\BACKUPGB14IB_1_1.BAK
tag=TAG20050124T171139 params=NUL
channel ORA_DISK_1: restore complete
Finished restore at 26/JAN/05
--Recover database
RMAN> recover database until scn 370254;
Starting recover at 26/JAN/05
using channel ORA_DISK_1
using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 9 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_9.ARC
archive log thread 1 sequence 10 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_10.ARC
archive log thread 1 sequence 11 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_11.ARC
archive log thread 1 sequence 12 is already on disk as file E:\ORACLE_ARCHIVE\ORCL_12.ARC
archive log filename=E:\ORACLE_ARCHIVE\ORCL_9.ARC
thread=1 sequence=9
archive log filename=E:\ORACLE_ARCHIVE\ORCL_10.ARC
thread=1 sequence=10
media recovery complete
Finished recover at 26/JAN/05
--open database with RESETLOGS (see comments below)
RMAN> alter database open resetlogs;
database opened
RMAN>

The following points should be noted:
1. The entire database must be restored to the SCN that has been determined by querying v$log.
2. All changes beyond that SCN are lost. This method of recovery should be used only if you are sure that you cannot do better. Be sure to multiplex your redo logs, and (space permitting) your archived logs!
3. The database must be opened with RESETLOGS, as a required log has not been applied. This resets the log sequence to zero, thereby rendering all prior backups worthless. Therefore, the first step after opening a database RESETLOGS is to take a fresh backup. Note that the RESETLOGS option must be used for any incomplete recovery.

Recovery from missing or corrupted control file:
Case 1: A multiplexed copy of the control file is available.
On startup Oracle must read the control file in order to find out where the datafiles and online logs are located. Oracle expects to find control files at locations specified in the CONTROL_FILE initialization parameter. The instance will fail to mount the
database if any one of the control files are missing or corrupt. A brief error message will be displayed, with further details recorded in the alert log. The exact error message will vary depending on what has gone wrong. Here's an example:

SQL> startup
ORACLE instance started.
Total System Global Area 135338868 bytes
Fixed Size 453492 bytes
Variable Size 109051904 bytes
Database Buffers 25165824 bytes
Redo Buffers 667648 bytes
ORA-00205: error in identifying controlfile, check alert log for more info
SQL>
On checking the alert log, as suggested, we find the following:
ORA-00202: controlfile:
'e:\oracle_dup_dest\controlfile\ORCL\control02.ctl'
ORA-27046: file size is not a multiple of logical block size
OSD-04012: file size mismatch (OS 5447783)

The above corruption was introduced by manually editing the control file when the database was closed. The solution is simple, provided you have at least one uncorrupted control file - replace the corrupted control file with a copy using operating system
commands. Remember to rename the copied file. The database should now start up without any problems Recover database by means of a backup from the old incarnation

1. Obtain the primary key of the prior incarnation with a LIST command:
# obtain primary key of old incarnation
LIST INCARNATION OF DATABASE trgt;
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset
SCN Reset Time
------- ------- ------- ------ -------
---------- ----
1 2 TRGT 1224038686 PARENT 1 02-OCT-02
1 582 TRGT 1224038686 CURRENT 59727 10-OCT-02

2. Make sure the database is started but not mounted:
SHUTDOWN FORCE NOMOUNT
3. Reset the incarnation to the primary key that you just obtained:
# reset database to old incarnation
RESET DATABASE TO INCARNATION 2;
4. Recover the database, performing the following actions in the RUN command:
• Set the end time for recovery to the time just before the loss of the data.
• If automatic channels are not configured, then manually allocate one or more channels.
• Restore the control file and mount it.
• Restore and recover the database.

For example, run the following commands:
RUN
{
# set time to just before data was lost.
SET UNTIL TIME 'Oct 8 2002 07:55:00';
RESTORE CONTROLFILE; # FROM AUTOBACKUP not needed in catalog mode
ALTER DATABASE MOUNT; # mount database after restoring control file
RESTORE DATABASE;
RECOVER DATABASE;
}
5. If recovery is successful, then reset the online redo logs:
# this command automatically resets the database so that this incarnation is the new incarnation
ALTER DATABASE OPEN RESETLOGS;

Recovery
Restore from Cold Backup
A cold backup is a backup performed while the database is off-line and unavailable to its users. Procedure for restore from cold backup
1. Make sure source and destination databases are on the save version and patch (Very Important)
2. create control file script from trace file
SQLPLUS> ALTER DATABASE backup controlfile to trace;
SQLPLUS>SHOW PARAMETER UDUMP
3. Shutdown the Source Database
4. Copy all datafiles and trace file to the destination database (do not copy redo log files and control_files)
5. startup nomount pfile=init.ora
6. Create the script from the trace file. Here is the sample
CREATE CONTROLFILE SET DATABASE "DEV" RESETLOGS
NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 2726
LOGFILE
GROUP 1 (
'/u02/oracle/oradata/DEV/redot1g1m2.log',
'/u01/oracle/oradata/DEV/redot1g1m1.log'
) SIZE 10M,
GROUP 2 (
'/u02/oracle/oradata/DEV/redot1g2m2.log',
'/u01/oracle/oradata/DEV/redot1g2m1.log'
) SIZE 10M,
GROUP 3 (
'/u01/oracle/oradata/DEV/redot1g3m1.log',
'/u02/oracle/oradata/DEV/redot1g3m2.log'
) SIZE 10M,
GROUP 4 (
'/u01/oracle/oradata/DEV/redot1g4m1.log',
'/u02/oracle/oradata/DEV/redot1g4m2.log'
) SIZE 10M
DATAFILE
'/u02/oracle/oradata/DEV/system01.dbf',
'/u02/oracle/oradata/DEV/undotbs01.dbf',
'/u02/oracle/oradata/DEV/sysaux01.dbf',
'/u02/oracle/oradata/DEV/users01.dbf',
'/u02/oracle/oradata/DEV/example01.dbf',
'/u02/oracle/oradata/DEV/data01.dbf',
'/u02/oracle/oradata/DEV/data02.dbf',
'/u02/oracle/oradata/DEV/data03.dbf',
'/u02/oracle/oradata/DEV/data04.dbf',
'/u02/oracle/oradata/DEV/index01.dbf',
'/u02/oracle/oradata/DEV/index02.dbf',
'/u02/oracle/oradata/DEV/index03.dbf',
'/u02/oracle/oradata/DEV/index04.dbf',
'/u02/oracle/oradata/DEV/lob01.dbf',
'/u02/oracle/oradata/DEV/bea01.dbf',
'/u02/oracle/oradata/DEV/beaidx01.dbf',
'/u02/oracle/oradata/DEV/bealob01.dbf'
CHARACTER SET WE8ISO8859P1 ;
7. Open the database with resetlogs
SQL> @control_file.sql;
Control file created.
8. Open the database with resetlogs
ALTER DATASASE OPEN RESETLOGS;
9. Create Temporary tablespace
ALTER TABLESPACE TEMP ADD TEMPFILE
‘/opt/oracle/schedwin/temp01.dbf’ size 150M;


Restore from Hot Backup
A hot backup is a backup performed while the database is on-line and available for read/write. Except for oracle exports, one can only do on-line backups when running in ARCHIVELOG mode.
CASE 1: Restore database using RMAN Full backup.
#1 Copy the following files to the target server
-The database backup piece
Controlfile backup piece
Parameter file. i.e pfile.
#2 Edit the parameter with new environment specific values
#3 Start the database in nomount mode, using RMAN.
$ rman target /
Recovery Manager: Release 9.2.0.6.0 - 64bit Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database (not started)
RMAN> startup nomount
Oracle instance started
Total System Global Area 773295264 bytes
Fixed Size 732320 bytes
Variable Size 134217728 bytes
Database Buffers 637534208 bytes
Redo Buffers 811008 bytes
#4 Restore control file from backup piece.
RMAN> restore controlfile from
'schedcd_633997804_109_1';
#5 Mount the database
RMAN>alter database mount;
#6 Catalog the backup piece
RMAN>catalog backuppiece ‘schedcd_633997804_109_1';
#7 Get to know the last sequence available in the archivelog backup using the following command.
RMAN> list backup of archivelog all;
#8 Rename the redologsfiles, so that they can be created in a new location when opened in resetlogs.
SQL>alter database rename file
‘/opt/oracle/oradata/schedcd/redo/redoa1.log’ to
‘/opt/oracle/oradata/schedwin/redo/redoa1.log’;

#9 Restore the datafiles to new locations and recover
RMAN> run
{ set until sequence
set newname for datafile 1 to
/opt/u01/oradata/schedwin/dbf/system01.dbf;
set newname for datafile 2 to
/opt/u01/oradata/schedwin/dbf/perfstat01.dbf;
set newname for datafile 3 to
/opt/u01/oradata/schedwin/dbf/cwmlite01.dbf;
set newname for datafile 4 to
/opt/u01/oradata/schedwin/dbf/drsys01.dbf;
set newname for datafile 5 to
/opt/u01/oradata/schedwin/dbf/indx01.dbf;
set newname for datafile 6 to
/opt/u01/oradata/schedwin/dbf/odm01.dbf;
set newname for datafile 7 to
/opt/u01/oradata/schedwin/dbf/tools01.dbf;
set newname for datafile 8 to
/opt/u01/oradata/schedwin/dbf/users01.dbf;
set newname for datafile 9 to
/opt/u01/oradata/schedwin/dbf/xdb01.dbf;
set newname for datafile 10 to
/opt/u02/oradata/schedwin/dbf/users02.dbf;
set newname for datafile 11 to
/opt/u03/oradata/schedwin/dbf/users03.dbf;
set newname for datafile 12 to
/opt/u01/oradata/schedwin/dbf/schedwin_indx01.dbf;
set newname for datafile 14 to
/opt/u02/oradata/schedwin/dbf/schedwin_indx02.dbf;
set newname for datafile 16 to
/opt/u01/oradata/schedwin/dbf/undotblspce01.dbf;
restore database validate; # This is optional
restore database;
switch datafile all;
recover database;
alter database open resetlogs;
}

No comments: