9i RAC-Backup And Recovery Quest

Transcription

9i RAC: Manual Backup and RecoveryShankar GovindanIntroductionWhen we plan to move a large database or a heavily used OLTP database to a cluster setup, toget enough mileage from the horizontal scaling of 9i RAC, we normally have lots of questions onhow we are going to handle various database maintenance that are traditionally done, how weare going to setup the same and what changes or tools need to be in place to handle RAC.In this paper we are going to look at one of the important administration jobs, the backing up of a9i RAC. We will also look at how we can recover from a simple data file loss or the other extreme,a Disaster recovery.Backup MethodWe can use RMAN to backup and recover a database, but if the database is huge or the backuptimes are more than 5 hours, then precious time is lost. Sites that have large databases can lookat Shadow copy.We use Hitachi RAID system and use Hitachi Shadow copy. (For more information on the HitachiShadow copy options, you can visit their website). The backup is done in the traditional manner: tablespace is put in the begin backup mode,the database files are copied andthen the tablespace is put back in the end backup mode.The way the shadow backup works, is by having a media server which has a set of disks. Thedisks are attached to the production server and synced (syncing of a terabyte takes 6 hours), thesyncing can happen any time. Once, the media server disks are in sync with the productionserver, we fire a script which will put all the tablespaces in backup mode. The sync mirror disk isthen sliced off the production server. This takes less than 3 minutes. The tablespaces are thenput back in the end backup mode. The backup is now complete and takes only 3 minutes. Themirror disk is then copied over to tape offline using the media server.We can also sync the backup archive log directories and copy them over to tape.Setting up environment for Manual BackupLet’s see how we can do a traditional backup in a 9i RAC environment. As we know that there aremultiple nodes in a RAC setup, all pointing to a single database. But the backup of databaseshould be initiated from a single node, usually the primary node. The primary node is the one yousetup first and do most of the maintenance from that node.The first thing to do is setup your profile in such a way that when you login as the owner of thedatabase, in our case usually ‘ORACLE’, your environment is setup correctly. We have toremember that the database name, let’s say RPTQ in our case is only used for connecting to thedatabase by the clients (this can also be masked/wrapped by using alias in tnsnames). When welogin locally to one of the nodes, we have to set our ORACLE SID to the instance name and notto the database name.

Let’s say we have a two node(LJCQS034 and LJCQS035) instance called rptq1 and rptq2,pointing to a database RPTQ ( remember that the SID’s are case sensitive in Unix and it is betterto set them all up in the lower case to avoid confusion) as shown in the figure below.When we login to server LJCQS034 hosting instance rptq1, then we set our ORACLE SIDpointing to rptq1( presuming our ORACLE HOME is shared or point the ORACLE HOME to theinstance rptq1’s oracle home).We will execute all the maintenance jobs as a DBA locally and setup our environment for a singlenode, the primary node, to execute all our automatic maintenance and monitoring scripts. Yourcron for the database will run in the primary node, although you can copy over the same cron tothe second node for failover and activate it manually.Setting up your .profile for a single node will be as shown --------------------------------------------# If you are oracle and logging in one of the instance, then set that specific -----------------------------------------if [ " /usr/ucb/whoami " "oracle" ] && [ " hostname " "LJCQS034" ]; then. ./rptq1.envelse. ./rptq2.envfi

You can include the same lines in your second node’s .profile. The env script rptq1.env shouldexist and will look like:#!/bin/ksh# -------------------------------- # # filename: /dba/etc/rptq1.env # # Created by : Shankar Govindan # Dated: 23-July-2003 # # # History : # # -------------------------------- # If TWO TASK is set, unset itif [ -n " TWO TASK" ]; thenunset TWO TASKfiexportLD LIBRARY PATH bc/lib:.export NLS DATE FORMAT DD-MON-RRexport NLS DATE LANGUAGE AMERICANexport NLS LANG AMERICAN AMERICA.WE8ISO8859P1export NLS NUMERIC CHARACTERS .,export NLS SORT BINARYexport ORACLE BASE /sv03export ORACLE HOME /sv03/sw/oracle/rptqdb/9.2.0export ORACLE SID rptq1export ORACLE TERM vt100unset ORAENV ASK; export ORAENV ASKexport ORA NLS32 aexport ORA NLS33 aexport ORA NLS aexportPATH dba/sbin:.export TNS ADMIN /sv03/sw/oracle/rptqdb/9.2.0/network/adminexport UDUMP /sv03/oracle/admin/rptq/udumpexport PFILE /sv03/oracle/admin/rptq/pfileexport BDUMP /sv03/oracle/admin/rptq/bdumpIf you need to automate the backup and if the scripts are run as a cron, then you need to sourcethe ORACLE SID from somewhere, either from a file or from the Oratab. The oratab entry for a 9iRAC instance will have the instance name and not the database name. For example we if we areinitiating the backup from the primary node, then the oratab of the primary node will have:rptq1:/sv03/sw/oracle/rptqdb/9.2.0:NThe secondary node or the second node will have rptq2 in the oratab.Manual Backup of data filesThe backup of the database is initiated from the primary node or primary instance. We don’t haveto setup anything or initiate anything from the secondary node or any other nodes that arepointing to the database.Login to the primary instance as a user who has the privilege to backup the datafiles and initiatethe backup commands. Alter tablespace tablespace name begin backup;Copy datafiles to backup directory/server and then end thetablespace backup.Alter tablespace tablespace name end backup;

We know that we need to do this for all the tablespaces. If you are using the shadow copyconcept like we do for large databases, then we need to put all the tablespaces into backup modeat once and then break the mirror disk. We then put all the tablespace to end backup mode. Themirror copy goes to tape backup offline.Manual Backup of archive logsThe archive logs that are generated at the time of backup should get into the backup set for ameaningful recovery. Incase of disaster and we need to recover to the last backup, then the lastfew archive logs are required to bring the database to some consistent state for a cancel basedrecovery.In a RAC environment we setup the archive format to include a %t. The %t will identify whichthread or instance that generated the archivelog.log archive format arch%s %t.arcTraditionally we force a log switch before and after a backup to get the timestamps in on the datafiles and to push the last few logs to be archived, so they get into the tape as part of the backupset. We normally execute,Alter system checkpoint;Alter system switch logfile;In a RAC environment, this command will only switch the logfile for a single instance. We have toremember that there are archive logs generated by the other instances too, and these archivesneed to be part of the backup set for any meaningful recovery.To force a log switch and push the logs to archive for all the instances, we need to execute,Alter system archive log current;Alter system archive log current;Once these archivelogs are pushed and visible, they are then compressed/moved to the backuparchivelog directory and are part of the backup. If you have setup shadow copy, then, the backuparchivelog directory can also be synced and mirrored, it can then be part of the backup set whenthe mirror is broken after the data files backup.Manual backup of server config fileThere is a server config file that stores all the database and instance information when the RACsetup is created. The name of the file is srvm.dbf.There is also a file in the /var/opt/oracle called srvConfig.locoracle ljcqs034: pwd/var/opt/oracleoracle ljcqs034: ls -ltrtotal 8-rw-r--r-1 oracledba-rw-r--r-1 oracledba-rw-rw-r-1 oracleother47 Aug 18 16:47 srvConfig.loc123 Aug 26 13:18 oraInst.loc812 Aug 29 18:35 oratabThe srvConfig.loc file contains the pointer to the location of the srvm.dbf.srvconfig loc /sv00/db00/oradata/rptq/srvm.dbf

We have to make sure that this file is in the oracle dbf file directory location, so that it gets backedup periodically. Incase you are not shadow copying your data files, then you need to back this fileup as part of your backup procedure. (You can always work around the loss of this file, byrecreating the RAC setup once again using the srvctl commands).Recovery from a lost data fileLet’s try to simulate the loss of a data file and try to recover the same in a RAC environment. Create a tablespaceSQL create tablespace testrac datafile'/Ora base/db11/oradata/drac/testrac 01.dbf'extent management local uniform size 4MSEGMENT SPACE MANAGEMENT AUTO;size 100MSQL alter tablespace testrac add datafile'/Ora base/db11/oradata/drac/testrac 02.dbf' size 100M;Tablespace altered.SQL select file name,bytes from dba data files where tablespace namelike 'TESTRAC';FILE ---/Ora base/db11/oradata/drac/testrac 01.dbf/Ora base/db11/oradata/drac/testrac 02.dbfBYTES---------104857600104857600SQL alter user sxgovind default tablespace testrac;User altered.SQL connect sgovind Create some tables and load dataSQL create table ruby as select * from dba objects;Table created.SQL create table hammerhead as select * from dba tables;Table created.SQL select segment name,segment type,tablespace name from dba segmentswhere tablespace name like 'TESTRAC';SEGMENT NAMESEGMENT TYPETABLESPACE HEADTABLETESTRACSQL alter tablespace TESTRAC begin backup;

Backup the tablespaceSQL select d.name,b.status,b.time from v datafile d,v backup bwhere d.file# b.file# and b.status -------/Ora base/db11/oradata/drac/testrac 01.dbf/Ora base/db11/oradata/drac/testrac 02.dbfACTIVEACTIVESTATUSTIME------------------ --------22-MAY-0322-MAY-03oracle ljcqs097: cp testrac 01.dbf HOMESQL alter tablespace testrac end backup;Tablespace altered. Remove data file associated with the tablespaceoracle ljcqs097: rm testrac 01.dbfSQL select file name,bytes from dba data files where tablespace namelike 'TESTRAC';select file name,bytes from dba data files where tablespace name like'TESTRAC'*ERROR at line 1:ORA-01116: error in opening database file 64ORA-01110: data file 64: '/Ora base/db11/oradata/drac/testrac 01.dbf'ORA-27041: unable to open fileSVR4 Error: 2: No such file or directoryAdditional information: 3 Recover data file associated with the tablespaceSQL alter database datafile'/Ora base/db11/oradata/drac/testrac 01.dbf' offline;Database altered.oracle ljcqs098: cp testrac 01.dbf /Ora base/db11/oradata/dracSQL alter database recover datafile'/Ora base/db11/oradata/drac/testrac 01.dbf';alter database recover datafile'/Ora base/db11/oradata/drac/testrac 01.dbf'*ERROR at line 1:ORA-00279: change 1936203896230 generated at 05/22/2003 14:01:09 neededforthread 2ORA-00289: suggestion : /shared/arch/oradata/drac/arch/arch 2 12.arcORA-00280: change 1936203896230 for thread 2 is in sequence #12SQL ALTER DATABASE RECOVER CANCEL;

SQL recover datafile '/Ora base/db11/oradata/drac/testrac 01.dbf';ORA-00279: change 1936203896230 generated at 05/22/2003 14:01:09 neededforthread 2ORA-00289: suggestion : /shared/arch/oradata/drac/arch/arch 2 12.arcORA-00280: change 1936203896230 for thread 2 is in sequence #12Specify log: { RET suggested filename AUTO CANCEL}AUTOORA-00279: change 1936203896361 generated at 05/22/2003 14:06:13 neededforthread 2ORA-00289: suggestion : /shared/arch/oradata/drac/arch/arch 2 13.arcORA-00280: change 1936203896361 for thread 2 is in sequence #13ORA-00278: log file '/shared/arch/oradata/drac/arch/arch 2 12.arc' nolongerneeded for this recoveryLog applied.Media recovery complete.SQL alter database datafile'/Ora base/db11/oradata/drac/testrac 01.dbf' online;Database altered.SQL select file name,bytes from dba data files where tablespace namelike 'TESTRAC';FILE ---/Ora base/db11/oradata/drac/testrac 01.dbf/Ora base/db11/oradata/drac/testrac y from a DisasterWhen we recover a lost data file, we recover them online and both the instances of the RACenvironment are up and running. We only offline the data file and try to recover the data file froma backup, the same way we do with a non-RAC setup.Let’s see what happens at the extreme end, when we have a disaster and we need to bring thelast backup from tape and recover to a point in time or apply the available archive logs and do acancel based recovery.We need to remember that we cannot bring both the instances of the database up. We need tomount the database in a single instance mode and then initiate a recovery. Once the recovery iscomplete we can bring all the other nodes up and running.We don’t have to do any changes to the environment or the parameter files.

Verify the srvm.dbf file exists in the location and is not overwritten by the copy over of datafiles.Check the node visibility and start GSD Daemon to check RAC config is okay.oracle ljcqs034: lsnodesljcqs034ljcqs035oracle ljcqs034: gsdctl statGSD is not running on the local nodeoracle ljcqs035: gsdctl statGSD is not running on the local nodeoracle ljcqs034: gsdctl startFailed to start GSD on local nodeWe have to make sure that we have the server configured correctly and the GSD daemon is upand running. In case the srvm.dbf file is lost and the GSD does not come up, then recreate theRAC configuration as explained below in the Recovery of srvConfig information section of thisnote.Once the gsd daemon is up and running, we startup the database in a single instance mode.Login to the primary node and verify the ORACLE SID,oracle ljcqs034: echo ORACLE SIDrptq1oracle ljcqs032: sqlplus /nologSQL*Plus: Release 9.2.0.3.0 - Production on Wed Oct 8 09:24:28 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.SQL connect / as sysdbaConnected to an idle instance.SQL startup nomount;SQL recover database using backup controlfile until cancel;ORA-00279: change 1937810322614 generated at 08/16/2003 12:29:16 neededforthread 1ORA-00289: suggestion : /sv04/data/arch/rptq/arch703203 1.arcORA-00280: change 1937810322614 for thread 1 is in sequence #703203Specify log: { RET suggested filename AUTO CANCEL}ORA-00279: change 1937810322614 generated at 08/16/2003 12:29:16 neededforthread 2It does not suggest what archivelog file to apply for thread 2. You need to choose the latest onethat match the timestamp of the one applied for thread 1 and start the apply from there.Specify log: { RET suggested filename AUTO CANCEL}

/sv04/data/arch/rptq/arch3636 2.arcORA-00279: change 1937810325033 generated at 08/16/2003 12:35:50 neededforthread 1ORA-00289: suggestion : /sv04/data/arch/rptq/arch703204 1.arcORA-00280: change 1937810325033 for thread 1 is in sequence #703204ORA-00278: log file '/sv04/data/arch/rptq/arch703203 1.arc' no longerneededfor this recoverySpecify log: { RET suggested filename AUTO CANCEL}This time it suggests what file is required for thread 2. The first time seems to be an issue, once itapplies the first right archive log file for thread 2, it then prompts for more.ORA-00279: change 1937810464267 generated at 08/16/2003 13:07:38 neededforthread 2ORA-00289: suggestion : /sv04/data/arch/rptq/arch3637 2.arcORA-00280: change 1937810464267 for thread 2 is in sequence #3637ORA-00278: log file '/sv04/data/arch/rptq/arch3636 2.arc' no longerneeded forthis recoverySpecify log: { RET suggested filename AUTO CANCEL}ORA-00279: change 1937810500912 generated at 08/16/2003 13:12:27 neededforthread 2ORA-00289: suggestion : /sv04/data/arch/rptq/arch3638 2.arcORA-00280: change 1937810500912 for thread 2 is in sequence #3638ORA-00278: log file '/sv04/data/arch/rptq/arch3637 2.arc' no longerneeded forthis recoverySpecify log: { RET suggested filename AUTO CANCEL}ORA-00279: change 1937810531734 generated at 08/16/2003 13:15:44 neededforthread 2ORA-00289: suggestion : /sv04/data/arch/rptq/arch3639 2.arcORA-00280: change 1937810531734 for thread 2 is in sequence #3639ORA-00278: log file '/sv04/data/arch/rptq/arch3638 2.arc' no longerneeded forthis recoverySpecify log: { RET suggested filename AUTO CANCEL}CANCELMedia recovery cancelled.SQL alter database open resetlogs;Database altered.

If your database is setup to use the tempfile temporary tablespace, then you need to recreatethem.ALTER TABLESPACE TEMP ADD TEMPFILE'/sv00/db13/oradata/rptq/temp 01.dbf'SIZE 2044M REUSE AUTOEXTEND OFF;ALTER TABLESPACE TEMP ADD TEMPFILE'/sv00/db13/oradata/rptq/temp 02.dbf'SIZE 2044M REUSE AUTOEXTEND OFF;ALTER TABLESPACE TEMP ADD TEMPFILE'/sv00/db13/oradata/rptq/temp 03.dbf'SIZE 2044M REUSE AUTOEXTEND OFF;ALTER TABLESPACE TEMP ADD TEMPFILE'/sv00/db13/oradata/rptq/temp 04.dbf'SIZE 2044M REUSE AUTOEXTEND OFF;ALTER TABLESPACE TEMP ADD TEMPFILE'/sv00/db13/oradata/rptq/temp 05.dbf'SIZE 2044M REUSE AUTOEXTEND OFF;Recovery from loss of srvConfig informationIf you loose the srvm.dbf file and the GSD daemon does not come up, then its time to recreatethe srvm.dbf, by recreating the server configuration.oracle ljcqs098: which gsd/shared/oracle/product/9.2.0/bin/gsdoracle ljcqs098: cd ORACLE HOMEoracle ljcqs032: gsdctl startFailed to start GSD on local nodeThe following command will wipe out all the previous information that existed in the srvm.dbf file.Once you setup the environment, you should not execute the below command again.oracle ljcqs032: srvconfig -init –foracle.ops.mgmt.rawdevice.RawDeviceException: PRKR-1025 : file/var/opt/oracle/srvConfig.loc does not contain property srvconfig locat java.lang.Throwable. init (Compiled Code)at java.lang.Exception. init (Compiled Code)at oracle.ops.mgmt.rawdevice.RawDeviceException. init eUtil.getDeviceName(Compiled Code)at oracle.ops.mgmt.rawdevice.RawDeviceUtil. init (CompiledCode)at led Code)The file srvm.dbf was not part of the backup and hence was not recovered. The file does not existand the srvconfig command does point out the same.

Work around is to create a new file and update the /var/opt/oracle/srvConfig.loc file of the newlocation of the srvm.dbf file.oracle ljcqs032: touch /sv00/db00/oradata/rpt1/srvm.dbforacle ljcqs032: chmod 755 srvm.dbforacle ljcqs032: cd /var/opt/oracleoracle ljcqs032: vi srvConfig.logand add this line and save.srvconfig loc /sv00/db00/oradata/rpt1/srvm.dbfNow start the GSD daemon and then start adding the database and instance information to thesrvm.dbf file.oracle ljcqs098: gsdctl startSuccessfully started GSD on local nodeoracle ljcqs097: srvctl add database -d rptq -o /shared/oracle/product/9.2.0oracle ljcqs098: srvctl add instance -d rptq -i rptq1 -n ljcqs097oracle ljcqs098: srvctl add instance -d rptq -i rptq2 -n ljcqs098Check if the configuration has been setup correctly.oracle ljcqs097: srvctl configrptqoracle ljcqs097: srvctl config database -d rptqljcqs097 rptq1 /shared/oracle/product/9.2.0ljcqs098 rptq2 /shared/oracle/product/9.2.0Shankar Govindan works as a Sr. Oracle DBA at CNF Inc, Portland, Oregon. ShankarGovindan is Oracle Certified 7, 8 and 8I; you can contact him atshankargovindan@yahoo.com. Note: The above info as usual is of my individual tests andopinions and has nothing to do with the company I work for or represent.

put back in the end backup mode. The backup is now complete and takes only 3 minutes. The mirror disk is then copied over to tape offline using the media server. We can also sync the backup archive log directories and copy them over to tape. Setting up environment for Manual Backup Let's see how we can do a traditional backup in a 9i RAC .