Principles Of Transaction-Oriented Database Recovery

Transcription

Principles of Transaction-Oriented Database RecoveryTHEO HAERDERFachbereich Informatik, University of Kaiserslautern, West GermanyANDREAS REUTER 1IBM Research Laboratory, San Jose, California 95193In this paper, a terminological framework is provided for describing different transactionoriented recovery schemes for database systems in a conceptual rather than animplementation-dependentway. By introducing the terms materialized database,propagation strategy, and checkpoint, we obtain a means for classifying arbitraryimplementations from a unified viewpoint. This is complemented by a classificationscheme for logging techniques, which are precisely defined by using the other terms. It isshown that these criteria are related to all relevant questions such as speed and scope ofrecovery and amount of redundant information required. The primary purpose of thispaper, however, is to establish an adequate and precise terminology for a topic in whichthe confusion of concepts and implementational aspects still imposes a lot of problems.Categories and Subject Descriptors: D.4.5 [Operating Systems]: Reliability--fau/ttolerance; H.1.0 [Models and Principles]: General; H.2.2 [Database Management]:Physical Design--recovery and restart; H.2.4 [Database Management]: Systems-transactmn processing; H.2.7 [Database Management]: Database Administration--logging and recoveryGeneral Terms: Databases, Fault Tolerance, TransactionsINTRODUCTIONtechnology. The methods and technologyDatabase technology has seen tremendous of such a discipline should be well represented in the literature by systematic surprogress during the past ten years. Conveysof the field. There are, in fact, a numcepts and facilities that evolved in the sinber of recent publications that attempt togle-user batch environments of the earlydays have given rise to efficient multiuser summarize what is known about differentdatabase systems with user-friendly inter- aspects of database management [e.g., Astrahan et al. 1981; Stonebraker 1980; Grayfaces, distributed data management, etc.From a scientific viewpoint, database sys- et al. 1981; Kohler 1981; Bernstein andtems today are established as a mature Goodman 1981; Codd 1982]. These papersdiscipline with well-approved methods and fall into two categories: (1) descriptions ofinnovative prototype systems and (2) thorough analyses of special problems and their1 Permanent address: Fachbereich Informatik, Uni- solutions, based on a clear methodologicaland terminological framework. We are conversity of Kaiserslautern, West Germany.Permission to copy without fee all or part of this material is granted provided that the copies are not made ordistributed for direct commercial advantage, the ACM copyright notice and the title of the publication and itsdate appear, and notice is given that copying is by permission of the Association for Computing Machinery. Tocopy otherwise, or to republish, requires a fee and/or specific permission. 1983 ACM 0360-0300/83/1200-0287 00.75ComputingSurveys,Vol. 1 , No. 4, December1983

288 T. Haerder and A. ReuterCONTENTSINTRODUCTION1. DATABASE RECOVERY: WHAT IT ISEXPECTED TO DO1.1 What Is a Transaction?1.2 Which Failures Have to Be Anticipated1.3 Summary of Recovery Actions2. THE MAPPING HIERARCHY OF A DBMS2.1 The Mapping Process: Objects and Operations2.2 The Storage Hierarchy: ImplementationalEnvironment2.3 Different Views of a Database2.4 Mapping Concepts for Updates3. CRASH RECOVERY3.1 State of the Database after a Crash3.2 Types of Log Information to SupportRecoveryActions3.3 Examples of Recovery Techniques3.4 Examples of Loggingand Recovery Components3 5 Evaluation of Loggingand Recovery Concepts4. ARCHIVE RECOVERY5 CONCLUSIONACKNOWLEDGMENTSREFERENCESvtributing to the second category in the fieldof database recovery. In particular, we areestablishing a systematic framework for establishing and evaluating the basic concepts for fault-tolerant database operation.The paper is organized as follows. Section 1 contains a short description of whatrecovery is expected to accomplish andwhich notion of consistency we assume.This involves introducing the transaction,which has proved to be the major paradigmfor synchronization and recovery in advanced database systems. This is also themost important difference between this paper and Verhofstadt's survey, in whichtechniques for file recovery are describedwithout using a particular notion of consistency [Verhofstadt 1978]. Section 2 provides an implementational model for database systems, that is, a mapping hierarchyof data types. Section 3 introduces the keyconcepts of our framework, describing thedatabase states after a crash, the type oflog information required, and additionalmeasures for facilitating recovery. CrashComputing Surveys, Vol. 15, No. 4, December 1983recovery is demonstrated with three sampleimplementation techniques. Section 4 applies concepts addressed in previous sections on media recovery, and Section 5summarizes the scope of our taxonomy.1. DATABASE RECOVERY: WHAT IT ISEXPECTED TO DOUnderstanding the concepts of database recovery requires a clear comprehension oftwo factors: the type of failure the database has tocope with, and the notion of consistency that is assumedas a criterion for describing the state tobe reestablished.Before beginning a discussion of thesefactors, we would like to point out that thecontents of this section rely on the description of failure types and the concept of atransaction given by Gray et al. [1981].1.1 What Is a Transaction?It was observed quite early that manipulating data in a multiuser environment requires some kind of isolation to preventuncontrolled and undesired interactions. Auser (or process) often does things whenworking with a database that are, up to acertain point in time, of tentative or preliminary value. The user may read some dataand modify others before finding out thatsome of the initial input was wrong, invalidating everything that was done up to thatpoint. Consequently, the user wants to remove what he or she has done from thesystem. If other users (or processes) havealready seen the "dirty data" [Gray et al.1981] and made decisions based upon it,they obviously will encounter difficulties.The following questions must be considered: How do they get the message that someof their input data has disappeared, whenit is possible that they have already finished their job and left the terminal? How do they cope with such a situation?Do they also throw away what they havedone, possibly affecting others in turn?Do they reprocess the affected parts oftheir program?

Principles o/ Transaction-Oriented Database RecoveryFUNDS TRANSFER: PROCEDURE; BEGIN TRANSACTION;ON ERROR DO; RESTORE TRANSACTION;GET INPUT MESSAGE;PUT MESSAGE ('TRANSFER FAILED');GO TO COMMIT;END;GET INPUT MESSAGE;EXTRACT ACCOUNT DEBIT,ACCOUNT CREDIT,AMOUNT FROM MESSAGE; UPDATE ACCOUNTSSET BALANCE-- BALANCE- AMOUNTWHERE ACCOUNTS NUMBER ACCOUNTS DEBIT; UPDATE ACCOUNTSSET BALANCE-- BALANCE AMOUNTWHERE ACCOUNTS NUMBER ACCOUNTS CREDIT; INSERT INTO HISTORY(DATE, MESSAGE);PUT MESSAGE ('TRANSFER DONE');COMMIT: COMMIT TRANSACTION;END; 289/*in case of error*//*undo all work*//*reacquire input*//*report failure*//*get and parse input*//*do debit*//*do credit*//*keep audit trail*//*report success*//*commitupdates*//*end of program*/Figure 1. Example of a transaction program. (From Gray et al. [1981].)These situations and dependencies havebeen investigated thoroughly by Bjork andDavies in their studies of the so-called"spheres of control" [Bjork 1973; Davies1973, 1978]. They indicate that data beingoperated by a process must be isolated insome way that lets others know the degreeof reliability provided for these data, thatis, Will the data be changed without notification to others? Will others be informed about changes? Will the value definitely not change anymore?This ambitious concept was restricted touse in database systems by Eswaran et al.[1976] and given its current name, the"transaction." The transaction basically reflects the idea that the activities of a particular user are isolated from all concurrentactivities, but restricts the degree of isolation and the length of a transaction. Typically, a transaction is a short sequence ofinteractions with the database, using operators such as FIND a record or MODIFYan item, which represents one meaningfulactivity in the user's environment. Thestandard example that is generally used toexplain the idea is the transfer of moneyfrom one account to another. The corresponding transaction program is given inFigure 1.The concept of a transaction, which includes all database interactions between BEGIN TRANSACTION and COMMIT TRANSACTION in the above example, requires that all of its actions beexecuted indivisibly: Either all actions areproperly reflected in the database or nothing has happened. No changes are reflectedin the database if at any point in timebefore reaching the COMMIT TRANSACTION the user enters the ERRORclause containing the R E S T O R ETRANSACTION. To achieve this kind ofindivisibility, a transaction must have fourproperties:Atomicity. It must be of the all-or-nothing type described above, and the user must,whatever happens, know which state he orshe is in.Consistency. A transaction reaching itsnormal end (EOT, end of transaction),thereby committing its results, preservesthe consistency of the database. In otherwords, each successful transaction by definition commits only legal results. This conComputing Surveys, Vol. 15, No. 4, December 1983

290*T.Haerder and A. DWRITEREADWRITECOMMITAB()RT - YSTEM ABORTSTRANSACTIONFigure 2. Three possible outcomes of a transaction.(From Gray et al. [1981].)dition is necessary for the fourth property,durability.Isolation. Events within a transactionmust be hidden from other transactionsrunning concurrently. If this were not thecase, a transaction could not be reset to itsbeginning for the reasons sketched above.The techniques that achieve isolation areknown as synchronization, and since Grayet al. [1976] there have been numerouscontributions to this topic of database research [Kohler 1981].Durability. Once a transaction has beencompleted and has committed its results tothe database, the system must guaranteethat these results survive any subsequentmalfunctions. Since there is no sphere ofcontrol constituting a set of transactions,the database management system (DBMS)has no control beyond transaction boundaries. Therefore the user must have a guarantee that the things the system says havehappened have actually happened. Since,by definition, each transaction is correct,the effects of an inevitable incorrect transaction (i.e., the transaction containingfaulty data) can only be removed by countertransactions.These four properties, atomicity, consistency, isolation, and durability (ACID), describe the major highlights of the transaction paradigm, which has influenced manyaspects of development in database systems. We therefore consider the questionof whether the transaction is supported bya particular system to be the ACID test ofthe system's quality.In summary, a transaction can terminatein the three ways illustrated in Figure 2. Itis hoped that the transaction will reach itscommit point, yielding the all case (as inthe all-or-nothing dichotomy). SometimesComputingSurveys,Vol. 15, No. 4, December1983the transaction detects bad input or otherviolations of consistency, preventing a normal termination, in which case it will resetall that it has done (abort). Finally, a transaction may run into a problem that canonly be detected by the system, such astime-out or deadlock, in which case its effects are aborted by the DBMS.In addition to the above events occurringduring normal execution, a transaction canalso be affected by a system crash. This isdiscussed in the next section.1.2 Which Failures Have to Be AnticipatedIn order to design and implement a recovery component, one must know preciselywhich types of failures are to be considered,how often they will occur, how much timeis expected for recovery, etc. One must alsomake assumptions about the reliability ofthe underlying hardware and storage media, and about dependencies between different failure modes. However, the list ofanticipated failures will never be completefor these reasons: For each set of failures that one can thinkof, there is at least one that was forgotten. Some failures are extremely rare. Thecost of redundancy needed to cope withthem may be so high that it may be asensible design decision to exclude thesefailures from consideration. If one ofthem does occur, however, the system willnot be able to recover from the situationautomatically, and the database will becorrupted. The techniques for handlingthis catastrophe are beyond the scope ofthis paper.We shall consider the following types offailure:Transaction Failure. The transaction offailure has already been mentioned in theprevious section. For various reasons, thetransaction program does not reach its normal commit and has to be reset back to itsbeginning, either at its own request or onbehalf of the DBMS. Gray indicates that 3percent of all transactions terminate abnormally, but this rate is not likely to be aconstant [Gray et al. 1981]. From our ownexperiences with different application da-

Principles of Transaction-Oriented Database Recoverytabases, and from Gray's result [Effelsberget al. 1981; Gray 1981], we can concludethat* Within one application, the ratio oftransactions that abort themselves israther constant, depending only on theamount of incorrect input data, the quality of consistency checking performed bythe transaction program, etc. The ratio of transactions being abortedby the DBMS, especially those caused bydeadlocks, depends to a great extent onthe degree of parallelism, the granularityof locking used by the DBMS, the logicalschema (there may be hot spot data, ordata that are very frequently referencedby many concurrent transactions), andthe degree of interference between concurrent activities (which is, in turn, veryapplication dependent).For our classification, it is sufficient tosay that transaction failures occur 10-100times per minute, and that recovery fromthese failures must take place within thetime required by the transaction for itsregular execution.System Failure. The system failures thatwe are considering can be caused by a bugin the DBMS code, an operating systemfault, or a hardware failure. In each of thesecases processing is terminated in an uncontrolled manner, and we assume that thecontents of main memory are lost. Sincedatabase-related secondary (nonvolatile)storage remains unaffected, we require thata recovery take place in the same amountof time that would have been required forthe execution of all interrupted transactions. If one transaction is executed withinthe order of 10 milliseconds to 1 second,the recovery should take no more than afew minutes. A system failure is assumedto occur several times a week, depending onthe stability of both the DBMS and itsoperational environment.Media Failure. Besides these more orless normal failures, we have to anticipatethe loss of some or all of the secondarystorage holding the database. There areseveral causes for such a problem, the most 291common of which are* bugs in the operating system routines forwriting the disk, hardware errors in the channel or diskcontroller, head crash, loss of information due to magnetic decay.Such a situation can only be overcomeby full redundancy, that is, by a copy of thedatabase and an audit trail covering whathas happened since then.Magnetic storage devices are usually veryreliable, and recovery from a media failureis not likely to happen more often thanonce or twice a year. Depending on the size ofa database, the media used for storing thecopy, and the age of the copy, recovery ofthis type will take on the order of 1 hour.1.3 Summary of Recovery ActionsAs we mentioned in Section 1.1,the notionof consistency that we use for defining thetargets of recovery is tied to the transactionparadigm, which we have encapsulated inthe "ACID principle." According to thisdefinition, a database is consistent if andonly ifit contains the results of successfultransactions. Such a state will hereafter becalled transaction consistent or logicallyconsistent. A transaction, in turn, must notsee anything but effects of complete transactions (i.e., a consistent database in thoseparts that it uses), and will then, by definition, create a consistent update of thedatabase. What does that mean for therecovery component?Let us for the moment ignore transactions being aborted during normal execution and consider only a system failure (acrash). We might then encounter the situation depicted in Figure 3. TransactionsT1, T2, and T3 have committed before thecrash, and therefore will survive. Recoveryafter a system failure must ensure that theeffects of all successful transactions areactually reflected in the database. But whatis to be done with T4 and T5? Transactionshave been defined to be atomic; they eithersucceed or disappear as though they hadnever been entered. There is therefore nochoice about what to do after a systemComputing turve , Vol. 15, No. 4, D mber 1988

292T1T2T3 T. Haerder and A. ReuterII[,IIT4 IT5ITime,, ) SYSTEMCRASHFigure3. Scenario for discussing transaction-oriented recovery. (From Gray et al. [1981].)failure; the effects of all incomplete transactions must be removed from the database.Clearly, a recovery component adhering tothese principles will produce a transactionconsistent database. Since all successfultransactions have contributed to the database state, it will be the most recent transaction-consistent state. We now can distinguish four recovery actions coping with different situations [Gray 1978]:Transaction UNDO. If a transactionaborts itself or must be aborted by thesystem during normal execution, this willbe called "transaction UNDO." By definition, UNDO removes all effects of thistransaction from the database and does notinfluence any other transaction.Global UNDO. When recovering from asystem failure, the effects of all incompletetransactions have to be rolled back.Partial REDO. When recovering from asystem failure, since execution has beenterminated in an uncontrolled manner, results of complete transactions may not yetbe reflected in the database. Hence theymust be repeated, if necessary, by the recovery component.Global REDO. Gray terms this recoveryaction "archive recovery" [Gray et al.1981]. The database is assumed to be physically destroyed; we therefore must startfrom a copy that reflects the state of thedatabase some days, weeks, or months ago.Since transactions are typically short, weneed not consider incomplete transactionsover such a long time. Rather we have tosupplement the copy with the effects of alltransactions that have committed since thecopy was created.With these definitions we have introduced the transaction as the only unit ofrecovery in a database system. This is anideal condition that does not exactly matchComputing Surveys, Vol. 15, No. 4, December 1983reality. For example, transactions might benested, that is, composed of smaller subtransactions. These subtransactions alsoare atomic, consistent, and isolated--butthey are not durable. Since the results ofsubtransactions are removed whenever theenclosing transaction is undone, durabilitycan only be guaranteed for the highesttransaction in the composition hierarchy.A two-level nesting of transactions can befound in System R, in which an arbitrarynumber of save points can be generatedinside a transaction [Gray et al. 1981]. Thedatabase and the processing state can bereset to any of these save points by theapplication program.Another extension of the transactionconcept is necessary in fields like CAD.Here the units of consistent state transitions, that is, the design steps, are so long{days or weeks) that it is not feasible totreat them as indivisable actions. Hencethese long transactions are consistent, isolated, and durable, but they are not atomic[Gray 1981]. It is sufficient for the purposeof our taxonomy to consider "ideal" transactions only.2. THE MAPPING HIERARCHY OF A DBMSThere are numerous techniques and algorithms for implementing database recovery,many of which have been described in detailby Verhofstadt [1978]. We want to reducethese various methods to a small set of basicconcepts, allowing a simple, yet preciseclassification of all reasonable implementation techniques; for the purposes of illustration, we need a basic model of the DBMSarchitecture and its hardware environment.This model, although it contains many familiar terms from systems like INGRES,System R, or those of the CODASYL [1973,1978] type, is in fact a rudimentary database architecture that can also be appliedto unconventional approaches like CASSMor DIRECT [Smith and Smith 1979], although this is not our purpose here.2.1 The Mapping Process:Objects and OperationsThe model shown in Table 1 describes themajor steps of dynamic abstraction fromthe level of physical storage up to the user

Principles of Transaction-Oriented Database RecoveryTable 1. Description of theLevel of abstractionAuxiliary mapping dataNonprocedural or algebraic accessRelations, views tuplesLogical schema descriptionRecord-oriented, navigational accessRecords, sets, hierarchies,networksLogical and physicalschema descriptionRecord and accesspath managementPhysical records,access pathsFree space tables, DBkey translation tablesPropagation controlSegments, pagesPage tables, Bloom filtersFile managementFiles, blocksDirectories, VTOCs,etc.File Management. The lowest layer operates directly on the bit patterns stored onsome nonvolatile, direct access device likea disk, drum, or even magnetic bubblememory. This layer copes with the physicalcharacteristics of each storage type andabstracts these characteristics into fixedlength blocks. These blocks can be read,written, and identified by a (relative) blocknumber. This kind of abstraction is usuallydone by the data management system(DMS) of a normal general-purpose operating system.293DB-MappingHaerarchyObjectsinterface. At the bottom, the database consists of some billions of bits stored on disk,which are interpreted by the DBMS intomeaningful information on which the usercan operate. With each level of abstraction(proceeding from the bottom up), the objects become more complex, allowing morepowerful operations and being constrainedby a larger number of integrity rules. Theuppermost interface supports one of thewell-known data models, whether relational, networklike, or hierarchical.Note that this mapping hierarchy is virtually contained in each DBMS, althoughfor performance reasons it will hardly bereflected in the module structure. We shallbriefly sketch the characteristics of eachlayer, with enough detail to establish ourtaxonomy. For a more complete descriptionsee Haerder and Reuter [1983]. database literature, but for reasons that willbecome clear in the following sections westrictly distinguish between pages andblocks. A page is a fixed-length partition ofa linear address space and is mapped intoa physical block by the propagation controllayer. Therefore a page can be stored indifferent blocks during its lifetime in thedatabase, depending on the strategy implemented for propagation control.Access Path Management. This layer implements mapping functions much morecomplicated than those performed by subordinate layers. It has to maintain all physical object representations in the database(records, fields, etc.), and their related access paths (pointers, hash tables, searchtrees, etc.) in a potentially unlimited linearvirtual address space. This address space,which is divided into fixed-length pages, isprovided by the upper interface of the supporting layer. For performance reasons, thepartitioning of data into pages is still visibleon this level.Propagation2 Control. This level is notusually considered separately in the currentNavigational Access Layer. At the top ofthis layer we find the operations and objectsthat are typical for a procedural data manipulation language (DML). Occurrencesof record types and members of sets arehandled by statements like STORE, MODIFY, FIND NEXT, and CONNECT [CODASYL 1978]. At this interface, the usernavigates one record at a time through ahierarchy, through a network, or along logical access paths.2 This term is introduced in Section 2.4; its meaningis not essential to the understanding of this paragraph.Nonprocedural Access Layer. This levelprovides a nonprocedural interface to theComputingSurveys, Vol. 15, No. 4, December 1983

294 T.Haerder and A. ReuterTemporary LogSupports Transaction UNDO,Global UNDO, partial REDOHostComputerDBMS !" ' ) CodeLog BufferDatabaseBuffer mt- Archive LogSupports Global REDOPhysical Copy of I the Database Archive Copy ofthe Database( Figure 4. Storage hierarchy of a DBMS during normal mode of operation.database. With each operation the user canhandle sets of results rather than singlerecords. A relational model with high-levelquery languages like SQL or QUEL is aconvenient example of the abstractionachieved by the top layer [Chamberlin1980; Stonebraker et al. 1976].On each level, the mapping of higherobjects to more elementary ones requiresadditional data structures, some of whichare shown in Table 1.2.2 The Storage Hierarchy:Implementational EnvironmentBoth the number of redundant data required to support the recovery actions described in Section 1 and the methods ofcollecting such data are strongly influencedby various properties of the different storage media used by the DBMS. In particular,the dependencies between volatile and permanent storage have a strong impact onalgorithms for gathering redundant information and implementing recovery measures [Chen 1978]. As a descriptional framework we shall use a storage hierarchy, asComputingSurveys,VoL 15, No. 4, December1983shown in Figure 4. It closely resembles thesituation that must be dealt with by mostof today's commercial database systems.The host computer, where the application programs and DBMS are located, hasa main memory, which is usually volatile. 8Hence we assume that the contents of thedatabase buffer, as well as the contents ofthe output buffers to the log files, are lostwhenever the D B M S terminates abnormally. Below the volatile main memorythere is a two-level hierarchy of permanentcopies of the database. One level containsan on-line version of the database in directaccess memory; the other contains an archive copy as a provision against loss of theon-line copy. While both are functionallysituated on the same level, the on-line copyis almost always up-to-date, whereas thearchive copy can contain an old state of thedatabase. Our main concern here is database recovery, which, like all provisions for3 In some real-time applications main memory is supported by a battery backup. It is possible that in thefuture mainframes will have some stable buffer storage. However, we are not considering these conditionshere.

Principles of Transaction-Oriented Database Recoveryfault tolerance, is based upon redundancy.We have mentioned one type of redundancy: the archive copy, kept as a startingpoint for reconstruction of an up-to-dateon-line version of the database (globalREDO). This is discussed in more detail inSection 4. To support this, and other recovery actions introduced in Section 1, twotypes of log files are required:Temporary Log. The information collected in this file supports crash recovery;that is, it contains information needed toreconstruct the most recent database (DB)buffer. Selective transaction UNDO requires random access to the log records.Therefore we assume that the temporarylog is located on disk.Archive Log. This file supports globalREDO after a media failure. It depends onthe availability of the archive copy andmust contain all changes committed to thedatabase after the state reflected in thearchive copy. Since the archive log is alwaysprocessed in sequential order, we assumethat the archive log is written on magnetictape.2.3 Different Views of a DatabaseIn Section 2.1, we indicated that the database looks different at each level of abstraction, with each level using different objectsand interfaces. But this is not what wemean by "different views of a database" inthis section. We have observed that theprocess of abstraction really begins at Level3, up to which there is only a more convenient representation of data in external storage. At this level, abstraction is dependenton which pages actually establish the linearaddress space, that is, which block is readwhen a certain page is referenced. In theevent of a failure, there are different possibilities for retrieving the contents of apage. These possibilities are denoted bydifferent views of the database:The current database comprises all objects accessible to the DBMS during normalprocessing. The current contents of allpages can be found on disk, except for thosepages that have been recently modified.Their new contents are found in the DB 295buffer. The mapping hierarchy is completely correct.The materialized database is the statethat the D B M S finds at restart after a crashwithout having applied any log information.There is no buffer. Hence some page modifications (even of successful transactions)may not be reflected in the on-line copy. Itis also possible that a new state of a pagehas been written to disk, but the controlstructure that maps pages to blocks has notyet been updated. In this case, a referenceto such a page will yield the old value. Thisview of the database is what the recoverysystem has to transform into the most recent logically consistent current database.The physical database is composed of allblocks of the on-line copy containing pageimages--current or obso

the database management system (DBMS) has no control beyond transaction bound- aries. Therefore the user must have a guar- antee that the things the system says have happened have actually happened. Since, by definition, each transaction is correct, . Principles of Trans