Steal but No Force: Efficient Hardware Undo+Redo Logging for Persistent Memory Systems

Matheus Almeida Ogleari∗, Ethan L. Miller∗†, Jishen Zhao∗‡
∗University of California, Santa Cruz   †Pure Storage   ‡University of California, San Diego
∗{mogleari,elm,jishen.zhao}@ucsc.edu   ‡jzhao@ucsd.edu

Abstract—Persistent memory is a new tier of memory that functions as a hybrid of traditional storage systems and main memory. It combines the benefits of both: the data persistence of storage with the fast load/store interface of memory. Most previous persistent memory designs place careful control over the order of writes arriving at persistent memory. This can prevent caches and memory controllers from optimizing system performance through write coalescing and reordering. We identify that such write-order control can be relaxed by employing undo+redo logging for data in persistent memory systems. However, traditional software logging mechanisms are expensive to adopt in persistent memory due to performance and energy overheads. Previously proposed hardware logging schemes are inefficient and do not fully address the issues in software.

To address these challenges, we propose a hardware undo+redo logging scheme which maintains data persistence by leveraging the write-back, write-allocate policies used in commodity caches. Furthermore, we develop a cache force-write-back mechanism in hardware to significantly reduce the performance and energy overheads of forcing data into persistent memory. Our evaluation across persistent memory microbenchmarks and real workloads demonstrates that our design significantly improves system throughput and reduces both dynamic energy and memory traffic. It also provides strong consistency guarantees compared to software approaches.

I. INTRODUCTION

Persistent memory presents a new tier of data storage components for future computer systems. By attaching Non-Volatile Random-Access Memories (NVRAMs) [1], [2], [3], [4] to the memory bus, persistent memory unifies memory and storage systems. NVRAM offers the fast load/store access of memory with the data recoverability of storage in a single device. Consequently, hardware and software vendors recently began adopting persistent memory techniques in their next-generation designs. Examples include Intel's ISA and programming library support for persistent memory [5], ARM's new cache write-back instruction [6], Microsoft's storage class memory support in Windows OS and in-memory databases [7], [8], Red Hat's persistent memory support in the Linux kernel [9], and Mellanox's persistent memory support over fabric [10].

Though promising, persistent memory fundamentally changes current memory and storage system design assumptions. Reaping its full potential is challenging. Previous persistent memory designs introduce large performance and energy overheads compared to native memory systems without consistency enforcement [11], [12], [13]. A key reason is the write-order control used to enforce data persistence. Typical processors delay, combine, and reorder writes in caches and memory controllers to optimize system performance [14], [15], [16], [13]. However, most previous persistent memory designs employ memory barriers and forced cache write-backs (or cache flushes) to enforce the order of persistent data arriving at NVRAM. This write-order control is suboptimal for performance and does not consider natural caching and memory scheduling mechanisms.
Several recent studies strive to relax write-order control in persistent memory systems [15], [16], [13]. However, these studies either impose substantial hardware overhead by adding NVRAM caches in the processor [13] or fall back to low-performance modes once certain bookkeeping resources in the processor are saturated [15].

Our goal in this paper is to design a high-performance persistent memory system without (i) an NVRAM cache or buffer in the processor, (ii) falling back to a low-performance mode, or (iii) interfering with the write reordering by caches and memory controllers. Our key idea is to maintain data persistence with a combined undo+redo logging scheme in hardware.

Undo+redo logging stores both old (undo) and new (redo) values in the log during a persistent data update. It offers a key benefit: relaxing the write-order constraints on caching persistent data in the processor. In this paper, we show that undo+redo logging can ensure data persistence without needing strict write-order control. As a result, the caches and memory controllers can reorder the writes as in traditional non-persistent memory systems (discussed in Section II-B).

Previous persistent memory systems typically implement either undo or redo logging in software. However, high-performance software undo+redo logging in persistent memory is infeasible due to inefficiencies. First, software logging generates extra instructions, competing for limited hardware resources in the pipeline with other critical workload operations. Undo+redo logging can double the number of extra instructions over undo or redo logging alone. Second, logging introduces extra memory traffic in addition to working data access [13]. Undo+redo logging would impose more than double the extra memory traffic in software. Third, the hardware states of caches are invisible to software. As a result, software undo+redo logging, an idea borrowed from database mechanisms designed to coordinate with software-managed caches, can only conservatively coordinate with hardware caches. Finally, with multithreaded workloads, context switches by the operating system (OS) can interrupt the logging and persistent data updates. This can risk the data consistency guarantee in multithreaded environments (Section II-C discusses this further).

Several prior works investigated hardware undo or redo logging separately [17], [15] (Section VII). These designs have similar challenges, such as hardware and energy overheads [17], and slowdown due to saturated hardware bookkeeping resources in the processor [15]. Supporting both undo and redo logging can further exacerbate these issues. Additionally, hardware logging mechanisms can eliminate the logging instructions in the pipeline, but the extra memory traffic generated by the log still exists.

To address these challenges, we propose a combined undo+redo logging scheme in hardware that allows persistent memory systems to relax the write-order control by leveraging existing caching policies. Our design consists of two mechanisms. First, a Hardware Logging (HWL) mechanism performs undo+redo logging by leveraging the write-back, write-allocate caching policies [14] commonly used in processors. Our HWL design causes a persistent data update to automatically trigger logging for that data. Whether a store generates an L1 cache hit or miss, its address, old value, and new value are all available in the cache hierarchy. As such, our design utilizes the cache block writes to update the log with word-size values. Second, we propose a cache Force Write-Back (FWB) mechanism to force write-backs of cached persistent working data at a much lower, yet more efficient, frequency than in software models. This frequency depends only on the allocated log size and NVRAM write bandwidth, thus decoupling cache force write-backs from transaction execution. We summarize the contributions of this paper as follows:

- This is the first paper to exploit the combination of undo and redo logging to relax ordering constraints on caches and memory controllers in persistent memory systems. Our design relaxes the ordering constraints in a way that undo logging, redo logging, or copy-on-write alone cannot.
- We enable efficient undo+redo logging for persistent memory systems in hardware, which imposes substantially more challenges than implementing either undo or redo logging alone.
- We develop a hardware-controlled cache force write-back mechanism, which significantly reduces the performance overhead of force write-backs by efficiently tuning the write-back frequency.
- We implement our design through lightweight software support and processor modifications.
II. BACKGROUND AND MOTIVATION

Persistent memory is fundamentally different from traditional DRAM main memory or its NVRAM replacement, due to its persistence (i.e., crash consistency) property inherited from storage systems. Persistent memory needs to ensure the integrity of in-memory data despite system crashes and power loss [18], [19], [20], [21], [16], [22], [23], [24], [25], [26], [27]. The persistence property is not guaranteed by memory consistency in traditional memory systems. Memory consistency ensures a consistent global view of processor caches and main memory, while persistent memory needs to ensure that the data in the NVRAM main memory is standalone consistent [16], [19], [22].

A. Persistent Memory Write-order Control

To maintain data persistence, most persistent memory designs employ transactions to update persistent data and carefully control the order of writes arriving in NVRAM [16], [19], [28]. A transaction (e.g., the code example in Figure 1) consists of a group of persistent memory updates performed in the manner of "all or nothing" in the face of system failures. Persistent memory systems also force cache write-backs (e.g., clflush, clwb, and dccvap) and use memory barrier instructions (e.g., mfence and sfence) throughout transactions to enforce write-order control [28], [15], [13], [19], [29].

Recent works strive to improve persistent memory performance toward that of a native non-persistent system [15], [16], [13]. In general, whether employing logging in persistent memory or not, most face similar problems. (i) They introduce nontrivial hardware overhead (e.g., by integrating NVRAM cache/buffers or substantial extra bookkeeping components in the processor) [13], [30]. (ii) They fall back to low-performance modes once the bookkeeping components or the NVRAM cache/buffer are saturated [13], [15]. (iii) They inhibit caches from coalescing and reordering persistent data writes [13] (details discussed in Section VII).

Forced cache write-backs ensure that cached data updates made by completed (i.e., committed) transactions are written to NVRAM. This ensures NVRAM is in a persistent state with the latest data updates. Memory barriers stall subsequent data updates until the previous updates by the transaction complete. However, this write-order control prevents caches from optimizing system performance via coalescing and reordering writes. The forced cache write-backs and memory barriers can also block or interfere with subsequent read and write requests that share the memory bus, regardless of whether those requests are independent of the persistent data access [26], [31].
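For illustration, a software transaction with this conventional write-order control might look like the following minimal sketch, using the x86 clwb/sfence intrinsics from immintrin.h; tx_begin, ulog_append, and tx_commit are hypothetical placeholders standing in for a persistent memory runtime, not a real API:

#include <immintrin.h>   /* _mm_clwb (requires CLWB), _mm_sfence */
#include <stdint.h>

/* Hypothetical runtime hooks, for illustration only. */
extern void tx_begin(void);
extern void tx_commit(void);
extern void ulog_append(void *addr, uint64_t old_val);

/* A persistent update under conventional write-order control:
 * each step is pinned in place by a forced write-back and a
 * barrier, so caches cannot coalesce or reorder these writes. */
void tx_update(uint64_t *p, uint64_t new_val)
{
    tx_begin();
    ulog_append(p, *p);   /* undo-log the old value (uncacheable) */
    _mm_sfence();         /* log must reach NVRAM before the store */

    *p = new_val;         /* persistent working-data update        */
    _mm_clwb(p);          /* force the cache line into NVRAM       */
    _mm_sfence();         /* order before the commit record        */

    tx_commit();
}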

[Figure 1 compares three transaction code sketches:]

(a) Undo logging only:
    tx_begin
      do some reads
      do some computation
      Uncacheable: Ulog(addr(A), old_val(A))
      write new_val(A)    // new_val(A) -> A'
      clwb                // force write-back
    tx_commit

(b) Redo logging only:
    tx_begin
      do some reads
      do some computation
      Uncacheable: Rlog(addr(A), new_val(A))
      memory barrier
      write new_val(A)    // new_val(A) -> A'
    tx_commit

(c) Undo+redo logging:
    tx_begin
      do some reads
      do some computation
      Uncacheable: log(addr(A), new_val(A), old_val(A))
      write new_val(A)    // new_val(A) -> A'
      clwb                // can be delayed
    tx_commit

[Each panel also shows a timeline of the log updates (Ulog A_i, Rlog A'_i) and the working-data stores (store A'_i), where "Write A" consists of N store instructions.]

Figure 1. Comparison of executing a transaction in persistent memory with (a) undo logging, (b) redo logging, and (c) both undo and redo logging.

B. Why Undo+Redo Logging

While prior persistent memory designs only employ either undo or redo logging to maintain data persistence, we observe that using both can substantially relax the aforementioned write-order control placed on caches.

Logging in persistent memory. Logging is widely used in persistent memory designs [19], [29], [22], [15]. In addition to working data updates, persistent memory systems can maintain copies of the changes in the log. Previous designs typically employ either undo or redo logging. Figure 1(a) shows that an undo log records old versions of data before the transaction changes their values. If the system fails during an active transaction, the system can roll back to the state before the transaction by replaying the undo log. Figure 1(b) illustrates an example of a persistent transaction that uses redo logging. The redo log records new versions of data. After system failures, replaying the redo log recovers the persistent data with the latest changes tracked by the redo log. In persistent memory systems, logs are typically uncacheable because they are meant to be accessed only during recovery, so they are not reused during application execution. They must also arrive in NVRAM in order, which is guaranteed by bypassing the caches.

Benefits of undo+redo logging. Combining undo and redo logging (undo+redo) is widely used in disk-based database management systems (DBMSs) [32]. Yet, we find that we can leverage this concept in persistent memory design to relax the write-order constraints on the caches.

Figure 1(a) shows that uncacheable, store-granular undo logging can eliminate the memory barrier between the log and working data writes. As long as the log entry (Ulog A1) is written into NVRAM before its corresponding store to the working data (store A'1), we can undo the partially completed store after a system failure. Furthermore, store A'1 must traverse the cache hierarchy, while the uncacheable Ulog A1 may only be buffered (e.g., in a four- to six-cache-line-sized write-combining buffer in x86 processors). It therefore requires much less time to leave the processor than cached stores. This naturally maintains the write ordering without explicit memory barrier instructions between the log and the persistent data writes. That is, logging and working data writes are performed in a pipelined manner (as in the timeline in Figure 1(a)).
This is similar to the "steal" attribute in DBMSs [32], i.e., cached working data updates can steal their way into persistent storage before transactions commit. However, a downside is that undo logging requires a forced cache write-back before the transaction commits. This is necessary if we want to recover the latest transaction state after system failures; otherwise, the data changes made by the transaction will not be committed to memory.

Redo logging, by contrast, allows transactions to commit without explicit cache write-backs because the redo log, once updates complete, already holds the latest version of the transaction's data (Figure 1(b)). This is similar to the "no-force" attribute in DBMSs [32], i.e., there is no need to force the working data updates out of the caches at the end of transactions. However, we must use memory barriers to complete the redo log of A before any stores of A reach NVRAM. We illustrate this ordering constraint with the dashed blue line in the timeline. Otherwise, a system crash while the redo logging is incomplete, with working data A partially overwritten in NVRAM (by store A'k), causes data corruption.

Figure 1(c) shows that undo+redo logging combines the benefits of both "steal" and "no-force". As a result, we can eliminate the memory barrier between the log and persistent writes. A forced cache write-back (e.g., clwb) is unnecessary for an unlimited-sized log. For a limited-sized log, it can be postponed until after the transaction commits (Section II-C).

[Figure 2(a) shows a software persistent transaction:]

    tx_begin(TxID)
      do some reads
      do some computation
      Uncacheable: log(addr(A), new_val(A), old_val(A))
      write new_val(A)    // A'
      clwb                // conservatively used
    tx_commit

[The redo-logging micro-ops expand to "store log A'1, store log A'2, ..."; the undo-logging micro-ops expand to "load A1, load A2, ..., store log A1, store log A2, ...". Figure 2(b) shows a processor (volatile cores, caches, shared cache, memory controller) above nonvolatile NVRAM holding undo/redo log entries (Ulog/Rlog) for transactions A, B, and C; update A'k is still in the caches when transaction A's log entries are about to be overwritten.]

Figure 2. Inefficiency of logging in software.

C. Why Undo+Redo Logging in Hardware

Though promising, undo+redo logging is not used in persistent memory system designs because previous software logging schemes are inefficient (Figure 2).

Extra instructions in the CPU pipeline. Logging in software uses logging functions in transactions. Figure 2(a) shows that both undo and redo logging can introduce a large number of instructions into the CPU pipeline. As we demonstrate in our experimental results (Section VI), using only undo logging can lead to more than doubled instruction counts compared to memory systems without persistent memory. Undo+redo logging can introduce a prohibitively large number of instructions to the CPU pipeline, occupying compute resources needed for data movement.

Increased NVRAM traffic. Most instructions for logging are loads and stores. As a result, logging substantially increases memory traffic. In particular, undo logging must not only store to the log, but must also first read the old values of the working data from the cache and memory hierarchy. This further increases memory traffic.

Conservative cache forced write-back. Logs can have a limited size.¹ Suppose that, without loss of generality, a log can hold the undo+redo records of two transactions (Figure 2(b)). To log a third transaction (Ulog C and Rlog C), we must overwrite an existing log record, say Ulog A and Rlog A (transaction A). If any updates of transaction A (e.g., A'k) are still in caches, we must force these updates into NVRAM before we overwrite their log entry. The problem is that caches are invisible to software. Therefore, software does not know whether or which particular updates to A are still in the caches. Thus, once a log becomes full (after garbage collection), software may conservatively force cache write-backs before committing the transaction. This unfortunately negates the benefit of redo logging.

¹ Although we can grow the log size on demand, this introduces extra system overhead for managing variable-sized logs [19]. Therefore, we study fixed-size logs in this paper.

Risks to data persistence in multithreading. In addition to the above challenges, multithreading further complicates software logging in persistent memory when a log is shared by multiple threads. Even if a persistent memory system issues clwb instructions in each transaction, a context switch by the OS can occur before the clwb instruction executes. This context switch interrupts the control flow of transactions and diverts the program to other threads, reintroducing the aforementioned issue of prematurely overwriting the records in a filled log. Implementing per-thread logs can mitigate this risk. However, doing so introduces a new persistent memory API and complicates recovery.

These inefficiencies expose the drawbacks of software undo+redo logging and warrant a hardware solution.
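To make the conservative forced write-back above concrete, the sketch below shows what a software runtime must do when the circular log wraps; the names (tx_record, evict_tx_from_log) are hypothetical, and the point is only that, lacking visibility into cache state, software must flush every address the victim transaction ever wrote:

#include <immintrin.h>   /* _mm_clwb, _mm_sfence */
#include <stddef.h>

/* Hypothetical per-transaction write set kept by a software runtime. */
struct tx_record {
    void **addrs;    /* addresses written by the transaction */
    size_t count;
};

/* Before overwriting transaction A's log entries, software must
 * conservatively flush *all* of A's updates: it cannot tell which
 * lines are still dirty in the (software-invisible) caches. */
static void evict_tx_from_log(struct tx_record *victim)
{
    for (size_t i = 0; i < victim->count; i++)
        _mm_clwb(victim->addrs[i]);   /* may flush already-clean lines */
    _mm_sfence();                     /* ensure the write-backs complete */
    /* now the victim's log entries may safely be overwritten */
}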
III. OUR DESIGN

To address these challenges, we propose a hardware undo+redo logging design consisting of Hardware Logging (HWL) and cache Force Write-Back (FWB) mechanisms. This section describes our design principles. We describe detailed implementation methods and the required software support in Section IV.

A. Assumptions and Architecture Overview

Figure 3(a) depicts an overview of our processor and memory architecture. The figure also shows the circular log structure in NVRAM. All processor components are completely volatile. We use write-back, write-allocate caches common to processors. We support hybrid DRAM+NVRAM main memory, deployed on the processor-memory bus with separate memory controllers [19], [13]. However, this paper focuses on persistent data updates to NVRAM.

Failure Model. Data in DRAM and caches, but not in NVRAM, is lost across system reboots. Our design focuses on maintaining the persistence of user-defined critical data stored in NVRAM. After failures, the system can recover this data by replaying the log in NVRAM. DRAM is used to store data without persistence [19], [13].

Persistent Memory Transactions. Like prior work in persistent memory [19], [22], we use persistent memory "transactions" as a software abstraction to indicate regions of memory that are persistent. Persistent memory writes require a persistence guarantee. Figure 2 illustrates a simple code example of a persistent memory transaction implemented with logging (Figure 2(a)) and with our design (Figure 2(b)). The transaction defines object A as critical data that needs a persistence guarantee. Unlike most logging-based persistent memory transactions, our transactions eliminate explicit logging functions, cache force-write-back instructions, and memory barrier instructions. We discuss our software interface design in Section IV.

Uncacheable Logs in the NVRAM. We use a single-consumer, single-producer Lamport circular structure [33] for the log. Our system software can allocate and truncate the log (Section IV); our hardware mechanisms append to it. We chose a circular log structure because it allows simultaneous appends and truncates without locking [33], [19].
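As a point of reference, the following is a minimal software sketch of such a Lamport single-producer/single-consumer circular buffer. In our actual design the producer is the HWL hardware and the head/tail pointers live in special registers (Section IV), so this C code is purely illustrative; the capacity and log_entry_t payload are assumptions:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_ENTRIES 4096   /* illustrative capacity, not from the paper */

typedef struct { uint64_t words[4]; } log_entry_t;  /* placeholder payload */

static log_entry_t   ring[LOG_ENTRIES];
static _Atomic size_t head;   /* written only by the producer (appender)  */
static _Atomic size_t tail;   /* written only by the consumer (truncator) */

/* Producer side: append one record. Lock-free because `head` has
 * exactly one writer and `tail` is only read. */
bool log_append(const log_entry_t *e)
{
    size_t h = atomic_load(&head);
    size_t next = (h + 1) % LOG_ENTRIES;
    if (next == atomic_load(&tail))
        return false;                 /* log full: caller must force write-backs */
    ring[h] = *e;
    atomic_store(&head, next);        /* publish the new entry */
    return true;
}

/* Consumer side: truncate one record, symmetric to the above. */
bool log_truncate(log_entry_t *out)
{
    size_t t = atomic_load(&tail);
    if (t == atomic_load(&head))
        return false;                 /* log empty */
    *out = ring[t];
    atomic_store(&tail, (t + 1) % LOG_ENTRIES);
    return true;
}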

[Figure 3(a) depicts the architecture: processor cores with L1 caches and cache controllers, a last-level cache, a volatile log buffer, and memory controllers, above DRAM and the NVRAM holding the circular log with head and tail pointers. Each log entry contains a 1-bit torn bit, a 16-bit TxID, an 8-bit TID, a 48-bit address (e.g., addr(A1)), and the one-word redo (A'1) and undo (A1) values. Figures 3(b) and 3(c) trace the numbered steps for a store that hits or misses in the L1 cache, respectively; in both cases the tuple (TxID, TID, addr(A1), A'1, A1) flows through the volatile log buffer into the NVRAM log.]

Figure 3. Overview of the proposed hardware logging in persistent memory: (a) architecture overview; (b) in case of a store hit in the L1 cache; (c) in case of a store miss in the L1 cache.

Figure 3(a) shows that log records maintain the undo and redo information of a single update (e.g., store A1). In addition to the undo (A1) and redo (A'1) values, log records also contain the following fields: a 16-bit transaction ID, an 8-bit thread ID, a 48-bit physical address of the data, and a torn bit. We use a torn bit per log entry to indicate that the update is complete [19]. Torn bits have the same value for all entries in one pass over the log, but the value reverses when a log entry is overwritten. Thus, completely written log records all have the same torn bit value, while incomplete entries have mixed values [19]. The log must accommodate all write requests of undo+redo.

The log is typically used during system recovery, and rarely reused during application execution. Additionally, log updates must arrive in NVRAM in store order. Therefore, we make the log uncacheable. This is in line with most prior works, in which log updates are written directly into a write-combine buffer (WCB) [19], [31] that coalesces multiple stores to the same cache line.
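Concretely, the record layout described above can be pictured as the following C struct; the field names are ours and the packing is illustrative rather than the paper's exact encoding:

#include <stdint.h>

/* Illustrative layout of one undo+redo log record (field names ours). */
typedef struct {
    unsigned int torn : 1;   /* flips on each pass over the circular log */
    unsigned int txid : 16;  /* transaction ID                           */
    unsigned int tid  : 8;   /* thread ID                                */
    uint64_t addr;           /* physical address (48 bits used)          */
    uint64_t undo;           /* old value, e.g., A1                      */
    uint64_t redo;           /* new value, e.g., A'1                     */
} hw_log_entry_t;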
B. Hardware Logging (HWL)

The goal of our Hardware Logging (HWL) mechanism is to enable feasible undo+redo logging of persistent data in our microarchitecture. HWL also relaxes ordering constraints on caching in a manner that neither undo nor redo logging alone can. Furthermore, our HWL design leverages information naturally available in the cache hierarchy but not to the programmer or software. It does so without the performance overhead of unnecessary data movement or of executing logging, cache force-write-back, or memory barrier instructions in the pipeline.

Leveraging Existing Undo+Redo Information in Caches. Most processor caches use write-back, write-allocate caching policies [34]. On a write hit, a cache only updates the cache line in the hitting level with the new values. A dirty bit in the cache tag indicates that cached values are modified but not committed to memory. On a write miss, the write-allocate (also called fetch-on-write) policy requires the cache to first load (i.e., allocate) the entire missing cache line before writing new values to it. HWL leverages these write-back, write-allocate caching policies to feasibly enable undo+redo logging in persistent memory. HWL automatically triggers a log update on a persistent write in hardware, recording both the redo and undo information in the log entry in NVRAM (shown in Figure 2(b)). We get the redo data from the currently in-flight write operation itself. We get the undo data from the write request's corresponding write-allocated cache line. If the write request hits in the L1 cache, we read the old value before overwriting the cache line and use that for the undo log. If the write request misses in the L1 cache, that cache line must first be allocated anyway, at which point we get the undo data in a similar manner. The log entry, consisting of a transaction ID, thread ID, the address of the write, and the undo and redo values, is written out to the circular log in NVRAM using the head and tail pointers. These pointers are maintained in special registers described in Section IV.

Inherent Ordering Guarantee Between the Log and Data. Our design does not require explicit memory barriers to enforce that undo log updates arrive at NVRAM before the corresponding working data. The ordering is naturally ensured by how HWL performs the undo logging and working data updates, namely by (i) uncached log updates versus cached working data updates, and (ii) store-granular undo logging. The working data writes must traverse the cache hierarchy, but the uncacheable undo log updates do not. Furthermore, our HWL also provides an optional volatile log buffer in the processor, similar to the write-combining buffers in commodity processor designs, that coalesces the log updates. We configure the number of log buffer entries based on cache access latency. Specifically, we ensure that the log updates write out of the log buffer before a cached store writes out of the cache hierarchy. Section IV-C and Section VI further discuss and evaluate this log buffer.
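The per-store behavior of HWL can be summarized with the following software-level model; the tiny direct-mapped "cache" and the nvram_read/log_buffer_push stubs are our own shorthand for microarchitectural operations, not an implementation of the hardware:

#include <stdint.h>
#include <stddef.h>

enum { CACHE_WORDS = 1024 };          /* toy direct-mapped word cache */
static uint64_t cache_data[CACHE_WORDS];
static uint64_t cache_tag[CACHE_WORDS];
static int      cache_valid[CACHE_WORDS];

extern uint64_t nvram_read(uint64_t addr);   /* modeled memory access  */
extern void     log_buffer_push(uint16_t txid, uint8_t tid, uint64_t addr,
                                uint64_t redo, uint64_t undo);

/* Conceptual HWL action on one persistent store. */
void hwl_on_persistent_store(uint16_t txid, uint8_t tid,
                             uint64_t addr, uint64_t new_val)
{
    size_t idx = (addr >> 3) % CACHE_WORDS;
    if (!cache_valid[idx] || cache_tag[idx] != addr) {
        /* write miss: write-allocate fetches the line first, so the
         * old (undo) value becomes available in the hierarchy */
        cache_data[idx]  = nvram_read(addr);
        cache_tag[idx]   = addr;
        cache_valid[idx] = 1;
    }
    uint64_t old_val = cache_data[idx];   /* undo value (hit or miss) */
    cache_data[idx]  = new_val;           /* redo: update the cache   */

    /* (TxID, TID, addr, redo, undo) goes to the volatile log buffer,
     * which drains FIFO to the uncacheable NVRAM log ahead of any
     * cached store leaving the hierarchy. */
    log_buffer_push(txid, tid, addr, new_val, old_val);
}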

C. Decoupling Cache FWBs and Transaction Execution

Writes are seemingly persistent once their logs are written to NVRAM. In fact, we can commit a transaction once the logging of that transaction is complete. However, this alone does not guarantee data persistence, because of the circular structure of the log in NVRAM (Section II-A). Yet inserting cache write-back instructions (such as clflush and clwb) in software can impose substantial performance overhead (Section II-A) and further complicates data persistence support in multithreading (Section II-C).

We eliminate the need for forced write-back instructions and guarantee persistence in multithreaded applications by designing a cache Force-Write-Back (FWB) mechanism in hardware. FWB is decoupled from the execution of each transaction. Hardware uses FWB to force certain cache blocks to write back when necessary. FWB introduces a force-write-back bit (fwb) alongside the tag and dirty bit of each cache line. We maintain a finite state machine in each cache block (Section IV-D) using the fwb and dirty bits. Caches already maintain the dirty bit: a cache line update sets the bit and a cache eviction (write-back) resets it. A cache controller maintains our fwb bit by scanning cache lines periodically. On the first scan, it sets the fwb bit in dirty cache blocks if unset. On the second scan, it forces write-backs in all cache lines with {fwb, dirty} = {1, 1}. If the dirty bit ever gets reset for any reason, the fwb bit also resets and no forced write-back occurs.

Our FWB design is also decoupled from software multithreading mechanisms. As such, our mechanism is impervious to interruption by software context switches: when the OS requires the CPU to context switch, hardware waits until ongoing cache write-backs complete. The frequency of the forced write-backs can vary. However, forced write-backs must happen faster than the rate at which log entries with uncommitted persistent updates are overwritten in the circular log. In fact, we can determine the force-write-back frequency (associated with the scanning frequency) based on the log size and the NVRAM write bandwidth (discussed in Section IV-D). Our evaluation shows this frequency determination (Section VI).
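The two-scan policy can be modeled in a few lines of C; this is a minimal sketch under assumed names (write_back, the per-line state array), not the hardware state machine itself, and the scan period would be chosen from the log size and NVRAM write bandwidth as described above:

#include <stdbool.h>
#include <stddef.h>

enum { NUM_LINES = 512 };             /* illustrative cache size */

struct cache_line_state {
    bool dirty;   /* set on update, cleared on write-back/eviction */
    bool fwb;     /* force-write-back mark, cleared with dirty     */
};

static struct cache_line_state lines[NUM_LINES];

extern void write_back(size_t idx);   /* modeled: flush line to NVRAM */

/* One periodic pass of the cache controller over all lines. */
void fwb_scan(void)
{
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (!lines[i].dirty) {
            lines[i].fwb = false;     /* natural eviction cleaned it  */
        } else if (!lines[i].fwb) {
            lines[i].fwb = true;      /* first scan: mark dirty line  */
        } else {
            write_back(i);            /* second scan: still dirty     */
            lines[i].dirty = false;
            lines[i].fwb   = false;
        }
    }
}

Calling fwb_scan() periodically realizes both scans with one routine: a line marked on one pass is written back on the next, unless a natural eviction cleaned it in between.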
D. Instant Transaction Commits

Previous designs require software or hardware memory barriers (and/or cache force-write-backs) at transaction commits to enforce the write ordering of log updates (or persistent data) into NVRAM across consecutive transactions [13], [26]. Instead, our design gives transaction commits a "free ride": no explicit instructions are needed. Our mechanisms also naturally enforce the order of intra- and inter-transaction log updates. We issue log updates in the order of the writes to the corresponding working data, and we write the log updates into NVRAM in the order they are issued (the log buffer is a FIFO). Therefore, log updates of subsequent transactions can only be written into NVRAM after the current log updates are written and committed.

E. Putting It All Together

Figures 3(b) and (c) illustrate how our hardware logging works. Hardware treats all writes encompassed in persistent transactions (e.g., write A in the transaction delimited by tx_begin and tx_commit in Figure 2(b)) as persistent writes. Those writes invoke our HWL and FWB mechanisms, which work together as follows. Note that log updates go directly to the WCB or NVRAM if the system does not adopt the log buffer.

The processor sends writes of data object A (a variable or other data structure), consisting of new values of one or more cache lines {A'1, A'2, ...}, to the L1 cache. Upon updating an L1 cache line (e.g., from old value A1 to a new value A'1):

1) Write the new value (redo) into the cache line (step ❶).
   a) If the update is the first cache line update of data object A, the HWL mechanism (which has the transaction ID and the address of A from the CPU) writes a log record header into the log buffer.
   b) Otherwise, the HWL mechanism writes the new value (e.g., A'1) into the log buffer.
2) Obtain the undo data from the old value in the cache line (step ❷). This step runs in parallel with Step 1.
   a) If the cache line write request hits in L1 (Figure 3(b)), the L1 cache controller immediately extracts the old value (e.g., A1) from the cache line.
