CMU SCS 15-721 (Spring 2018) :: Larger-than-Memory Databases

Transcription

Lecture #23ADVANCEDDATABASESYSTEMSLarger-than-Memory Databases@Andy Pavlo // 15-721 // Spring 2018

2ADMINISTRIVIASnowflake Guest Speaker: May 2ndFinal Exam Handout: May 2ndCode Review #2: May 2ndProject #3 Final Presentation: May 14th @ 8:30amCMU 15-721 (Spring 2018)

3D ATA B A S E N E W SCMU 15-721 (Spring 2018)

3D ATA B A S E N E W SCMU 15-721 (Spring 2018)

3D ATA B A S E N E W SCMU 15-721 (Spring 2018)

4BackgroundImplementation IssuesReal-world ExamplesEvaluationCMU 15-721 (Spring 2018)

5M O T I VAT I O NDRAM is expensive, son.It would be nice if our in-memory DBMS coulduse cheaper storage.CMU 15-721 (Spring 2018)

6L A R G E R-T H A N - M E M O R Y D ATA B A S E SAllow an in-memory DBMS to store/access dataon disk without bringing back all the slow partsof a disk-oriented DBMS.Need to be aware of hardware access methods In-memory Storage Tuple-Oriented Disk Storage Block-OrientedCMU 15-721 (Spring 2018)

7OLAPOLAP queries generally access the entire table.Thus, there isn’t anything about the workload forthe DBMS to exploit that a disk-oriented bufferpool can’t handle.In-MemoryDisk DataAZone Map (A)MIN ##COUNT ##MAX ##AVG ###SUM ## STDEV ###A CMU 15-721 (Spring 2018)

8O LT POLTP workloads almost always have hot and coldportions of the database. We can assume that txns will almost always access hottuples.The DBMS needs a mechanism to move cold dataout to disk and then retrieve it if it is ever neededagain.CMU 15-721 (Spring 2018)

9L A R G E R-T H A N - M E M O R Y D ATA B A S E SIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00Tuple #01Tuple #02Tuple #03Tuple #04CMU 15-721 (Spring 2018)

9L A R G E R-T H A N - M E M O R Y D ATA B A S E SIn-MemoryTable HeapCold-DataStorageTuple #00headerTuple #02Tuple #01Tuple #03Tuple #04Evicted Tuple BlockIn-MemoryIndexCMU 15-721 (Spring 2018)

9L A R G E R-T H A N - M E M O R Y D ATA B A S E SIn-MemoryTable HeapCold-DataStorageTuple #00header?Tuple #02?Tuple #01Tuple #03Tuple #04Evicted Tuple BlockIn-MemoryIndexCMU 15-721 (Spring 2018)

9L A R G E R-T H A N - M E M O R Y D ATA B A S E SIn-MemoryTable HeapTuple #00?SELECT * FROM tableWHERE id Tuple #01 Tuple #02?Cold-DataStorageheaderTuple #01Tuple #03Tuple #04Evicted Tuple BlockIn-MemoryIndex?CMU 15-721 (Spring 2018)

10AGAIN, WHY NOT MMAP?Write-ahead logging requires that a modified pagecannot be written to disk before the log recordsthat made those changes is written.There are no mechanisms for asynchronous readahead or writing multiple pages concurrently.IN-MEMORY PERFORMANCE FOR BIG DATAVLDB 2014CMU 15-721 (Spring 2018)

11O LT P I S S U E SRun-time Operations Cold Tuple IdentificationEviction Policies Timing Evicted Tuple MetadataData Retrieval Policies Granularity Retrieval Mechanism Merging back to memoryCMU 15-721 (Spring 2018)

12C O L D T U P L E I D E N T I F I C AT I O NChoice #1: On-line The DBMS monitors txn access patterns and tracks howoften tuples are used. Embed the tracking meta-data directly in tuples.Choice #2: Off-line Maintain a tuple access log during txn execution. Process in background to compute frequencies.CMU 15-721 (Spring 2018)

13EVICTION TIMINGChoice #1: Threshold The DBMS monitors memory usage and begins evictingtuples when it reaches a threshold. The DBMS has to manually move data.Choice #2: OS Virtual Memory The OS decides when it wants to move data out to disk.This is done in the background.CMU 15-721 (Spring 2018)

14E V I C T E D T U P L E M E TA D ATAChoice #1: Tombstones Leave a marker that points to the on-disk tuple. Update indexes to point to the tombstone tuples.Choice #2: Bloom Filters Use approximate data structure for each index. Check both index filter for each query.Choice #3: OS Virtual Memory The OS tracks what data is on disk. The DBMS does notneed to maintain any additional metadata.CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00Tuple #01Tuple #02Access FrequencyTuple #00Tuple #01Tuple #02Tuple #03Tuple #04Tuple #05Tuple #03Tuple #04CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00Tuple #01Tuple #02Access FrequencyTuple #00Tuple #01Tuple #02Tuple #03Tuple #04Tuple #05Tuple #03Tuple #04CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00headerTuple #02Tuple #01Tuple #03Tuple #04Access FrequencyTuple #00Tuple #01Tuple #02Tuple #03Tuple #04Tuple #05CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00headerTuple #02Tuple #01Tuple #03Tuple #04CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00headerTuple #02Tuple #01Tuple #03Tuple #04 Block,Offset Block,Offset Block,Offset CMU 15-721 (Spring 2018)

15E V I C T E D T U P L E M E TA D ATAIn-MemoryIndexIn-MemoryTable HeapCold-DataStorageTuple #00headerTuple #02Bloom FilterTuple #01Tuple #03Tuple #04IndexCMU 15-721 (Spring 2018)

16D ATA R E T R I E VA L G R A N U L A R I T YChoice #1: Only Tuples Needed Only merge the tuples that were accessed by a query backinto the in-memory table heap. Requires additional bookkeeping to track holes.Choice #2: All Tuples in Block Merge all the tuples retrieved from a block regardless ofwhether they are needed. More CPU overhead to update indexes. Tuples are likely to be evicted again.CMU 15-721 (Spring 2018)

17R E T R I E VA L M E C H A N I S MChoice #1: Abort-and-Restart Abort the txn that accessed the evicted tuple. Retrieve the data from disk and merge it into memorywith a separate background thread. Restart the txn when the data is ready. Cannot guarantee consistency for large queries.Choice #2: Synchronous Retrieval Stall the txn when it accesses an evicted tuple while theDBMS fetches the data and merges it back into memory.CMU 15-721 (Spring 2018)

18MERGING THRESHOLDChoice #1: Always Merge Retrieved tuples are always put into table heap.Choice #2: Merge Only on Update Retrieved tuples are only merged into table heap if theyare used in an UPDATE query. All other tuples are put in a temporary buffer.Choice #3: Selective Merge Keep track of how often each block is retrieved. If a block’s access frequency is above some threshold,merge it back into the table heap.CMU 15-721 (Spring 2018)

19R E A L-W O R L D I M P L E M E N TAT I O N SH-Store – Anti-CachingHekaton – Project SiberiaEPFL’s VoltDB PrototypeApache Geode – Overflow TablesMemSQL – Columnar TablesCMU 15-721 (Spring 2018)

20H -S T O R EANTI-CACHINGOn-line IdentificationAdministrator-defined ThresholdTombstonesAbort-and-restart RetrievalBlock-level GranularityAlways MergeANTI-CACHING: A NEW APPROACH TO DATABASEMANAGEMENT SYSTEM ARCHITECTUREVLDB 2013CMU 15-721 (Spring 2018)

21H E K AT O NPROJECT SIBERIAOff-line IdentificationAdministrator-defined ThresholdBloom FiltersSynchronous RetrievalTuple-level GranularityAlways MergeTREKKING THROUGH SIBERIA: MANAGING COLDDATA IN A MEMORY-OPTIMIZED DATABASEVLDB 2014CMU 15-721 (Spring 2018)

22E P F L V O LT D BOff-line IdentificationOS Virtual MemorySynchronous RetrievalPage-level GranularityAlways MergeENABLING EFFICIENT OS PAGING FOR MAINMEMORY OLTP DATABASESDAMON 2013CMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #01Tuple #02Cold TuplesCMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #01Tuple #02Cold TuplesCMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesTuple #01CMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesTuple #01CMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesTuple #01CMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesTuple #01CMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesCMU 15-721 (Spring 2018)

23E P F L V O LT D BIn-MemoryTable HeapCold-DataStorageTuple #00Hot TuplesTuple #03Tuple #02Cold TuplesCMU 15-721 (Spring 2018)

24A PA C H E G E O D EO V E R F L O W TA B L E SOn-line IdentificationAdministrator-defined ThresholdTombstones (?)Synchronous RetrievalTuple-level GranularityMerge Only on Update (?)Source: Apache Geode DocumentationCMU 15-721 (Spring 2018)

25MEMSQLC O L U M N A R TA B L E SAdministrator manually declares a table as adistinct disk-resident columnar table. Appears as a separate logical table to the application. Uses mmap to manage buffer pool. Pre-computed aggregates per block always in memory.Manual IdentificationNo Evicted Metadata is needed.Synchronous RetrievalAlways MergeSource: MemSQL DocumentationCMU 15-721 (Spring 2018)

26E VA L U AT I O NCompare different design decisions in H-Storewith anti-caching.Storage Devices: Hard-Disk Drive (HDD)Shingled Magnetic Recording Drive (SMR)Solid-State Drive (SSD)3D XPoint (3DX)Non-volatile Memory (NVRAM)LARGER-THAN-MEMORY DATA MANAGEMENT ON MODERNSTORAGE HARDWARE FOR IN-MEMORY OLTP DATABASE SYSTEMSDAMON 2016CMU 15-721 (Spring 2018)

27MICROBENCHMARK10m tuples – 1KB each50% Reads / 50% Writes – Synchronization Enabled1KB Read1KB Write64KB Read64KB Write101E 08Latency (nanosec)81E 061061E 041041E 021021E 00100HDDSMRSSD3D XPointNVRAMDRAMCMU 15-721 (Spring 2018)

28MERGING THRESHOLDYCSB Workload – 90% Reads / 10% Writes10GB Database using 1.25GB MemoryThroughput (txn/sec)Merge (Update-Only)250000200000Merge (Top-5%)Merge (Top-20%)Merge (All)DRAM150000100000500000HDD (AR) HDD (SR) SMR (AR) SMR (SR)SSD3DXNVMRAMCMU 15-721 (Spring 2018)

29C O N F I G U R AT I O N C O M PA R I S O NGeneric Configuration Abort-and-Restart Retrieval Merge (All) Threshold 1024 KB Block SizeOptimized Configuration Synchronous Retrieval Top-5% Merge Threshold Block Sizes (HDD/SMR - 1024 KB) (SSD/3DX - 16 KB)CMU 15-721 (Spring 2018)

30TAT P B E N C H M A R KThroughput (txn/sec)Optimal Configuration per Storage Device1.25GB HDDSMRSSD3D XPointNVRAMCMU 15-721 (Spring 2018)

31VOTER BENCHMARKThroughput (txn/sec)Optimal Configuration per Storage Device1.25GB SSD3DXNVRAMCMU 15-721 (Spring 2018)

32PA R T I N G T H O U G H T SToday was about working around the blockoriented access and slowness of secondary storage.None of these techniques handle index memory.Fast & cheap byte-addressable NVM will make thislecture unnecessary.CMU 15-721 (Spring 2018)

33NEXT CLASSHardware! NVM! GPUs! HTM!CMU 15-721 (Spring 2018)

LARGER-THAN-MEMORY DATABASES. Allow an in-memory DBMS to store/access data on disk . without. bringing back all the slow parts of a disk-oriented DBMS. . CONFIGURATION COMPARISON. Generic Configuration. Abort-and-Restart Retrieval Merge (All) Threshold 1024 KB Block Size.