CAGC: A Content-Aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDs


Suzhen Wu‡, Chunfeng Du‡, Haijun Li‡, Hong Jiang†, Zhirong Shen‡, Bo Mao‡
‡School of Informatics at Xiamen University, Xiamen, Fujian, China
†Department of Computer Science & Engineering at University of Texas-Arlington, Texas, USA
Corresponding author: Bo Mao (maobo@xmu.edu.cn)

Abstract—With the advent of new flash-based memory technologies with ultra-low latency, directly applying inline data deduplication in flash-based storage devices can degrade system performance, since the key deduplication operations lie on the shortened critical write path of such devices. To address this problem, we propose a Content-Aware Garbage Collection scheme (CAGC), which embeds data deduplication into the data-movement workflow of the Garbage Collection (GC) process in ultra-low latency flash-based SSDs. By parallelizing the migration of valid data pages, hash computation and flash block erase, the deduplication-induced performance overhead is alleviated and redundant page writes during GC are eliminated. To further reduce data writes and write amplification during GC, CAGC separates and stores data pages in different regions based on their reference counts. The performance evaluation of our CAGC prototype implemented in FlashSim shows that CAGC significantly reduces the number of flash blocks erased and data pages migrated during GC, leading to improved user I/O performance and reliability of ultra-low latency flash-based SSDs.

Index Terms—Ultra-Low Latency Flash-based SSDs, Garbage Collection, Data Deduplication, Reference Count, Data Placement

I. INTRODUCTION

Flash memory technology is disrupting the storage media market, driving significant investment and innovation in storage systems [7]. Flash-based Solid State Disks (SSDs) have emerged as attractive alternatives to Hard Disk Drives (HDDs), increasingly replacing or coexisting with HDDs in smartphones, personal laptops, enterprise storage systems and large-scale data centers [14], [28]. However, due to the unique features of flash memory, such as asymmetric read-write performance, limited erase cycles, and garbage collection, data writing has an important impact on the performance and reliability of flash-based SSDs [1], [27], [30], [31]. Because it identifies and eliminates the writing of redundant data pages, inline data deduplication can markedly improve the storage efficiency and reliability of flash-based storage systems, and has attracted extensive attention from both academia and industry [2], [11]. While inline data deduplication effectively reduces the amount of redundant data written to a flash storage device, it also increases write response times due to the additional and expensive fingerprint calculation and lookup operations on the critical write path [37].

To address this problem, researchers have proposed hardware coprocessors and sampling techniques to reduce the latency overhead of hash fingerprint calculations. For example, both CA-SSD [11] and the scheme in [18] use on-chip hash coprocessors to speed up fingerprint calculation. CA-FTL [2] uses sampling and pre-hash techniques to reduce the number of data blocks that require expensive fingerprint calculation, thus reducing the latency overhead of fingerprint processing. However, the hash-coprocessor schemes introduce extra hardware overhead, and the sampling techniques must still perform fingerprint calculation on the remaining data blocks. None of them eliminates the latency overhead caused by the fingerprint calculation and lookup operations on the write I/O path.

With the rapid development and adoption of new flash storage technologies such as Z-NAND and XL-Flash [4], [42], the performance of SSDs based on these flash media has improved so vastly that they are now referred to as ultra-low latency flash-based SSDs [13], [21]. However, directly applying inline data deduplication to ultra-low latency flash-based SSDs notably increases the response latency of user requests: the ultra-low device latency makes the deduplication-induced overhead on the critical path much more pronounced, reducing the performance of deduplication-based flash storage devices. Our preliminary experimental results show that a direct application of inline data deduplication on Samsung Z-NAND SSDs increases the response latency by up to 71.9%, with an average increase of 43.1%.

In addition to serving user read and write requests, flash-based SSDs must also perform GC operations to reclaim invalid data pages and erase blocks to free space for subsequent user writes. Generally speaking, the GC procedure selects a victim flash block, migrates the valid data pages in the victim block to other free blocks, then erases the victim block and marks it as free. The basic unit of an erase operation is a flash block consisting of multiple (hundreds of) pages, while the basic unit of a read or write request is a page. The latency of a block erase operation is an order of magnitude higher than that of a read or write request [1]. Thus, GC operations are very time-consuming background tasks inside flash-based SSDs that directly affect foreground user reads and writes, significantly increasing user request response times and causing serious performance variability.

In view of the above challenges facing the direct application of data deduplication in ultra-low latency flash-based SSDs, this paper proposes a two-pronged approach called the Content-Aware Garbage Collection scheme (CAGC). On one hand, CAGC embeds data deduplication into the GC process, which not only hides the overhead of fingerprint calculation but also eliminates the write operations of redundant data pages. On the other hand, by exploiting the reference count feature of data deduplication and grouping flash blocks into hot and cold regions, data pages with reference counts larger than a threshold are stored together in flash blocks of the cold region, and data pages with a reference count of exactly 1 are stored together in flash blocks of the hot region. The overhead of moving data pages with high reference counts during GC is reduced because such pages are much less likely to become invalid (as elaborated in Section III-C), which further improves the GC efficiency of ultra-low latency flash-based SSDs. The performance results show that CAGC significantly reduces the number of flash blocks erased and data pages migrated during GC, thus improving the performance and reliability of ultra-low latency flash-based SSDs. This paper makes the following contributions:

(1) From our preliminary experiments, we find that inline data deduplication significantly degrades the performance of ultra-low latency flash-based SSDs.
(2) To address this challenge, we propose a content-aware GC scheme that exploits the features of both GC and data deduplication.
(3) We conduct extensive experiments on a lightweight CAGC prototype, and the evaluation results show that CAGC significantly improves GC efficiency for ultra-low latency SSDs.

The rest of this paper is organized as follows. Background and motivation are presented in Section II. We describe the design details of the Content-Aware Garbage Collection scheme in Section III. The performance evaluation is presented in Section IV. Related work is presented in Section V. We conclude this paper in Section VI.

II. BACKGROUND AND MOTIVATION

In this section, we first describe the development of ultra-low latency flash-based SSDs. Then we elaborate on why inline data deduplication is not suitable for ultra-low latency flash-based SSDs. Finally, we present the GC workflow in flash-based SSDs to motivate our Content-Aware Garbage Collection optimization for ultra-low latency flash-based SSDs.

A. Ultra-low latency flash-based SSDs

With the advent of Samsung's Z-NAND and Toshiba's XL-Flash technologies, the I/O latency of Ultra-Low Latency (ULL) SSDs can be up to 10 times shorter than that of a high-performance NVMe SSD [13], [21], [42]. For example, Samsung's Z-NAND completes a 4KB read service within 3us [4], while a general high-performance NVMe SSD completes an I/O service within 47-52us, including data transfer and FTL execution latencies [42]. Other flash memory vendors also develop and promote low-latency flash chips, such as Toshiba's XL-Flash [42].

Although ultra-low latency SSDs have low read and write latency, they are very sensitive to the amount of written data, due to the asymmetric read-write performance and the limited endurance (erase cycles) of flash memory. The amount of data written to the flash memory directly affects the performance and reliability of flash-based SSDs. Therefore, reducing the amount of written data helps improve the performance and reliability of flash-based SSDs and extends their lifetime.

B. Inline data deduplication for flash storage

Inline data deduplication means that the deduplication process is performed on the user write path, before the data is written to the storage device. Data deduplication is a specific type of data compression. It splits files into multiple data chunks, each uniquely identified by a fingerprint (i.e., a hash signature); redundant data chunks are replaced by pointers to their stored unique copies. Figure 1 shows an example of how data deduplication works: the pages of four files are reduced to the unique pages A-G, each tagged with a reference count recording how many files share it. Data deduplication has been demonstrated to be very effective in shortening the backup window and saving network bandwidth and storage space in cloud backup, archiving and primary storage systems (e.g., flash-based storage systems).

Fig. 1. An example of how data deduplication works.

Recent studies have shown that the ability of data deduplication to reduce write traffic can significantly improve the performance and reliability of flash storage systems. In fact, inline data deduplication has become a commodity feature in flash-based storage products from many leading companies, such as HPE Nimble Storage [34] and Pure Storage [5], [6], for the purpose of enhancing system performance, reliability and space efficiency.

However, despite its great benefits, data deduplication has two important drawbacks, namely, expensive computation and memory overheads on the critical I/O path, which adversely affect system performance, especially for ultra-low latency flash-based SSDs. With the Z-NAND and XL-Flash technologies, the I/O latency of next-generation flash storage is significantly lower than that of earlier generations. On the other hand, the hash computing latencies of SHA-1/256 in deduplication-based systems remain largely unchanged, especially given the limited computing capability inside flash-based SSDs. As a result, the performance bottleneck has shifted from the flash storage to the deduplication process in such ultra-low latency flash storage systems [37].

Figure 2 shows the normalized performance results of an ultra-low latency flash-based SSD with and without inline data deduplication, driven by three FIU workloads [9]. We can see that inline data deduplication degrades system performance for all three workloads, even for the Mail workload with a 93% deduplication ratio. The reason is that all incoming data blocks must go through the expensive hash computing and search processes, whose latency is much larger than the I/O latency of the ultra-low latency flash storage.

Fig. 2. The performance results of an ultra-low latency flash-based SSD with (Inline-Dedupe) and without (Baseline) inline data deduplication, normalized to Baseline.
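
To make the fingerprint-and-lookup step concrete, the following minimal sketch shows page-level inline deduplication in Python (the names and structure are ours, not code from the paper): each page is hashed with SHA-1, the fingerprint index is consulted, and a hit merely increments a reference count instead of writing the page.

    import hashlib

    class DedupIndex:
        """Maps fingerprint -> (physical page number, reference count)."""
        def __init__(self):
            self.by_fp = {}

        def write_page(self, data: bytes, alloc_page) -> int:
            fp = hashlib.sha1(data).digest()   # SHA-1 fingerprint, as in Section II-B
            entry = self.by_fp.get(fp)
            if entry:                          # duplicate: no flash write needed
                ppn, refs = entry
                self.by_fp[fp] = (ppn, refs + 1)
                return ppn
            ppn = alloc_page(data)             # unique page: pay the flash write
            self.by_fp[fp] = (ppn, 1)
            return ppn

On the inline path, the hash and lookup fall on every user write; this is exactly the overhead that CAGC later moves off the critical path and into GC.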

C. Garbage collection and motivation

Due to the unique physical features of NAND flash, write requests are serviced out-of-place rather than in-place. In flash-based SSDs, data can only be written to erased pages (a.k.a. free pages), and the previous versions of overwritten pages become invalid (stale) after out-of-place writes. The invalid pages in a block, called a victim flash block, must be reclaimed by copying (reads followed by writes) the valid pages of the victim block into a free block before the victim block is erased, which makes the block available for subsequent write data. This process is known as garbage collection, and it significantly affects the user I/O performance of SSD-based storage systems [15], [23], [39].

Reclaiming the space occupied by invalid data without losing valid data is a major performance bottleneck of GC in flash storage devices. A typical victim-block selection algorithm therefore favors flash blocks that contain the most invalid data pages, while also considering block erase counts and age information [33], [17], [10]. Existing victim-block selection algorithms take three main approaches. The first, referred to as the random approach, randomly selects victim blocks containing invalid pages, for ease of wear leveling and low selection overhead [29]. The second selects the victim blocks with the most invalid pages, and is thus called the greedy approach [10]. However, since CAGC stores cold data pages together and cold pages are less likely to be invalidated, a greedy algorithm may lead to uneven wear leveling. To address this problem, a third, cost-benefit approach considers both the number of invalid pages in a candidate block and the block's erase history to decide which flash block to erase [16].
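
As an illustration of the latter two policies, the sketch below scores candidate blocks in Python. The cost-benefit score is one classical formulation along the lines of [16], not necessarily the exact variant the paper uses; u is the fraction of still-valid pages, age is the time since the block's last modification, and all names are ours.

    def greedy_victim(blocks):
        # Erase the block with the most invalid pages.
        return max(blocks, key=lambda b: b.invalid_pages)

    def cost_benefit_victim(blocks, now):
        # Classical benefit/cost = age * (1 - u) / (2u): copying valid data
        # costs 2u (a read plus a write) and reclaims (1 - u) of the block.
        def score(b):
            u = b.valid_pages / b.total_pages
            if u == 0:
                return float("inf")        # nothing to migrate: ideal victim
            return (now - b.last_modified) * (1 - u) / (2 * u)
        return max(blocks, key=score)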

Equally important, since each memory cell in a flash block endures only a limited number of erase cycles before becoming unusable, GC also significantly affects the reliability (i.e., endurance) of SSD-based storage systems. Therefore, addressing the performance and reliability issues caused by GC has become a critically important challenge when deploying flash-based SSDs in HPC and enterprise storage systems [19], [41].

Figure 3 shows the three steps of a typical GC process in SSDs: (1) select the victim flash block to be erased; (2) read the valid data pages in the victim block and write them to other free blocks; (3) erase the victim block and mark it free for subsequent write data. Among the three steps, the erase latency of the flash block is the largest, usually at the millisecond level, which is much greater than the microsecond-level read and write latency of a flash page.

Fig. 3. The three steps of a typical garbage collection process in SSDs.

With the ultra-low latency Z-NAND and XL-Flash technologies greatly magnifying the relative cost of the hash computation on the write critical path, it is no longer advisable to apply data deduplication directly on the write path of SSDs built on such technologies. To address the problem, we propose a Content-Aware Garbage Collection scheme, called CAGC, which embeds data deduplication into the data-movement workflow of the GC process in ultra-low latency flash-based SSDs. The main idea behind CAGC is to hide the hash computation overhead by computing the hash values of data chunks in parallel with the relatively long-latency operations of valid-page copying and flash block erase, thus alleviating the performance degradation induced by inline deduplication. To further reduce data writes, CAGC divides data pages into a hot region and a cold region based on their reference counts, which improves GC efficiency and reduces write amplification.

III. DESIGN OF CAGC

In this section, we first present a system overview of the proposed Content-Aware Garbage Collection (CAGC), followed by a description of the CAGC workflow. The reference count-based data page placement strategy of CAGC is illustrated at the end of this section.

A. System overview

The main idea behind the CAGC scheme is to remove the data deduplication operation from the foreground critical I/O path and embed it into the background GC process in ultra-low latency flash-based SSDs. CAGC not only hides the deduplication-induced overheads, but also avoids the negative impact of inline data deduplication on user I/O performance. Meanwhile, CAGC leverages the characteristics of data page reference counts to separate data pages with different reference counts into cold and hot regions, which further reduces the number of valid data pages copied and the number of flash blocks erased during GC, thus improving both the performance and reliability of ultra-low latency flash-based SSDs.

Figure 4 shows the system architecture of CAGC, which mainly consists of three modules: the Garbage Collection module, the Data Deduplication module and the Page Placement module. The Garbage Collection module is responsible for selecting the victim flash blocks to be freed, migrating the valid data pages in the victim blocks, and then erasing these victim blocks. The Data Deduplication module is responsible for deduplicating the valid data pages migrated during GC to eliminate redundant page writes, and for passing the reference count information of the data pages to the Page Placement module. The Page Placement module is responsible for organizing and managing the data pages in the flash blocks according to their reference counts. As shown in Figure 4, CAGC is located in the Flash Translation Layer (FTL) of SSDs. Although garbage collection is one of CAGC's central components, CAGC does not change the internal SSD GC workflow or algorithm. Therefore, CAGC is orthogonal to, and easily incorporated with, any existing GC algorithm to further improve GC efficiency.

Fig. 4. System architecture of CAGC.

As shown in Figure 4, the Hot Region refers to the group of flash blocks whose data pages are frequently updated and thus become invalid, and the Cold Region refers to the group of flash blocks whose data pages are rarely updated or deleted. When a file is deleted, the reference count of each page belonging to the file is simply decremented by 1. A page becomes invalid only when its reference count drops to 0, meaning that a data page with a high reference count is unlikely to be invalidated. Therefore, as shown in Figure 4, data pages with a reference count of 1 are stored in the Hot Region, and data pages with higher reference counts are stored in the Cold Region.
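
The reference-count lifecycle described above can be summarized in a few lines of Python (a sketch with our own names, not the paper's implementation): deleting a file only decrements counts, and a page is reclaimed only when its count reaches zero.

    REF_THRESHOLD = 1   # pages with count > threshold belong in the Cold Region

    def delete_file(pages, refcount, invalidate):
        for ppn in pages:
            refcount[ppn] -= 1          # other sharing files may still reference it
            if refcount[ppn] == 0:
                invalidate(ppn)         # only now does the page become invalid

    def region_of(ppn, refcount) -> str:
        return "cold" if refcount[ppn] > REF_THRESHOLD else "hot"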

B. Workflow of content-aware garbage collection

In addition to serving user read and write requests, flash-based SSDs must conduct GC internally to release flash blocks occupied by invalid data pages, so that those blocks can be reused for subsequent writes. Generally, flash-based SSDs utilize system idle periods to conduct GC in the background and reclaim invalid data pages. However, when the free space in the SSD falls below a preset threshold, the GC process is triggered to select the victim flash blocks that meet the conditions specified by the given GC algorithm.

Figure 5 shows the workflow of content-aware garbage collection. When a victim flash block is selected for erasure, the valid data pages are read and their hash fingerprints are calculated. CAGC then searches the fingerprint index to determine whether each page is redundant (hit/matched) or not (missed/unmatched). If the data page is not redundant, CAGC writes it into the hot region and updates the fingerprint index. Otherwise, CAGC does not write the data page but simply updates the corresponding metadata, including incrementing the reference count.

Fig. 5. The workflow of content-aware garbage collection.

At the same time, when the reference count of a redundant data page reaches the preset threshold, the data page is migrated to the Cold Region before the victim flash block is erased. After all valid pages in the victim flash block have been migrated, the victim flash block is erased.

The flash-block erase time is generally at the millisecond level, much higher than the microsecond-level calculation and search overhead of the hash fingerprint. This indicates that embedding the deduplication process into the GC process does not cause significant performance overhead. On the contrary, under CAGC redundant data pages are no longer repeatedly written, which improves GC efficiency for ultra-low latency flash-based SSDs.
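
Putting the pieces together, a compact sketch of one GC pass under CAGC might look as follows (Python; the helper names are hypothetical, and the hardware overlap of hashing with the block erase is only suggested by the ordering of operations).

    import hashlib

    REF_THRESHOLD = 1

    def cagc_collect(victim, index, hot_region, cold_region, erase_block):
        """One CAGC pass over a victim block (sketch, hypothetical helpers)."""
        for page in victim.valid_pages():
            data = page.read()
            fp = hashlib.sha1(data).digest()    # hashed during migration,
                                                # off the foreground write path
            if fp not in index:
                ppn = hot_region.append(data)   # unique page: rewritten once
                index[fp] = [ppn, 1, "hot"]
            else:
                entry = index[fp]
                entry[1] += 1                   # duplicate: metadata only, no write
                if entry[1] > REF_THRESHOLD and entry[2] == "hot":
                    entry[0] = cold_region.append(data)  # promote shared copy;
                    entry[2] = "cold"                    # old hot copy goes invalid
        erase_block(victim)                     # ms-level erase dominates the pass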

C. Reference count-based data page placement

The reference count of a data page under deduplication indicates how many different files share that page. Intuitively, the higher a page's reference count, the smaller the possibility of the page being deleted (i.e., of all files sharing it being deleted). We verify this empirically next.

Figure 6 shows the distribution of invalid data pages generated from pages of different reference counts for the three FIU traces. More than 80% of invalid data pages come from flash pages with a reference count of 1, while less than 1% come from pages with a reference count of 3 or more. This analysis shows that data pages with higher reference counts have much longer lifetimes than data pages with lower reference counts and are less likely to become invalid. Therefore, data pages with high reference counts can be treated as cold data and stored in the Cold Region, while data pages with a reference count equal to or lower than a preset threshold (e.g., 1) can be treated as hot data and stored in the Hot Region.

Fig. 6. The distribution of invalid pages generated from pages of different reference counts.

The data pages in the Hot Region have a higher update frequency and a higher probability of becoming invalid than those in the Cold Region. Therefore, the flash blocks in the Hot Region are desirable victim candidates, since they are likely to contain very few valid pages that need to be migrated. Conversely, each deletion or update of a data page in the Cold Region merely decrements its reference count rather than invalidating the page, so the corresponding flash blocks rarely accumulate invalid pages. Thus, CAGC reduces both the number of valid data pages migrated and the number of flash blocks erased during GC.

Figure 7 shows an example of reference count-based data page placement, where a Set refers to an area composed of multiple flash blocks. In traditional deduplication-based flash storage, data pages with different reference counts are mixed and stored together. CAGC instead embeds reference count-based placement in the migration of valid pages during GC, thus avoiding additional page reads and writes to flash storage.

Fig. 7. An example of reference count-based data page placement.

As shown in Figure 7, the reference count of each data page can be obtained from the fingerprint index of the deduplication-based storage system. The reference count is compared with a preset threshold (e.g., 1): if it is larger than the threshold, the data page is stored in the Cold Region; otherwise, it is stored in the Hot Region. By exploiting the reference count feature of data deduplication to guide page placement, and leveraging the capacity optimization of deduplication, the performance and reliability of ultra-low latency flash-based SSDs can be further improved.

Figure 8 compares the traditional GC scheme and the CAGC scheme when writing four files and then deleting two of them. In the traditional SSD GC scheme, the content redundancy of data pages is unknown, so pages with the same content are stored redundantly. After some files are deleted or updated, invalid data pages appear in many different flash blocks. Therefore, traditional GC must migrate many more valid data pages and erase many more flash blocks than CAGC.

As shown in Figure 8(a), the traditional GC process requires 12 valid-page write operations and 2 flash block erase operations, yet frees only 6 flash pages. By contrast, CAGC deduplicates the migrated data pages during GC, eliminating redundant page writes. As shown in Figure 8(b), CAGC needs only 7 valid-page write operations and 1 flash block erase operation, and frees 11 flash pages. Compared with the traditional SSD garbage collection scheme, CAGC thus significantly improves GC efficiency.

Fig. 8. An example of writing four files and deleting two files under the two garbage collection schemes.
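
Reading the Figure 8 numbers as a quick efficiency check (a worked example using only the paper's figures): pages freed per unit of GC work differ sharply between the two schemes.

    # Figure 8 accounting: (page writes, block erases, pages freed)
    traditional = (12, 2, 6)
    cagc        = (7, 1, 11)

    for name, (writes, erases, freed) in [("traditional", traditional),
                                          ("CAGC", cagc)]:
        # pages freed per page written is a rough GC-efficiency proxy
        print(f"{name}: {freed / writes:.2f} pages freed per page written, "
              f"{erases} erase(s)")
    # traditional: 0.50 pages freed per page written, 2 erase(s)
    # CAGC: 1.57 pages freed per page written, 1 erase(s)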

IV. PERFORMANCE EVALUATIONS

In this section, we first describe the experimental setup and methodology. We then evaluate the performance and effectiveness of the proposed CAGC scheme driven by deduplicating workloads (i.e., workload traces collected from real production systems and instrumented to enable content identification, as explained below), including comparisons of the number of flash blocks erased and the number of pages written during GC. A sensitivity study is presented at the end of this section.

A. Experimental setup and methodology

We implemented a prototype of CAGC by extending FlashSim [20], an event-driven simulator for flash-based SSDs. FlashSim has built-in FTL management strategies and can report response times, the number of flash blocks erased, and many other performance statistics. The parameters of the simulated ultra-low latency flash-based SSD, based on Samsung's commercially available Z-NAND flash memory, are shown in Table I.

TABLE I
THE CONFIGURATION OF SSD

Type           Value     Type        Value
Page Size      4KB       Read        12us
Block Size     256KB     Write       16us
OP Space       7%        Erase       us
GC Watermark   20%       Workloads   FIU [9]

We replay the deduplicating workloads to evaluate the performance of the CAGC prototype. The three traces were obtained from the SyLab of FIU [22] and cover a duration of three weeks. They were collected from a virtual machine running a file server (Homes), two web servers (Web-vm) and an email server (Mail), respectively. Each request in the traces includes the hash value of the requested data. The characteristics of the traces are shown in Table II [9], [22]. The FIU workloads with fingerprints have been widely used in the storage community to study the performance of deduplication-based storage systems.

TABLE II
THE WORKLOAD CHARACTERISTICS

Traces    Write Ratio    Dedup. Ratio    Aver. Req. Size
Mail      69.8%          89.3%           14.8KB
Homes     80.5%          30.0%           13.1KB

In the experiments, we compare CAGC with ultra-low latency flash-based SSDs that perform no deduplication (Baseline), and with ultra-low latency flash-based SSDs that perform inline data deduplication on the foreground user write path (Inline-Dedupe). By default, the experiments use the greedy algorithm to select victim flash blocks for all schemes. A sensitivity study on different victim-block selection algorithms is presented and analyzed in Section IV-C.
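
For reference, the Table I parameters can be captured in a small configuration snippet (a sketch only; the key names are ours and do not correspond to FlashSim's actual configuration interface).

    # Simulated ULL SSD parameters from Table I
    SSD_CONFIG = {
        "page_size_bytes":  4 * 1024,
        "block_size_bytes": 256 * 1024,    # 64 pages per block
        "read_latency_us":  12,
        "write_latency_us": 16,
        "erase_latency_us": None,          # value not recoverable from the source table
        "overprovision":    0.07,
        "gc_watermark":     0.20,          # GC triggers below 20% free space
    }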

B. Performance results and analysis

Figure 9 compares CAGC with the Baseline system in terms of the number of flash blocks erased, driven by the three deduplicating workloads. CAGC erases significantly fewer flash blocks than the non-deduplicating Baseline system: by 23.3%, 48.3%, and 86.6% under the Homes, Web-vm, and Mail workloads, respectively. CAGC performs data deduplication during the page migration phase of GC, which reduces the writing of redundant data pages and in turn reduces the number of erased flash blocks. The reduction is largest for the Mail workload, which has the highest deduplication ratio.

Fig. 9. The number of flash blocks erased by Baseline and CAGC under the three workloads.
