Sliding Look-Back Window Assisted Data Chunk Rewriting For . - USENIX

Transcription

Sliding Look-Back Window Assisted Data Chunk Rewriting forImproving Deduplication Restore PerformanceZhichao Cao1, Shiyong Liu2, Fenggang Wu1, Guohua Wang3,Bingzhe Li1, and David H.C. Du11University2Oceanof Minnesota, Twin CitiesUniversity of China3SouthChina University of Technology02/26/2019

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Deduplication Process[1]Byte Stream822 1825Sliding WindowRecipe22 238Active container22 1818 19 20 218Center for Research inIntelligent Storage91017 2525102322 2310 13 22 23Container Storage5Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe822 1818 19 20 218Center for Research inIntelligent Storage917 2525102322 2310 13 22 23Active containerContainer Storage5Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe822 1818 19 20 218Center for Research inIntelligent Storage917 2525102322 2310 13 22 23Active containerContainer Storage5Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe8Active container22 1818 19 20 218Center for Research inIntelligent Storage917 2525102322 2310 13 22 2314Container Storage5Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe8Active container22 1818 19 20 218Center for Research inIntelligent Storage917 2525102322 2310 13 22 2314Container Storage5Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe22 238Active container22 1818 19 20 2125102322 2310 135Indexing TableNot Found917 25 8Center for Research inIntelligent Storage14Container Storage

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe22 238Active container22 1818 19 20 2125102322 2310 13514Indexing TableNot Found917 25 8Center for Research inIntelligent Storage14Container Storage

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe22 238Active container22 1818 19 20 212510 142310 1322 23514Indexing TableNot Found917 25 8Center for Research inIntelligent Storage14Container Storage

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe822 1818 19 20 218917 252510 1423Active container22 2351410 13 22 2314Container StorageCenter for Research inIntelligent Storage[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.Indexing Table

Deduplication Process[1]Byte Stream22 23822 182510Sliding WindowRecipe822 1818 19 20 218917 252510 142310 13Active container22 23514 22 2314Container StorageCenter for Research inIntelligent Storage[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.Indexing Table

Deduplication Process[1]Byte Stream22 23822 18251014Sliding WindowRecipe822 1818 19 20 218917 252510 142310 1322 23514 22 23Active containerContainer StorageCenter for Research inIntelligent Storage[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.Indexing Table

Deduplication Process[1]Byte Stream22 23822 18251014Sliding WindowRecipe822 1818 19 20 218917 252510 142310 1322 23514 22 23Active containerContainer StorageCenter for Research inIntelligent Storage[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.Indexing Table

Deduplication Process[1]Byte Stream22 23822 18251014Sliding WindowRecipe822 1818 19 20 218917 252510 142310 1322 23Active container514 22 235Container StorageCenter for Research inIntelligent Storage[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.Indexing Table

Deduplication Process[1]Byte Stream22 23822 18251014Sliding WindowRecipe22 238522 1818 19 20 212510 142310 1322 23Active container514Found917 25 8Container StorageCenter for Research inIntelligent StorageIndexing Table[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.

Deduplication Process[1]Byte Stream22 23822 18251014Sliding WindowRecipe22 238522 1818 19 20 212510 142310 1322 23Active container514Found917 25 8Container StorageCenter for Research inIntelligent StorageIndexing Table[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.

Deduplication Process[1]Byte Stream22 23822 182510145Sliding WindowRecipe22 23822 1818 19 20 212510 1423Active container510 1322 23514Found917 25 8Container StorageCenter for Research inIntelligent StorageIndexing Table[1] Zhu B, Li K, Patterson R H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System[C]//Fast. 2008, 8: 1-14.

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522131812 14 285252310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer231822141352310

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522131812 14 285252310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer231822141352310

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522131812 14 285252310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer231822141352310

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522131812 14 28525102310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer231822141352310

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522131812 14 28525102310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 2855102310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 2855102310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 2855102310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 285510142310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 285510142310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 285510142310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Restore Process with Chunk-based Caching[2]Recipe822 182510 14 18 13 22328 23 12 13 32 23 286 22 23Chunk CacheAssembling BufferRestored Data Storage3Center for Research inIntelligent Storage522 1813212 14 28551014 182310 13310 131318 19 20 212822 23917 255Container Storage14 22 23Container Read Buffer2318221413510

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Limitations of Caching SchemeRestore DirectionRecipe2382222510 1418 17 22328 23 12 1332 23 28Container Read BufferRestored Data Storage22 23822251312 1451318 3533 66412 513 10135222 636123 42514 3Assembling Buffer5679 1781Container Storage Intelligent Storage13 226 Center for Research in5

Limitations of Caching SchemeRestore DirectionRecipe2382222510 1418 17 22328 23 12 1332 23 28Container Read BufferRestored Data Storage22 23822251312 1451318 3533 66412 513 10135222 636123 42514 3Assembling Buffer5679 1781Container Storage Intelligent Storage13 226 Center for Research in5

Limitations of Caching Scheme 4 container reads to restore 4 chunksRestore DirectionRecipe2382222510 1418 17 22328 23 12 1332 23 28Container Read BufferRestored Data Storage22 23822251312 1451318 3533 66412 513 10135222 636123 42514 3Assembling Buffer5679 1781Container Storage Intelligent Storage13 226 Center for Research in5

Limitations of Caching Scheme 4 container reads to restore 4 chunks Other chunks cannot benefit restoreRestore DirectionRecipe2382222510 1418 17 22328 23 12 1332 23 28Container Read BufferRestored Data Storage22 23822251312 1451318 3533 66412 513 10135222 636123 42514 3Assembling Buffer5679 1781Container Storage Intelligent Storage13 226 Center for Research in5

Limitations of Caching Scheme 4 container reads to restore 4 chunks Other chunks cannot benefit restorechinRecipe2382222510 1418 17 22328 23 12 13ses32 23 28Restored Data Storage22 238222513312 14Assembling Buffer51318 3533 66412 513 1013525679 178122 636123 42514Container Storage6itspContainer Read Buffer Intelligent Storage13 22 Center for Research in5g lo CaRestore Directionower

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 175Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 17510Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 1751014Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 1751014 18Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 1751014 18 17Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 1751014 18 17Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5 1 container reads can restore 4 chunks

Data Chunk Rewrite SchemesDeduplication DirectionIntelligent Storage23822182510 14 18 1751014 18 17Active container1821 22 6656 79 1781412513 10 5213226123 42635 14Container Storage Center for Research in13 22 5 1 container reads can restore 4 chunks Tradeoff between space saving andrestore performance

Related Work Nam et al. introduced the Chunk Fragmentation Level based data chunk rewrite [4-5] The mismatch level between byte stream context and data chunk disk context is used to decide the datachunks to be rewrite, which is presented by Kaczmarczyk et al. [6] Fu et al. proposed a History-Aware Rewriting algorithm (HAR) which identifies and rewrites sparsecontainers [7-8] Tan et al. proposed a Fine-Grained defragmentation approach (FGDefrag) to identify and rewrite thefragmental chunks [9] Container capping was proposed by Lillibridge et al. [3] Center for Research inIntelligent Storage

Challenges of Rewrite Rewrite sacrifices deduplication ratio (space saving), how to make better tradeoffsbetween deduplication ratio and restore performance is challenging; Rewrite is done during the deduplication process, information is limited. Most relatedstudies are based on the past statistic information to make the decision. How to decidethe data chunks to be rewritten with limited information is challenging; The restore caching effectiveness should be considered during the dedup to reduceunnecessary rewrites. However, how to integrate caching with rewrite is not clearlyinvestigatedCenter for Research inIntelligent Storage

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Introduction of Container CappingIntelligent StorageDeduplication Direction Center for Research in[3]Active container

Introduction of Container CappingIntelligent StorageDeduplication Direction Center for Research in[3]Active container

Introduction of Container CappingIntelligent StorageDeduplication Direction Center for Research in[3]SegmentActive container

Introduction of Container CappingIntelligent StorageDeduplication Direction Center for Research in[3]SegmentActive container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegmentActive container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegmentActive container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegmentActive container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunks3 chunks1 chunk1 chunkActive container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunksActive container4 chunks3 chunksCapping level 31 chunk1 chunk

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunk

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunkRewrite and reference to thenew container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunkRewrite and reference to thenew container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunkRewrite and reference to thenew container

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunkRewrite and reference to thenew containerPros:1) simple and efficient2) guarantee the higherbound of container reads

Introduction of Container Capping[3]Deduplication DirectionUnique chunkIntelligent Storage Center for Research inSegment8 chunks4 chunksActive containerKeep the original referencing3 chunksCapping level 31 chunk1 chunkRewrite and reference to thenew containerPros:1) simple and efficient2) guarantee the higherbound of container readsCons:1) a fixed capping levelcannot adapt to the workload2) might make wrongrewrite decision due to thesegment cut3) deduplication ratio isnot guaranteed

1009080706050403020100Use oldcontainersSegment1Capping LevelBe rewritten1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Use oldcontainersSegment2Capping LevelBe rewritten1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers The number of data chunks referenced by one container in a segment is called container referenced count (CNRC) If we sort the CNRC of containers in a segment, we can get the distributions above, the distributions of differentsegment can be very different Capping level is a fixed threshold, containers with CNRC ranked lower than the capping level are rewrittenCenter for Research inIntelligent Storage

1009080706050403020100Center for Research inIntelligent StorageSegment11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Segment21 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers

1009080706050403020100Center for Research inIntelligent StorageSegment11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Segment21 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers Use the same capping level, we will have 20 old container reads, and need rewrite232 chunks

1009080706050403020100Center for Research inIntelligent StorageSegment11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Segment21 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers Use the same capping level, we will have 20 old container reads, and need rewrite232 chunks

1009080706050403020100Center for Research inIntelligent StorageSegment11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Segment21 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers Use the same capping level, we will have 20 old container reads, and need rewrite232 chunks

1009080706050403020100Center for Research inIntelligent StorageSegment11 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containersThe number of referencedchunksThe number of referencedchunksCapping Limitation 1: Fixed Capping Level1009080706050403020100Segment21 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Rank of old containers Use the same capping level, we will have 20 old container reads, and need rewrite232 chunks Use different actual capping level for different segments, the total old containerreads are still 20, but we rewrite fewer data chunks (187 chunks)

Capping Limitation 2: Fixed Segment Cutting Issuet faSegmentSegment 1Be rewritten NoBe rewrittenirlyevaluated SegmentBe rewrittenSegment 1 SegmentSegment 1Data chunks close to the cutting boundary have higher probability to be rewrittenCenter for Research inIntelligent Storage

Capping Limitation 3: Restore CachingCache effective rangeContainer read triggeredBe rewritten SegmentSegment 1When caching is applied, data chunks from one container must be cached at least asmall range of restore. The data chunks that are covered by the cache effective rangeshould not be rewrittenCenter for Research inIntelligent Storage

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Address limitation 1: flexible container referenced count based thresholdFlexible Container Referenced Count based Design (FCRC)SegmentCenter for Research inIntelligent Storage To be adaptive to the ContaiNer Referenced Count (CNRC)distributions, we can use a CNRC value as the threshold Higher threshold ! rewrite more chunks (lower deduplicationratio), but fewer container reads (e.g., threshold 4) Lower threshold ! rewrite fewer chunks (higherdeduplication ratio), but more container reads (e.g., threshold 1)5 chunks4 chunks4 chunks2 chunk2 chunk1 chunk1 chunkHow can we decide the threshold for eachsegment?According to the target container reads andtarget deduplication ratio to calculate(estimate) the lower bound and higherbound of the threshold in each segment.

Set Two BoundsCenter for Research inIntelligent StorageSegment4 chunks4 chunks4 chunks2 chunk2 chunk1 chunk1 chunk

Set Two BoundsCenter for Research inIntelligent StorageSegment4 chunks4 chunks4 chunks2 chunk2 chunk1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound

Set Two BoundsSegment4 chunks4 chunksHigher boundCenter for Research inIntelligent Storage4 chunks2 chunk2 chunk1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound

Set Two BoundsSegment4 chunks4 chunksHigher boundCenter for Research inIntelligent Storage4 chunks2 chunk2 chunk1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound

Set Two BoundsSegment4 chunks4 chunksHigher bound4 chunks2 chunk2 chunkLower boundCenter for Research inIntelligent Storage1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound

Set Two BoundsSegment4 chunks4 chunksHigher bound4 chunks2 chunk2 chunkLower boundCenter for Research inIntelligent Storage1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound The actual CNRC threshold is in between (if cannot beset, satisfy deduplication ratio first)

Set Two BoundsSegment4 chunks4 chunksHigher bound4 chunksThreshold is 32 chunk2 chunkLower boundCenter for Research inIntelligent Storage1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound The actual CNRC threshold is in between (if cannot beset, satisfy deduplication ratio first)

Set Two BoundsSegment4 chunks4 chunksHigher bound4 chunksThreshold is 32 chunk2 chunkLower boundCenter for Research inIntelligent Storage1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound The actual CNRC threshold is in between (if cannot beset, satisfy deduplication ratio first) If we rewrite fewer chunks or referenced fewer containersin this segment, we can accumulate the “credits” ofcontainer reads or rewrites savings for the futuresegments to extend the two bounds

Set Two BoundsSegment4 chunks4 chunksHigher bound4 chunksThreshold is 32 chunk2 chunkLower boundCenter for Research inIntelligent Storage1 chunk1 chunk According to the deduplication ratio reduction limit, wecan estimate, at most, how many data chunks can berewritten in this segment ! higher bound According to the target container reads, we can estimate,at most, how many old containers can be referenced in thissegment !lower bound The actual CNRC threshold is in between (if cannot beset, satisfy deduplication ratio first) If we rewrite fewer chunks or referenced fewer containersin this segment, we can accumulate the “credits” ofcontainer reads or rewrites savings for the futuresegments to extend the two bounds e.g., for next segment, we can move the lower bound 3containers “down” and higher bound 4 chunks “up”.

Agenda Deduplication and RestoreData Chunk Rewrite PreliminaryContainer Capping Introduction and LimitationsProposed Solutions– Flexible Container Referenced Count based Design (FCRC)– Sliding Look-Back Window (LBW) Evaluations Conclusions and Future WorkCenter for Research inIntelligent Storage

Address limitation 2: Sliding look-back windowSliding Look-back Window Assisted Rewrite Basic idea: using a sliding window to cover a range of data chunks, ensure that each chunkis evaluated with the same amount of past and future information To be efficient, the window is moved in container size (4 chunks in this example) Data chunks, whose rewrite decision cannot be made, will be temporally cached until thewindow moves to cover its subsequence. In this way, we can finally make the rewritedecisionIntelligent StorageRecipe entry cache Center for Research inRewrite candidate cacheLook-back windowActive container

ArchitectureCenter for Research inIntelligent StorageRecipe cacheRecipePersistent storeLook back window Candidate chunkNon-rewrite chunkImmutable recipe entryRewrite candidate cacheActive containerMutable recipe entryContainerPersistent Storage

Address limitation 3: consider cache effective range when making rewrite decisionsConsidering Restore Caching During Rewrite According to the caching algorithm and cache space size, we can estimate thecache effective range (all data chunks from read-in container are cached at least #data chunks restore)For example: FAA: the cache effective range is the FAA size (guaranteed) Chunk-based cache: if the cache space is S chunks, x% data chunks in a containerare used in average, container size is C chunks, the cache effective range is:(not guaranteed, just the estimate in average)Center for Research inIntelligent Storage

ExampleRecipe cache LBW size: 2 containers (8 chunks) Cache effective range: 3 containers (12 chunks) Rewrite condition: The container has not been referenced for the LBW size chunks The container reference count of aforementioned container is always 2Rewrite candidate cacheUnique chunkIntelligent Storage Center for Research inLook-back windowActive container

Example LBW size: 2 containers (8 chunks) Cache effective range: 3 containers (12 chunks) Rewrite condition: The container has not been referenced for the LBW size chunks The container reference count of aforementioned container is always 2Recipe cacheRewrite candidate cacheUnique chunkIntelligent Storage Center for Research inLook-back windowActive container

Example LBW size: 2 containers (8 chunks) Cache effective range: 3 containers (12 chunks) Rewrite condition: The container has not been referenced for the LBW size chunks The container reference count of aforementioned container is always 2Recipe cacheRewrite candidate cacheUnique chunkIntelligent Storage Center for Research inLook-back windowActive container

Example LBW size: 2 containers (8 chunks) Cache effective range: 3 containers (12 chunks) Rewrite condition: The container has not been referenced for the LBW size chunks The container reference count of aforementioned container is always 2Recipe cacheRewrite candidate cacheUnique chunkIntelligent Storage Center for Research inLook-back windowAct

Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance Zhichao Cao1, Shiyong Liu2, Fenggang Wu1, Guohua Wang3, Bingzhe Li1, and David H.C. Du1 1University of Minnesota, Twin Cities 2Ocean University of China 3South China University of Technology 02/26/2019