Dell PowerScale: CloudPools And Amazon Web Services


Technical White Paper

Dell PowerScale: CloudPools and Amazon Web Services
Architectural overview, considerations, and best practices

Abstract

This white paper provides an overview of Dell PowerScale CloudPools software in OneFS 9.4.0.0. It describes its policy-based capabilities that can reduce storage costs and optimize storage by automatically moving infrequently accessed data to Amazon Web Services (AWS).

April 2022
H17747.6

Revisions

Date            Description
April 2019      Initial release
October 2019    Updated snapshot efficiency
June 2020       Updated best practices
October 2020    Updated CloudPools operations
April 2021      Updated best practices
October 2021    Updated performance
April 2022      Updated reporting

Acknowledgments

Author: Jason He (Jason.He@dell.com)

Dell and the authors of this document welcome your feedback on this white paper.

This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.

This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any software described in this publication requires an applicable software license.

Copyright 2019-2022 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, Dell and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [3/31/2022] [Technical White Paper] [H17747.6]

Table of contents

Revisions
Acknowledgments
Table of contents
Executive summary
Audience
1 CloudPools solution architectural overview
  1.1 PowerScale
    1.1.1 SmartPools
    1.1.2 SmartLink files
    1.1.3 File pool policies
  1.2 AWS
    1.2.1 Cloud metadata object
    1.2.2 Cloud data object
  1.3 CloudPools operations
    1.3.1 Archive
    1.3.2 Recall
    1.3.3 Read
    1.3.4 Update
2 CloudPools 2.0
  2.1 AWS signature v4 authentication support
  2.2 Commercial Cloud Services support
  2.3 NDMP and SyncIQ support
  2.4 Nondisruptive upgrade support
  2.5 Snapshot efficiency
    2.5.1 Scenario 1
    2.5.2 Scenario 2
    2.5.3 Scenario 3
    2.5.4 Scenario 4
    2.5.5 Scenario 5
  2.6 Sparse files handling
  2.7 Quota management
  2.8 Anti-virus integration
  2.9 WORM integration
3 Best practices for PowerScale storage and AWS
  3.1 PowerScale configuration
    3.1.1 CloudPools settings
    3.1.2 File pool policy
    3.1.3 Other considerations
  3.2 AWS configuration
  3.3 Protecting SmartLink files
    3.3.1 SyncIQ
    3.3.2 NDMP
  3.4 Performance
4 Reporting
  4.1 CloudPools network stats introduction
  4.2 Query network stats by CloudPools account
  4.3 Query network stats by file pool policy
  4.4 Query history network stats
  4.5 Cloud statistics namespace with CloudPools
5 Commands and troubleshooting
  5.1 Commands
    5.1.1 CloudPools archive
    5.1.2 CloudPools recall
    5.1.3 CloudPools job monitoring
  5.2 Troubleshooting
    5.2.1 CloudPools state
    5.2.2 CloudPools logs
A Step-by-step configuration example
  A.1 AWS configuration
    A.1.1 S3
    A.1.2 C2S S3
  A.2 PowerScale configuration
    A.2.1 Verify licensing
    A.2.2 Cloud storage account for S3
    A.2.3 Cloud storage account for C2S S3
    A.2.4 CloudPool for S3
    A.2.5 CloudPool for C2S S3
    A.2.6 File pool policy
    A.2.7 Run SmartPools job for CloudPools
    A.2.8 SyncIQ policy
  A.3 SmartLink files protection
    A.3.1 Fail over to the secondary PowerScale cluster
    A.3.2 Fail back to primary PowerScale cluster
B Technical support and resources
  B.1 Related resources

Executive summary

This white paper describes how Dell PowerScale CloudPools in OneFS 9.4.0.0 integrates with Amazon Web Services (AWS). It covers the following topics:

- CloudPools solution architectural overview
- CloudPools 2.0 introduction with a focus on the following improvements:
  - AWS signature v4 authentication support
  - Commercial Cloud Services (C2S) support
  - PowerScale NDMP and PowerScale SyncIQ support
  - Non-disruptive upgrade (NDU) support
  - Snapshot efficiency
  - Sparse files handling
  - Quota management
  - Anti-virus integration
  - WORM integration
- General considerations and best practices for a CloudPools implementation
- CloudPools reporting, commands, and troubleshooting

Audience

This white paper is intended for experienced system administrators, storage administrators, and solution architects interested in learning how CloudPools works and understanding the CloudPools solution architecture, considerations, and best practices.

This guide assumes the reader has a working knowledge of the following:

- Network-attached storage (NAS) systems
- PowerScale scale-out storage architecture and PowerScale OneFS operating system
- AWS

The reader should also be familiar with PowerScale and AWS documentation resources including the following:

- OneFS release notes, available on Dell Support, containing important information about resolved and known issues
- Dell PowerScale OneFS Best Practices
- Amazon Web Services (AWS)

1 CloudPools solution architectural overview

The CloudPools feature of OneFS allows tiering cold or infrequently accessed data to lower-cost cloud storage. It is built on the PowerScale OneFS SmartPools file pool policy framework, which provides granular control of file placement on a PowerScale cluster.

CloudPools extends the PowerScale namespace to the public cloud, AWS, as shown in Figure 1. It allows applications and users to seamlessly retain access to the data through the same network path and protocols regardless of where the file data physically resides.

Figure 1. CloudPools solution overview (applications and clients reach the extended OneFS namespace, which spans Dell PowerScale and AWS, over SMB, NFS, HDFS, and S3)

Note: A SmartPools license and a CloudPools license are required on each node of the PowerScale cluster. A minimum of Dell Isilon OneFS version 8.0.0 is required for CloudPools 1.0, and Isilon OneFS version 8.2.0 for CloudPools 2.0.

Policies are defined on the PowerScale cluster and drive the tiering of data. Clients can access the archived data through various protocols including SMB, NFS, HDFS, and S3.

1.1 PowerScale

This section describes key CloudPools concepts including the following:

- SmartPools
- SmartLink files
- File pool policies

1.1.1 SmartPools

SmartPools is the OneFS data tiering framework, of which CloudPools is an extension. SmartPools alone tiers data between different node types within a PowerScale cluster. CloudPools adds the ability to tier data outside of a PowerScale cluster.

1.1.2 SmartLink files

Although file data is moved to cloud storage, the files remain visible in OneFS. After file data has been archived to the cloud storage, the file is truncated to an 8 KB file. The 8 KB file is called a SmartLink file or stub file. Each SmartLink file contains a data cache and a map. The data cache is used to retain a portion of the file data locally, and the map points to all cloud objects.

Figure 2 shows the contents of a SmartLink file and the mapping to cloud objects.

Figure 2. SmartLink file

1.1.3 File pool policies

Both CloudPools and SmartPools use the file pool policy engine to define which data on a cluster should live on which tier or be archived to a cloud storage target. The SmartPools and CloudPools job runs on a customizable schedule, once a day by default. If files match the criteria specified in a file pool policy, the content of those files is moved to cloud storage during the job execution. A SmartLink file is left behind on the PowerScale cluster that contains information about where to retrieve the data. In CloudPools 1.0, the SmartLink file is sometimes referred to as a stub, which is a unique construct that does not behave like a normal file. In CloudPools 2.0, the SmartLink file is an actual file that contains pointers to the CloudPool target where the data resides.

This section describes the key options when configuring a file pool policy, which include the following:

- Encryption
- Compression
- File matching criteria
- Local data cache
- Data retention

1.1.3.1 Encryption

CloudPools provides an option to encrypt data before it is sent to the cloud storage. It leverages the PowerScale key management module for data encryption and uses AES-256 as the encryption algorithm. The benefit of encryption is that only encrypted data is sent over the network.

1.1.3.2 Compression

CloudPools provides an option to compress data before it is sent to the cloud storage. It implements block-level compression using the zlib compression library. CloudPools does not compress data that is already compressed.

1.1.3.3 File-matching criteria

When files match a file pool policy, CloudPools moves the file data to the cloud storage. File matching criteria enable defining a logical group of files as a file pool for CloudPools. They define which data should be archived to cloud storage.

File matching criteria include the following:

- File name
- Path
- File type
- File attribute
- Modified
- Accessed
- Metadata changed
- Created
- Size

Any number of file matching criteria can be added to refine a file pool policy for CloudPools.

1.1.3.4 Local data cache

Caching is used to support local reading and writing of SmartLink files. It reduces bandwidth costs and improves performance by eliminating repeated fetching of file data for repeated reads and writes.

Note: The data cache is used for temporarily caching file data from the cloud storage on PowerScale disk storage for files that have been moved off cluster by CloudPools.

The local data cache is always the authoritative source for data. CloudPools looks for data in the local data cache first. If the data being accessed is not in the local data cache, CloudPools fetches the data from the cloud. CloudPools writes the updated file data in the local cache first and periodically sends the updated file data to the cloud.

CloudPools provides the following configurable data cache settings:

- Cache expiration: This option specifies the number of days until OneFS purges expired cache information in SmartLink files. The default value is one day.
- Writeback frequency: This option specifies the interval at which OneFS writes the data stored in the cache of SmartLink files to the cloud. The default value is nine hours.
- Cache read ahead: This option specifies the cache read ahead strategy for cloud objects (partial or full). The default value is partial.
- Accessibility: This option specifies how data is cached in SmartLink files when a user or application accesses a SmartLink file on the PowerScale cluster. Values are cached (default) and no cache.
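
The lookup order described above (local data cache first, cloud on a miss, periodic purging of expired entries) can be sketched as a small read-through cache. This is an illustrative model only, not OneFS code: the `ReadThroughCache` class, its method names, the chunk-index keys, and the choice to measure expiration from the time an entry was cached are all assumptions made for the example. Only the one-day default comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

ONE_DAY = 24 * 60 * 60  # default cache expiration from the text, in seconds


@dataclass
class ReadThroughCache:
    """Hypothetical model of a SmartLink file's local data cache (read path)."""

    fetch_from_cloud: Callable[[int], bytes]  # stand-in for retrieving cloud data
    expiration: float = ONE_DAY
    cloud_fetches: int = 0
    _entries: Dict[int, Tuple[bytes, float]] = field(default_factory=dict)

    def read(self, chunk: int, now: float) -> bytes:
        """Serve from the local cache if present; otherwise fetch and cache."""
        if chunk in self._entries:
            return self._entries[chunk][0]
        data = self.fetch_from_cloud(chunk)
        self.cloud_fetches += 1
        self._entries[chunk] = (data, now)
        return data

    def purge_expired(self, now: float) -> int:
        """Drop entries cached at least `expiration` seconds ago; return count."""
        expired = [c for c, (_, cached_at) in self._entries.items()
                   if now - cached_at >= self.expiration]
        for c in expired:
            del self._entries[c]
        return len(expired)
```

In this sketch, a second read of the same chunk is served locally and does not increment `cloud_fetches`; after `purge_expired` removes the entry, the next read goes back to the cloud stand-in.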

1.1.3.5 Data retention

Data retention is a concept used to determine how long to keep cloud objects on the cloud storage. There are three different retention periods:

- Cloud data retention period: This option specifies the length of time cloud objects are retained after the files have been fully recalled or deleted. The default value is one week.
- Incremental backup retention period for NDMP incremental backup and SyncIQ: This option specifies the length of time that CloudPools retains cloud objects referenced by a SmartLink file that SyncIQ has replicated or that NDMP has backed up using an incremental NDMP backup. The default value is five years.
- Full backup retention period for NDMP only: This option specifies the length of time that OneFS retains cloud data referenced by a SmartLink file that NDMP has backed up using a full NDMP backup. The default value is five years.

Note: If more than one period applies to a file, the longest period is applied.

1.2 AWS

This section describes the following cloud objects in AWS:

- Cloud metadata object
- Cloud data object

1.2.1 Cloud metadata object

A cloud metadata object (CMO) is a CloudPools object in AWS that is used for supportability purposes.

1.2.2 Cloud data object

A cloud data object (CDO) is a CloudPools object that stores file data in AWS. File data is split into 2 MB chunks to optimize performance before sending it to AWS. Each chunk is called a CDO. If file data is less than the chunk size, the CDO size is equal to the size of the file data.

Note: The chunk size is 1 MB in CloudPools 1.0 and in OneFS releases before version 8.2.0.

1.3 CloudPools operations

This section describes the workflow of CloudPools operations:

- Archive
- Recall
- Read
- Update

1.3.1 Archive

The archive operation is the CloudPools process of moving file data from the local PowerScale cluster to cloud storage. Files are archived either using the SmartPools job or from the command line. The CloudPools archive process can be paused or resumed. See section 5.1 for details.
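
The chunking that precedes an archive (section 1.2.2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CloudPools implementation: the function name `split_into_cdos` is hypothetical, "2 MB" is taken to mean 2 × 1024 × 1024 bytes, and SHA-256 is used only as a stand-in for the per-chunk checksum, since the paper does not name the algorithm.

```python
import hashlib
from typing import List, Tuple

# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
CHUNK_SIZE = 2 * 1024 * 1024


def split_into_cdos(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[bytes, str]]:
    """Split file data into fixed-size chunks, pairing each with a checksum.

    The final chunk may be shorter than chunk_size, so a file smaller than
    one chunk yields a single chunk equal in size to the file data.
    SHA-256 is an illustrative choice, not the algorithm CloudPools uses.
    """
    cdos = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        cdos.append((chunk, hashlib.sha256(chunk).hexdigest()))
    return cdos
```

Under these assumptions, a 5 MB file produces three chunks (2 MB, 2 MB, and 1 MB), and passing `chunk_size=1024 * 1024` would model the 1 MB chunk size noted for CloudPools 1.0.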

Figure 3 shows the workflow of the CloudPools archive.

Figure 3. Archive workflow (Dell PowerScale to AWS):

1. A file matches a file pool policy.
2. The file data is split into chunks (CDOs).
3. The chunks are sent from the PowerScale cluster to AWS.
4. The file is truncated into a SmartLink file and a CMO is written to AWS.

More workflow details include the following:

- The file pool policy in step 1 (see section 1.1.3) specifies a cloud target and cloud-specific parameters. Example parameters include the following:
  - Encryption (section 1.1.3.1)
  - Compression (section 1.1.3.2)
  - Local data cache (section 1.1.3.4)
  - Data retention (section 1.1.3.5)
- When chunks are sent from the PowerScale cluster to AWS in step 3, a checksum is applied for each chunk to ensure data integrity.

1.3.2 Recall

The recall operation is the CloudPools process of reversing the archive process. It replaces the SmartLink file by restoring the original file data on the PowerScale cluster and removing the cloud objects in AWS. The recall process can only be performed using the command line. The CloudPools recall process can be paused or resumed. See section 5.1 for detailed instructions on commands.
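
Because cloud objects are kept for a retention period after a recall, it can help to see how the three retention periods from section 1.1.3.5 combine. The sketch below applies the "longest period wins" rule stated there. The function and parameter names are hypothetical, and five years is approximated as 5 × 365 days purely for illustration.

```python
from datetime import timedelta

# Defaults quoted in section 1.1.3.5; five years approximated as 5 * 365 days.
CLOUD_DATA_RETENTION = timedelta(weeks=1)
INCREMENTAL_BACKUP_RETENTION = timedelta(days=5 * 365)
FULL_BACKUP_RETENTION = timedelta(days=5 * 365)


def effective_retention(replicated_or_incremental_backup: bool,
                        full_ndmp_backup: bool) -> timedelta:
    """Return the longest retention period that applies to a file.

    replicated_or_incremental_backup: the SmartLink file was replicated by
        SyncIQ or captured by an incremental NDMP backup.
    full_ndmp_backup: the SmartLink file was captured by a full NDMP backup.
    """
    periods = [CLOUD_DATA_RETENTION]  # always applies after recall or delete
    if replicated_or_incremental_backup:
        periods.append(INCREMENTAL_BACKUP_RETENTION)
    if full_ndmp_backup:
        periods.append(FULL_BACKUP_RETENTION)
    return max(periods)
```

With the defaults, a file that was never replicated or backed up keeps its cloud objects for one week after recall, while a file whose SmartLink was replicated by SyncIQ keeps them for roughly five years.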

Figure 4 shows the workflow of CloudPools recall.

Figure 4. Recall workflow (AWS to Dell PowerScale):

1. OneFS retrieves the CDOs from AWS to the PowerScale cluster.
2. The SmartLink file is replaced by restoring the original file data.
3. The cloud objects are removed in AWS asynchronously if the data retention period has expired.

1.3.3 Read

The read operation is the CloudPools process of client data access, known as inline access. When a client opens a file for read, the blocks are added to the cache in the associated SmartLink file by default. The cache can be disabled through the accessibility setting. For more detail, see section 1.1.3.4, Local data cache.

Figure 5 shows the default workflow of a CloudPools read.

Figure 5. Read workflow (clients, Dell PowerScale, and AWS):

1. Client accesses the file through the SmartLink file.
2. OneFS retrieves CDOs from AWS to the local cache on the PowerScale cluster.
3. File data is sent to the client from the local cache on the cluster.
4. OneFS purges expired cache information for the SmartLink file.

Starting from OneFS 9.1.0.0, a cloud object cache is introduced to enhance CloudPools functions for communicating with the cloud. In step 1, OneFS looks for data in the object cache first and retrieves the data from there if it is already present. The cloud object cache reduces the number of requests to AWS when reading a file.

Prior to OneFS 9.1.0.0, OneFS looks for data in the local data cache first in step 1. It moves to step 3 if the data is already in the local data cache.

Note: Cloud object cache is per node. Each node maintains its own object cache on the cluster.

1.3.4 Update

The update operation is the CloudPools process that occurs when clients update data. When clients make changes to a SmartLink file, CloudPools first writes the changes in the local data cache and then periodically sends the updated file data to AWS. The space used by the cache is temporary and configurable. For more information, refer to section 1.1.3.4, Local data cache.

Figure 6 shows the workflow of the CloudPools update.

Figure 6. Update workflow (clients, Dell PowerScale, and AWS):

1. Client accesses the file through the SmartLink file.
2. OneFS retrieves CDOs from AWS, putting the file data in the local cache.
3. Client updates the file and those changes are stored in the local cache.
4. OneFS sends the updated file data from the local cache to AWS.
5. OneFS purges expired cache information for the SmartLink file.
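
Steps 3 and 4 of the update workflow describe a write-back pattern: changes land in the local cache first and are sent to the cloud periodically. The following is a minimal sketch of that pattern, assuming a simple interval timer. The `WriteBackCache` class, the `upload` callback, and the `tick` method are hypothetical names for illustration; only the nine-hour default interval comes from section 1.1.3.4.

```python
from typing import Callable, Dict

NINE_HOURS = 9 * 60 * 60  # default writeback frequency from the text, in seconds


class WriteBackCache:
    """Hypothetical model of the update path: write locally, flush periodically."""

    def __init__(self, upload: Callable[[int, bytes], None],
                 writeback_interval: float = NINE_HOURS) -> None:
        self.upload = upload                  # stand-in for sending data to AWS
        self.writeback_interval = writeback_interval
        self.dirty: Dict[int, bytes] = {}     # chunk index -> updated data
        self.last_writeback = 0.0

    def write(self, chunk: int, data: bytes) -> None:
        """Record a client update in the local cache only (step 3)."""
        self.dirty[chunk] = data

    def tick(self, now: float) -> int:
        """Flush dirty chunks once the interval has elapsed (step 4).

        Returns the number of chunks uploaded on this call.
        """
        if now - self.last_writeback < self.writeback_interval:
            return 0
        flushed = len(self.dirty)
        for chunk, data in sorted(self.dirty.items()):
            self.upload(chunk, data)
        self.dirty.clear()
        self.last_writeback = now
        return flushed
```

In this model, repeated writes to the same chunk between flushes are coalesced, so only the latest data for each chunk is uploaded when the interval elapses.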

2 CloudPools 2.0

CloudPools 2.0 is the next generation of CloudPools, released in OneFS 8.2.0. This section describes the following improvements in CloudPools 2.0:

- AWS signature v4 authentication support
- Commercial Cloud Services (C2S) support
- NDMP and SyncIQ support
- Non-disruptive upgrade (NDU) support
- Snapshot efficiency
- Sparse files handling
- Quota management
- Anti-virus integration
- WORM integration

2.1 AWS signature v4 authentication support

CloudPools 2.0 supports AWS signature version 4 (V4) in addition to signature version 2 (V2). V4 provides an extra level of security for authentication through an enhanced algorithm, and no action is required from end users. For more information about V4, refer to the article Authenticating Requests: AWS Signature V4.

CloudPools 2.0 handles the compatibility of SyncIQ for data replication and NDMP for data backup and restore. When the source and target PowerScale clusters use different authentication versions, consider the following points for CloudPools features:

- With SyncIQ, when the source PowerScale cluster is running OneFS 8.2.0 and the target PowerScale cluster is running a version of OneFS before version 8.2.0:
  - If the CloudPools cloud storage account is using V2 or V4 on the source PowerScale cluster, V2 is used on the target PowerScale cluster.
- With NDMP, when files are restored from tape to the target PowerScale cluster:
  - If the CloudPools cloud storage account is using V4 on the target PowerScale cluster, V4 is used.
  - If the CloudPools cloud storage account is using V2 on the target PowerScale cluster, V2 is used.
- With NDU, when upgrading OneFS to version 8.2.0:
  - Once the PowerScale cluster is COMMITTED to OneFS 8.2.0, it automat
