Dell EMC Isilon: CloudPools And Microsoft Azure


Technical White Paper
Dell EMC PowerScale: CloudPools and Microsoft Azure
Architectural overview, considerations, and best practices

Abstract
This white paper provides an overview of Dell EMC PowerScale CloudPools software in OneFS 9.1.0.0. It describes the policy-based capabilities that can reduce storage costs and optimize storage by automatically moving infrequently accessed data to Microsoft Azure.

April 2021
H17746.4

Revisions
Date            Description
April 2019      Initial release
October 2019    Updated snapshot efficiency
June 2020       Updated best practices
October 2020    Updated CloudPools operations
April 2021      Updated best practices

Acknowledgments
Author: Jason He (Jason.He@dell.com)
Dell EMC and the authors of this document welcome your feedback on this white paper.

This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.
The information in this publication is provided "as is." Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [4/25/2021] [Technical White Paper] [H17746.4]
Dell EMC PowerScale: CloudPools and Microsoft Azure H17746.4

Table of contents
Revisions
Acknowledgments
Table of contents
Executive summary
Audience
1 CloudPools solution architectural overview
1.1 PowerScale
1.1.1 SmartPools
1.1.2 SmartLink files
1.1.3 File pool policies
1.2 Microsoft Azure
1.2.1 Cloud metadata object
1.2.2 Cloud data object
1.3 CloudPools operations
1.3.1 Archive
1.3.2 Recall
1.3.3 Read
1.3.4 Update
2 CloudPools 2.0
2.1 NDMP and SyncIQ support
2.2 Non-disruptive upgrade support
2.3 Snapshot efficiency
2.3.1 Scenario 1
2.3.2 Scenario 2
2.3.3 Scenario 3
2.3.4 Scenario 4
2.3.5 Scenario 5
2.4 Sparse files handling
2.5 Quota management
2.6 Anti-virus integration
2.7 WORM integration
3 Best practices for PowerScale storage and Microsoft Azure
3.1 PowerScale configuration
3.1.1 CloudPools settings
3.1.2 File pool policy
3.1.3 Other considerations
3.2 Microsoft Azure configuration
3.3 Protecting SmartLink files
3.3.1 SyncIQ
3.3.2 NDMP
3.4 Performance
4 Reporting
4.1 CloudPools network stats
4.2 Query network stats by CloudPools account
4.3 Query network stats by file pool policy
4.4 Query history network stats
5 Commands and troubleshooting
5.1 Commands
5.1.1 CloudPools archive
5.1.2 CloudPools recall
5.1.3 CloudPools job monitoring
5.2 Troubleshooting
5.2.1 CloudPools state
5.2.2 CloudPools logs
A Step-by-step configuration example
A.1 Microsoft Azure configuration
A.2 PowerScale configuration
A.2.1 Verify licensing
A.2.2 Cloud storage account
A.2.3 CloudPool
A.2.4 File pool policy
A.2.5 Run SmartPools job for CloudPools
A.2.6 SyncIQ policy
A.3 SmartLink files protection
A.3.1 Fail over to the secondary PowerScale cluster
A.3.2 Fail back to primary PowerScale cluster
B Technical support and resources
B.1 Related resources

Executive summary
This white paper describes how Dell EMC PowerScale CloudPools in OneFS 9.0 integrates with Microsoft Azure. It covers the following topics:
- CloudPools solution architectural overview
- CloudPools 2.0 introduction, with a focus on the following improvements:
  - Dell EMC PowerScale NDMP and Dell EMC PowerScale SyncIQ support
  - Non-disruptive upgrade (NDU) support
  - Snapshot efficiency
  - Sparse files handling
  - Quota management
  - Anti-virus integration
  - WORM integration
- General considerations and best practices for a CloudPools implementation
- CloudPools reporting, commands, and troubleshooting

Audience
This white paper is intended for experienced system administrators, storage administrators, and solution architects who want to learn how CloudPools works and to understand the CloudPools solution architecture, considerations, and best practices.
This guide assumes the reader has a working knowledge of the following:
- Network-attached storage (NAS) systems
- Dell EMC PowerScale scale-out storage architecture and the Dell EMC PowerScale OneFS operating system
- Microsoft Azure
The reader should also be familiar with PowerScale and Azure documentation resources, including the following:
- Dell EMC OneFS release notes, available on Dell EMC Support, which contain important information about resolved and known issues
- Dell EMC PowerScale OneFS Best Practices
- Microsoft Azure

1 CloudPools solution architectural overview
The CloudPools feature of OneFS allows tiering cold or infrequently accessed data to lower-cost cloud storage. It is built on the Dell EMC PowerScale SmartPools file pool policy framework, which provides granular control of file placement on a PowerScale cluster.
CloudPools extends the PowerScale namespace to the public cloud, Microsoft Azure, as illustrated in Figure 1. Applications and users seamlessly retain access to data through the same network path and protocols, regardless of where the file data physically resides.
Figure 1: CloudPools solution overview (applications and clients access the extended OneFS namespace on Dell EMC PowerScale over SMB, NFS, HDFS, and S3; the namespace extends to Microsoft Azure)
Note: A SmartPools license and a CloudPools license are required on each node of the PowerScale cluster. CloudPools 1.0 requires Dell EMC Isilon OneFS version 8.0.0 or later, and CloudPools 2.0 requires Dell EMC Isilon OneFS version 8.2.0 or later.
Policies defined on the PowerScale cluster drive the tiering of data. Clients can access the archived data through various protocols, including SMB, NFS, HDFS, and S3.
1.1 PowerScale
This section describes the following key CloudPools concepts:
- SmartPools
- SmartLink files
- File pool policies

1.1.1 SmartPools
SmartPools is the OneFS data tiering framework, and CloudPools is an extension of it. SmartPools alone tiers data between different node types within a PowerScale cluster. CloudPools adds the ability to tier data outside of a PowerScale cluster.
1.1.2 SmartLink files
Although file data is moved to cloud storage, the files remain visible in OneFS. After file data has been archived to cloud storage, the file is truncated to an 8 KB file. This 8 KB file is called a SmartLink file or stub file. Each SmartLink file contains a data cache and a map. The data cache retains a portion of the file data locally, and the map points to all cloud objects.
Figure 2 shows the contents of a SmartLink file and the mapping to cloud objects.
Figure 2: SmartLink file
1.1.3 File pool policies
Both CloudPools and SmartPools use the file pool policy engine to define which data on a cluster should live on which tier or be archived to a cloud storage target. The job that applies SmartPools and CloudPools policies has a customizable schedule and runs once a day by default. If files match the criteria specified in a file pool policy, the content of those files is moved to cloud storage while the job runs. A SmartLink file that contains information about where to retrieve the data is left behind on the PowerScale cluster. In CloudPools 1.0, the SmartLink file is sometimes referred to as a stub, which is a unique construct that does not behave like a normal file. In CloudPools 2.0, the SmartLink file is an actual file that contains pointers to the CloudPool target where the data resides.
This section describes the key options when configuring a file pool policy:
- Encryption
- Compression
- File matching criteria
- Local data cache
- Data retention
1.1.3.1 Encryption
CloudPools provides an option to encrypt data before it is sent to cloud storage. It uses the PowerScale key management module for data encryption and AES-256 as the encryption algorithm. The benefit of encryption is that only encrypted data is sent over the network.
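To make the SmartLink file and file pool policy concepts in sections 1.1.2 and 1.1.3 more concrete, the following Python sketch models a policy selecting a file, the file data being archived as cloud objects, and a stub whose map points to those objects and whose local cache is checked before the cloud (the read behavior described in section 1.1.3.4). This is a simplified illustration, not OneFS code: FilePoolPolicy, SmartLink, archive, read, the 1 MiB CHUNK_SIZE, and the object naming scheme are all hypothetical, a Python dictionary stands in for Microsoft Azure storage, and encryption, compression, cache expiration, and write-back to the cloud are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical chunk size; the real CloudPools object size is not modeled here.
CHUNK_SIZE = 1024 * 1024


@dataclass
class FilePoolPolicy:
    """Hypothetical policy: match files under a directory that were not modified recently."""

    path_prefix: str
    min_age_seconds: float

    def matches(self, path: str, mtime: float, now: float) -> bool:
        # Compare on a directory boundary so "/ifs/data/archive" does not match "/ifs/data/archived".
        prefix = self.path_prefix.rstrip("/") + "/"
        return path.startswith(prefix) and now - mtime >= self.min_age_seconds


@dataclass
class SmartLink:
    """Simplified stand-in for a SmartLink (stub) file left on the cluster."""

    size: int
    object_map: list[str]  # ordered cloud object IDs, one per chunk
    cache: dict[int, bytes] = field(default_factory=dict)  # chunk index -> cached bytes


def archive(path: str, data: bytes, cloud: dict[str, bytes]) -> SmartLink:
    """Upload file data to the cloud in chunks and return a stub that maps to those objects."""
    object_map = []
    for index, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        object_id = f"{path}#{index}"  # hypothetical object naming scheme
        cloud[object_id] = data[offset:offset + CHUNK_SIZE]
        object_map.append(object_id)
    # The local copy of the data is dropped; only the stub (map plus empty cache) remains.
    return SmartLink(size=len(data), object_map=object_map)


def read(stub: SmartLink, cloud: dict[str, bytes]) -> bytes:
    """Read the whole file, serving chunks from the local cache and fetching misses from the cloud."""
    parts = []
    for index, object_id in enumerate(stub.object_map):
        if index not in stub.cache:
            stub.cache[index] = cloud[object_id]  # cache miss: fetch from cloud storage
        parts.append(stub.cache[index])
    return b"".join(parts)
```

For example, FilePoolPolicy("/ifs/data/archive", 30 * 86400) would select files under /ifs/data/archive that have not been modified for 30 days. Passing such a file to archive() returns a stub with an empty cache, and the first read() fetches each chunk from the cloud and caches it, so later reads are served locally.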

1.1.3.2 Compression
CloudPools provides an option to compress data before it is sent to cloud storage. It implements block-level compression using the zlib compression library. CloudPools does not compress data that is already compressed.
1.1.3.3 File matching criteria
When files match a file pool policy, CloudPools moves the file data to cloud storage. File matching criteria define a logical group of files as a file pool for CloudPools; that is, they define which data should be archived to cloud storage.
File matching criteria include the following:
- File name
- Path
- File type
- File attribute
- Modified
- Accessed
- Metadata changed
- Created
- Size
Any number of file matching criteria can be added to refine a file pool policy for CloudPools.
1.1.3.4 Local data cache
Caching supports local reading and writing of SmartLink files. It reduces bandwidth costs and improves performance by eliminating repeated fetches of file data for repeated reads and writes.
Note: The data cache temporarily stores file data from cloud storage on PowerScale disk storage for files that CloudPools has moved off the cluster.
The local data cache is always the authoritative source for data. CloudPools looks for data in the local data cache first. If the file data being accessed is not in the local data cache, CloudPools fetches it from the cloud. CloudPools writes updated file data to the local cache first and periodically sends the updated file data to the cloud.
CloudPools provides the following configurable data cache settings:
- Cache expiration: This

1.1.3.5 Data retention
Data retention determines how long cloud objects are kept on cloud storage. There are three different retention periods:
- Cloud data retention period: This option is used to spec