VNX WITH THE CLOUD TIERING APPLIANCE - Dell USA

White Paper

VNX WITH THE CLOUD TIERING APPLIANCE
A Detailed Review

Abstract

This paper describes the EMC Cloud Tiering Appliance (CTA). The CTA enables NAS data tiering, allowing administrators to move inactive data from high-performance storage to less-expensive archival storage, thus enabling cost-effective use of file storage. The CTA also facilitates data migration, which moves data to new shares or exports.

July 2012

Copyright 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

VMware and VMware ESX are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners.

Part Number h10777

Table of Contents

Executive summary
  Business case
  Solution overview
Introduction
  Scope
  Audience
  Terminology
Archiving
  Overview
  Hierarchical Storage Management
  FileMover
  Archive Policies
  Providing data to the CTA
  Scheduler
  Simulation
  Recall
  Recall using CTA-HA
  Archive requirements for the source NAS server
  Archive requirements for the target repository server
  Compression and encryption
  Multi-tiered archive
  CTA database
  Stub scanner jobs
  Orphans
  Reporting
Migration
  Overview
  Migration source
  Migration targets
  The migration process
Other CTA interactions with VNX
Miscellaneous topics
Summary

Executive summary

The EMC Cloud Tiering Appliance (CTA) optimizes primary NAS storage by automatically archiving inactive files to less-expensive secondary storage. The secondary storage can be lower-cost disk, such as NL-SAS or SATA disk on a NAS device, or it can consist of public or private cloud platforms. After a file is archived, a small stub file remains on the primary storage, so that the file appears to the user as if it were in its original location. File tiering dramatically improves storage efficiency, and shortens the time to back up and restore data.

In addition to archiving data from primary to secondary storage, the Cloud Tiering Appliance can also permanently migrate files from a source to a destination without leaving a stub, as when NAS server hardware requires a technology refresh.

Business case

From the early days of computing, storage has existed in tiers or levels, with differences in cost, performance, availability, redundancy, and so forth. For example, newer flash storage outperforms older NL-SAS or SATA storage, but costs more per gigabyte.

NAS data also exists in tiers and does not all have the same value. Typically, as data ages, users access it less frequently. To optimize use of a typical tiered NAS storage environment, a customer must ensure that the less-valuable, less-frequently accessed data does not consume high-speed, expensive storage resources. High-speed storage should be reserved for the active, important data, while the less-active data should reside on cheaper storage.

In addition, the storage tiering process must work automatically, so that it does not add to the storage administration overhead.

Solution overview

The CTA employs Hierarchical Storage Management (HSM), which has been a staple of the mainframe world for decades. Hierarchical Storage Management moves a file from primary storage to lower-cost, secondary storage, and leaves a small stub pointer file in the file’s original location.
This process of relocating data and stubbing is called archiving or tiering.

The CTA acts as a policy engine by interacting with a VNX share or export, and identifying files that fit predefined criteria. For these files, the CTA initiates movement to a lower tier repository, for example NAS, Centera, or cloud, and leaves a stub file on the VNX share. When a client that is accessing the VNX share or export tries to read the stub, the CTA recalls the original file from the repository tier. To the user, the file appears to be in its original location on high-performance VNX storage. However, instead of the space required to store the entire file, only an 8 KB stub file is on the primary tier.

If the storage administrator wants to move data in the share or export to another location, for example, to replace an old Celerra with a new VNX, the CTA can help. The CTA migration feature relocates multiprotocol data, including stub files, from one share or export to another.

When used for archiving or tiering, the CTA automatically moves inactive data to lower tiers of storage. This allows more efficient use of the most expensive, highest-performing NAS storage.

When used for file migration, the CTA enables relocation of NAS data within a NAS server, across NAS servers, and across NAS servers from different vendors.

Introduction

Scope

This paper outlines CTA features, how CTA functions, and the business problems that CTA helps to solve. As a technical overview of CTA, this paper also describes how to manage CTA and implement solutions in a VNX NAS environment.

Audience

This white paper is intended for users who have a basic understanding of VNX Unified Storage or a general grasp of NAS storage concepts.

Terminology

Archive repository – Storage at a lower tier than the NAS storage that is accessible to the CIFS or NFS clients. The repository is the target of a file archival process. In an archiving operation, CTA moves data from the primary or source tier to the repository and leaves a stub file on the primary tier. The stub points to the file in the repository. A repository tier is a NAS share/export, an EMC Centera, an EMC Atmos cloud, or the Amazon S3 cloud.

File archiving – A primary CTA function that scans a NAS file server for files that meet defined criteria, and moves the files to a lower tier of storage. CTA replaces the file on the NAS server with a stub file that points to the real file on the archive repository.

File migration – The movement of files from one export or share to another, as when replacing a NAS server.

FileMover – VNX Data Movers include FileMover software. FileMover, or DHSM, enables stubbing and recall of archived files, and provides an API that the CTA uses for both archiving and migration.
To use file archiving with a Celerra or VNX file system, export, or share, enable FileMover for the Data Mover.

File tiering – See “File archiving.”

FPolicy – NetApp Filers include FPolicy software. FPolicy enables stubbing and recall of archived files, and provides an API that the CTA uses for both archiving and migration. The CTA uses the FPolicy interface to archive files from NetApp servers.

Orphan file – When a file has been archived, a stub on the source NAS server points to the archived file. Deleting a stub does not automatically delete the archived file. Instead, the archived file becomes an orphan, that is, a repository file without a stub pointing to it. To delete the orphan, the CTA user runs an orphan delete job on the repository.

Policy – Rules for migration, or rules and one or more repository destinations for archiving and tiering. For example, an archiving policy can send a file that has not been accessed in 1 year to a company’s private Atmos cloud server, and a file that has not been accessed in 2 years to a public Amazon S3 cloud.

Primary storage – The storage tier that CIFS and NFS clients mount on the VNX.

Source tier – See “Primary storage.”

Archiving

Overview

The CTA provides two primary functions: archiving or tiering, and migration. Archiving moves inactive data to a lower tier of storage and leaves behind a stub file.

Hierarchical Storage Management

Historically, mainframe computer users implemented a tiered storage system because disk storage was expensive and tape storage was inexpensive. To maintain the limited free disk space on the mainframe, the user would manually move less important data to tape. To retrieve data later, the user had to record the tape volume where the data was stored.

Mainframe system developers began to consider solutions to the problem of how to move data to cheaper, external storage, without requiring the user to remember where it was stored and to manually run commands to recall the data when it was needed.

The solution was Hierarchical Storage Management (HSM). This system would scan the data, find files that had not been accessed for some time, and automatically move them to a lower tier of storage. In place of the file, the system would store a small stub that contained an internal pointer to the file. The user would see the stub as if it were the actual file, but when the user tried to access it, the system would tell the user to wait while it automatically retrieved the file and restored it to the original location on the user’s disk.

HSM is the basis for the CTA archiving function.
The concept is straightforward, robust, and time-tested. The CTA supports archiving to disk storage (cloud, CAS, or NAS). The existence of storage tiers in most customer environments makes the HSM tiering solution as important today as it was decades ago, during the mainframe era.
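The HSM cycle described above (find an inactive file, move it to a lower tier, leave a stub, recall on access) can be illustrated with a toy sketch. This is not how the CTA or FileMover is implemented: the stub format, the STUB_PREFIX marker, and both function names are hypothetical, and plain directories stand in for the primary and repository tiers.

```python
import os
import shutil
import tempfile
import time

# Hypothetical marker; real stubs use a vendor-specific format.
STUB_PREFIX = "STUB->"


def archive_if_inactive(path, repo_dir, max_idle_seconds, now=None):
    """Move a file to repo_dir and leave a stub if it has been idle too long."""
    now = time.time() if now is None else now
    if now - os.stat(path).st_atime < max_idle_seconds:
        return False  # recently accessed: stays on primary storage
    target = os.path.join(repo_dir, os.path.basename(path))
    shutil.move(path, target)  # relocate the data to the lower tier
    with open(path, "w") as stub:
        stub.write(STUB_PREFIX + target)  # the stub only holds a pointer
    return True


def read_with_recall(path):
    """Read a file; if it is a stub, follow the pointer to the archived copy."""
    with open(path) as f:
        content = f.read()
    if content.startswith(STUB_PREFIX):
        with open(content[len(STUB_PREFIX):]) as archived:
            return archived.read()
    return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "primary")
        repo = os.path.join(root, "repo")
        os.makedirs(primary)
        os.makedirs(repo)
        doc = os.path.join(primary, "report.txt")
        with open(doc, "w") as fh:
            fh.write("quarterly numbers")
        # Pretend two hours have passed with a one-hour idle threshold.
        archive_if_inactive(doc, repo, 3600, now=time.time() + 7200)
        print(read_with_recall(doc))
```

The reader of the stub still sees the original content, while the primary directory holds only the short pointer file.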

FileMover

FileMover is NFS and CIFS file services software for the VNX and Celerra file system that allows HSM-style archiving. On the VNX, the primary FileMover command is DHSM, which shows its HSM roots.

At a basic level, FileMover intercepts client access to data, and takes action before the client accesses the data. CTA is an external system that can direct FileMover. The CTA user defines a task that directs FileMover to perform a series of actions, for example:

• To scan a share for files that are more than 60 days old
• To move the files to the archive
• To replace the files with stubs
• To activate the CIFS offline bit on the stubs

Once the stubs are in place, FileMover monitors the stubs. When a client tries to read or write to the file, FileMover intercepts that access, and recalls the data using information contained in the stub.

Every VNX or Celerra that functions as an archive source must have the FileMover API enabled and configured. The configuration is part of the CTA setup.

FPolicy is an API for the NetApp that is similar to FileMover.

CTA needs an API similar to FileMover or FPolicy to archive from a NAS system. Because these types of APIs are not available on other platforms such as Linux or Windows, CTA can only archive data from VNX, Celerra, and NetApp.

Archive Policies

A policy in the CTA archiving or tiering context consists of rules and one or more destinations. An example of a simple policy is “if this file has not been accessed in six months, send it to the Atmos cloud, and replace it with a stub.” The one-rule, one-destination policy is common, and many CTA users will use this type of policy on their data.

However, CTA rules are flexible. You can create more complex rules that archive to multiple tiers. For example, a single policy could be:

“If any file has not been accessed in more than one year, and is larger than 1 MB in size, send it to my Isilon, unless it’s a PDF file, in which case don’t archive it. Then when these files have not been accessed for two years, move them from the Isilon to the Atmos cloud, and update the stub file to point to their new location.”

Policy rules are based on attributes such as access time, modify time, inode change time, file size, file name, or directory name. The archive policy action is to “archive” or “don’t archive.” A single expression or a combination of expressions defines the archive policy.
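The first half of the example policy above can be sketched as an ordered list of rules. This paper states that rules are applied one at a time and that the first rule to evaluate as true decides the action; everything else here (the FileInfo and Rule types, the evaluate function, and the destination label "isilon") is a hypothetical illustration, not the CTA's policy syntax. The second-tier move from the Isilon to Atmos is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ONE_MB = 1024 * 1024


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    idle_days: float  # days since the file was last accessed


@dataclass
class Rule:
    matches: Callable[[FileInfo], bool]
    destination: Optional[str]  # None stands for the "don't archive" action


def evaluate(rules: List[Rule], f: FileInfo) -> Optional[str]:
    """Apply rules in order; the first rule that matches decides the action."""
    for rule in rules:
        if rule.matches(f):
            return rule.destination
    return None  # no rule matched, so the file is left alone


# The PDF exclusion comes first so that it wins over the archive rule.
policy = [
    Rule(lambda f: f.name.lower().endswith(".pdf"), None),
    Rule(lambda f: f.idle_days > 365 and f.size_bytes > ONE_MB, "isilon"),
]

if __name__ == "__main__":
    for info in (
        FileInfo("design.dwg", 5 * ONE_MB, 400),
        FileInfo("manual.pdf", 5 * ONE_MB, 400),
        FileInfo("notes.txt", 2048, 400),
    ):
        print(info.name, "->", evaluate(policy, info))
```

Placing the "don't archive" rule ahead of the archive rule is what implements the "unless it's a PDF file" clause under first-match evaluation.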

A policy does not contain a share name. You can define one policy to evaluate shares A, B, and C, and define another policy to evaluate share D. Or you can define several different policies to evaluate a single share.

Providing data to the CTA

Administrators usually direct a CTA archive policy to evaluate a file system, CIFS share, or NFS export. The CTA scans the files, and applies the policy rules to each file, one at a time. If there are multiple rules in the policy, the CTA continues to apply the rules until a rule evaluates to “true.” It then takes the action associated with the rule, such as archive, or don’t archive, and moves to the next file.

There is another way to provide files to a CTA policy. Instead of directing the CTA to scan the files in a share, the CTA administrator imports a list of filenames, and the CTA only scans and applies the archive policy to the files in that list. This feature, called “file ingest,” is primarily for third-party vendors with software products that have their own scanning systems, but want to use the CTA archive engine. The Cloud Tiering Appliance Imported File List Archive Task Technical Notes document describes the file ingest feature and is available from Powerlink.

Scheduler

You use the scheduler to set the job start time. For example, a CTA administrator schedules a batch job to start at 2:00 a.m. on Saturday, scan share01, and evaluate the files with a policy for archiving.

An administrator usually schedules a job to run weekly, every other week, or monthly. The first time the archiving job runs, the policy will often select and archive at least half of the data. So the first archive job can require a long time to run and can move a lot of data. Future jobs will move incremental amounts of data.

Simulation

The CTA can simulate archive jobs. You can schedule an archive job with a policy, but run it in simulation mode. The CTA will scan the source share and apply the policy rules against each file, but not take any archive action.
Instead, the CTA tracks the number of files and amount of data it would have archived, and at the end of the simulation, it displays a report. Simulation is a good way to test the effectiveness of a policy and to edit the policy rules before running a real archive job.

Recall

When a file has been archived to a repository, leaving a stub on the source NAS share, the NAS client expects the stub to look and behave like the original file. “File recall” is the process by which the user clicks on the stub file and quickly accesses the original file.

The stub file contains all the information needed to find the actual file. The VNX set the offline bit on the stub when the file was archived. When a user attempts to read a stub file, FileMover intercepts the read request and interacts with the CTA to begin the process of recalling the file from the repository. If the repository is on a CIFS or NFS share, FileMover recalls the file using CIFS or NFS. If the repository is a CAS or cloud (such as Centera, Atmos, or Amazon), then the VNX sends the recall request to the CTA, which performs the recall and passes the file to the VNX.

After recalling the file, the VNX either:

• Provides the file to the user, but leaves the stub in place, known as “passthrough recall.”
• Writes the file back to its original location and deletes the stub, known as “full recall.”

A FileMover command on the VNX Control Station sets the recall style, which can be set on a file system-by-file system basis.

Recall using CTA-HA

If an archive or migration batch job fails, there is no loss of data. You would correct the problem and rerun the job. For this reason, the complexity of a High Availability (HA) configuration for archival or migration is not necessary or justified.

However, recall is mission-critical, because it affects the ability of clients to access their data. For this reason, configurations where the CTA is in the recall path, such as archiving from VNX/Celerra to Centera, Atmos, or Amazon, or all archival from NetApp, require an HA configuration.

The HA configuration pairs the CTA-HA physical appliance or the CTA/VE-HA virtual appliance with one or more CTA or CTA/VE systems. The CTA-HA system is a recall-only version of the CTA. If the source NAS server cannot perform the recall, either the CTA recall host or its CTA-HA partner can perform the recall.

By creating a DNS hostname that maps to the IP addresses of both the CTA and the CTA-HA, and by configuring the VNX to find the CTA using that hostname, the VNX can use both recall hosts alternately, in a round-robin fashion. This balances the recall load, and if one recall host fails, the other can perform recalls until the failed host returns to service.
This configuration also allows maintenance of one host while the other continues to perform recalls.

The CTA-HA also performs keystore replication for encryption keys that are generated on the CTA when the encryption feature is selected during archival to Atmos or Amazon clouds, as described in “Compression and encryption.”

Archive requirements for the source NAS server

CTA can archive data from CIFS or NFS shares on VNX, Celerra, or NetApp NAS servers. FileMover for VNX or Celerra, and FPolicy for NetApp, both provide archiving services. To identify stub files, both FileMover and FPolicy read the offline bit on the stubs. The CIFS protocol supports offline bits, but NFS does not, so the VNX/Celerra handles offline bits internally for NFS-only archival. The CTA communicates with VNX and Celerra using the DHSM API. Before archiving data from a VNX or Celerra, the CTA configuration must include the VNX or Celerra properties and the DHSM connections that link file systems to one or more repositories. The CTA and FileMover automatically create the DHSM connections when needed. The Cloud Tiering Appliance Getting Started Guide provides the configuration procedure for the CTA with the VNX or Celerra.

Deleting the DHSM connection on a VNX or Celerra Control Station can optionally trigger a recall of all stubbed data from the repository linked to the VNX or Celerra file system that uses that connection.

The CTA and CTA-HA must have full control of the source shares. If the source includes NFS exports, these exports must have root and read/write permission for the CTA and CTA-HA IP addresses.

When archiving from CIFS shares, the source server must belong to a domain, and the CTA configuration settings for that server require a username from that domain. The username must be in the local administrators group of the CIFS server associated with the source.

Archive requirements for the target repository server

The CTA can archive to three kinds of repositories:

• NAS (CIFS or NFS), such as VNX, Celerra, VNXe, Data Domain, Isilon, Windows, or NetApp
• CAS, such as Centera
• Cloud, such as Atmos or Amazon S3

Each repository has slightly different configuration requirements.

• Requirements for NAS repositories are similar to the requirements for source servers. A CIFS domain user must be in the local admin group of the CIFS server. NFS exports must have root and read/write permission for the CTA and CTA-HA IPs.
• Centera configuration requires a PEA file or “anonymous.”
• Cloud configuration requires a tenant user for Atmos or a bucket user for Amazon S3.

One repository can serve as an archive target for multiple CTAs.

One CTA can archive to multiple repositories.

A CTA repository migration job will move all the archived data from one repository to another, and update the stubs to point to the new location.

Only the CTA or CTA-HA can have access to the CTA repositories.
The NAS share that serves as the repository is visible, but the layout of archived data is proprietary. Changes to the repository can render archived data unrecallable.

Compression and encryption

Compression and encryption are options when archiving to either of the two cloud repository tiers, Atmos or Amazon S3. A compression style such as fast or strong is a policy option. Encryption is also a policy option, but encryption prerequisites are:

1. You must configure keystore replication between the CTA and a CTA-HA machine.
2. You must generate a key using the CTA GUI.

The CTA stores the key in the keystore and replicates it to the CTA-HA. Every archive task that uses a policy with encryption uses the key. If CTA generates and replicates a new key, it applies the key to new encrypted archive tasks. The old keys that remain in the keystore continue to apply for files encrypted using the old keys.

Keystore replication will be sufficient for normal outages, but after generating a new key, back up the CTA configuration to preserve the keystore.

Multi-tiered archive

Consider the following example:

“Find NAS files that have not been accessed in six months, and archive them to my private cloud storage. Find NAS files that have not been accessed in one year, and send them to the public cloud such as Amazon S3 or AT&T’s Atmos-based cloud. And if any files archived on the private cloud have not been accessed for one year, move the files from the private to the public cloud, and update the stub files to point to the new location.”

This scenario describes CTA’s multi-tiered archiving feature. By creating a multi-tiered policy type with several rules, each with a different repository, you can design an archiving scheme to fit this example. This multi-tiered archiving policy would have the following rules:

• if atime > 1 year, archive to Amazon S3 cloud
• if atime > 6 months, archive to private Atmos cloud

The order of the rules is important. If a policy has several rules, the rules are applied one at a time. When the first rule evaluates as true, CTA takes the “archive” or “don’t archive” action. CTA does not apply the subsequent rules, and the policy moves on to the next file.

In this example, if the order of rules were reversed, you would get an unintended result. The 6-month rule would be applied first, and the 1-year rule would never be applied, because any file older than 1 year is also older than 6 months.
All data older than 6 months would be archived to the private Atmos cloud, and no files would be archived to Amazon S3.

CTA database

When a file is archived to a repository, the stub on the source NAS tier points to the file location in the repository. However, the file in the repository has no pointer back to the source. The repository files have no connection to the source. The CTA database solves this problem. Each time a file is archived, an entry in the CTA database records the file’s original location on the source, and the file’s location in the repository.
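The role of the database as the missing "pointer back to the source" can be sketched with a two-column table. The actual CTA database schema is not described in this paper, so the table name, column names, functions, and sample paths below are all hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3


def open_db():
    """Create an in-memory table pairing source paths with repository locations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archived_files ("
        " source_path TEXT PRIMARY KEY,"
        " repo_location TEXT NOT NULL UNIQUE)"
    )
    return conn


def record_archive(conn, source_path, repo_location):
    """Record one archive operation: where the file was and where it went."""
    conn.execute(
        "INSERT OR REPLACE INTO archived_files VALUES (?, ?)",
        (source_path, repo_location),
    )


def source_for(conn, repo_location):
    """Reverse lookup: which source path does this repository file belong to?"""
    row = conn.execute(
        "SELECT source_path FROM archived_files WHERE repo_location = ?",
        (repo_location,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_db()
    record_archive(db, "//vnx01/share01/plans/q1.doc", "atmos://tenant/obj-0001")
    print(source_for(db, "atmos://tenant/obj-0001"))
```

The stub already answers "where is my data?"; a table like this answers the opposite question, "whose data is this?", which the repository file alone cannot.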

CTA does not use the database for recalls, because the stub on the source includes the information necessary to locate the file in the repository. However, the database includes entries for every archived file. It contains statistical data and orphan information as described in “Orphans.” To protect the database, schedule regular CTA backups. If a CTA fails, import the most recent backup into a new CTA.

Stub scanner jobs

For every scheduled archive job, the CTA automatically schedules a monthly stub scanner job. The stub scanner is a utility that reads the stubs in a share and compares them to the entries in the CTA database. If stubs move to different locations or orphans appear, the stub scanner ensures that the CTA database is kept current.

Because a stub on the source has the information necessary to recall a file from the repository, CTA does not need to query stub and repository file locations in the CTA database. However, CTA can manage repository storage more efficiently if information in the database matches the stub and repository file locations on the system, so stub scanner jobs run every 30 days by default.

Orphans

If stub files are deleted from the source, the actual files in the repository become orphans, and are not automatically deleted, for the following reason.

Generally a storage administrator will back up stubs when backing up CTA-archived shares. Many NDMP-based backup programs back up stubs by default. With protected repositories and small stubs, a NAS server that employs CTA benefits greatly from faster backups and smaller backup windows. However, when a backup is restored, the stubs need to point to something. If the CTA had deleted archived files when the stubs were deleted, restoring the backup would restore stubs that point to nothing.

To delete orphan files and recover space on the repository, run an “orphan delete” job periodically. Do not delete orphans until you are certain that you will not restore stubs that point to the orphans.
For example, if you keep backups for six months, then define the orphan deletion job to delete files that have been orphans for at least eight months.

The CTA database and the stub scanner play important roles in the management of orphan files. Every time the stub scanner sees a stub, the CTA records a “last seen” time in the database. If the stub is deleted, the stub scanner identifies the file in the repository that was linked to the stub as an orphan. Because the “last seen” time is in the database, the CTA knows how long the file has been an orphan. CTA uses the orphan age to determine which orphans to delete.

If the CTA database is lost, the location and age of orphan files in the repository are lost. CTA database backup is therefore an important process.
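The orphan-age logic above can be sketched as a simple filter. This assumes, as the text suggests, that orphan age is measured from the stub's "last seen" time; the function name, the data layout, and the use of 240 days to approximate eight months are illustrative assumptions, not CTA behavior or settings.

```python
from datetime import datetime, timedelta


def orphans_to_delete(last_seen, live_stubs, now, min_orphan_age):
    """Pick repository files that have been orphans for at least min_orphan_age.

    last_seen:  dict mapping repository file -> datetime its stub was last seen
    live_stubs: set of repository files whose stubs the latest scan still found
    """
    return sorted(
        repo_file
        for repo_file, seen in last_seen.items()
        if repo_file not in live_stubs and now - seen >= min_orphan_age
    )


if __name__ == "__main__":
    last_seen = {
        "a.doc": datetime(2011, 9, 1),   # stub gone for about ten months
        "b.doc": datetime(2012, 5, 1),   # stub gone for about two months
        "c.doc": datetime(2012, 6, 15),  # stub still present
    }
    # Backups are kept six months, so wait roughly eight months before deleting.
    print(orphans_to_delete(
        last_seen,
        live_stubs={"c.doc"},
        now=datetime(2012, 7, 1),
        min_orphan_age=timedelta(days=240),
    ))
```

Keeping the minimum orphan age longer than the backup retention period means a restored stub should still find its repository file.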

Reporting

CTA generates reports on the files it archives or migrates. For archived files, CTA reports display the size, number of files archived, and breakdown by file types, but the CTA does not give a detailed profile of the data in the file system. You can run archive simulations to obtain information on file ages. For example, multiple simulations, filtering for access times of various ages, can yield an age profile for the files in the file system.

Migration

Overview

The CTA provides two primary functions: archiving (or tiering), and migration. Migration moves files from one share or export to another.

NAS migration copies data from one share or export on one system to another. Migration is useful when replacing servers. Administrators use CTA to move data from old to new servers with minimum disruption to the NAS client users. The CTA can perform mu
