HOW IMPLEMENTING EMC VNX FILE SYSTEM FEATURES CAN SAVE YOUR LIFE


Piotr Bizior
Systems Administrator
EMCSA (VNX), EMCIE (VNX, DD), EMCBA, EMCISA, MCP, MCTS
piotr.bizior@gmail.com

Table of Contents

Introduction
File System Quotas and File Filtering
    Concept
    Quota Types
    Findings
    File System Filtering
Deduplication
    Concepts
    Considerations
    How it works
    Performance
VNX File System Replicator
    Disaster Recovery Planning
    Configuration
    Considerations
VNX File System Checkpoints
    How it works
    Checkpoint Virtual File System (CVFS)
    VNX File System Checkpoint considerations
Conclusion
Appendix
    About the Author

Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation's views, processes, or methodologies.

Introduction

I could write about all the new features of EMC VNX like FAST VP and FAST Cache, but I'd like to write a few words about the file system features that VNX offers. Don't get me wrong, FAST VP and FAST Cache are both great, and they both helped our environment improve the performance and response time of our mission-critical applications, but what really made the difference, and saved a lot of storage space in the long run, was implementing file storage features like file system quotas, deduplication, replication, and checkpoints.

If you're a storage administrator like me, you are probably aware that overall data is predicted to grow 50 times by 2020 (Figure 1). Meanwhile, the number of storage administrators is predicted to grow only 1.5 times by 2020. This means that not only do I need to find a more efficient way to store the data, back it up, and plan DR activities, but I also need to improve the efficiency of my day-to-day activities.

Figure 1: Predicted overall growth of data and personnel managing it by 2020

This is where EMC VNX and its file storage features come into the picture. Not only does implementing them save money and improve RPO and RTO for your company, it can also make your life easier. What if I told you that you can control the growth of data that users store on their home drives, monitor the usage, and block certain files from being saved?

What if I told you that you can deduplicate file systems, which can save you up to 72% of the original data size? Sounds pretty good, right? Add to the list file system checkpoints for point-in-time backup capabilities and file system replication to address Disaster Recovery (DR) concerns, and you have a solid, unified solution for file storage that addresses every aspect of its lifecycle.

File System Quotas and File Filtering

Concept

If your organization is similar to mine, file storage is a big part of storage planning and design, especially CIFS configured to serve home directories. A home directory is nothing more than a share carved off a file system, access-controlled by file system permissions, and assigned to a user – in my example, as a network drive. Implementing home directories not only enables more efficient data protection and backup of user files, along with faster and simpler data recovery, it also improves data availability and lets users access their data from basically anywhere while authenticated via VPN.

Allowing end users to store their data on their home drives is very convenient for them; however, it carries a lot of risk with regard to storage utilization. Currently, my organization is using over 16TB of storage for home directories; almost 300% growth since 2009 (Figure 2).

Figure 2: File system growth dedicated to home drives

This is where implementing file system quotas comes in handy. I can control file system disk space utilization by capping the total number of bytes of storage that can be used, the number of files that can be created, or both. Quotas give me a variety of options: they can limit the total amount of data based on the user creating it, the number of files created, or the total amount of data that can be stored in a specific, new directory (tree quotas) – very useful whenever I wish to control data growth for a directory to which multiple users have access.

Quota Types

Combining quotas and their multiple settings provides solid, comprehensive control over the file system and its utilization. Quota types and settings that you should know before implementing (a sketch of how they interact follows the list):

- Soft and hard quotas – soft quotas (also called Preferred) are always set lower than hard (also called Absolute) quotas. Depending on the end user client environment, soft quota violators will receive a warning message saying that the soft quota limit has been exceeded, but they'll still be able to save the file on the storage system. Users that trigger a hard quota event, however, will see an error message; the deny-disk-space flag will be set and the system will not store the file. A user would have to delete files or request a quota limit increase in order to store new files. Both soft and hard quota events are logged in the system and can be used to generate reports.

- Grace period comes hand in hand with soft quotas. It provides end users a system-defined period of time after hitting the soft quota, during which they can clean up space and bring usage back below the soft quota limit. During the grace period, the system allows users to save new data; once the grace period expires, such requests will be denied. The purpose of the grace period is to allow end users to react and address possible space issues without disrupting their business activities.

- Tree quotas can limit the total amount of data that is allowed to be stored in a new directory. There are some limitations: tree quotas cannot be nested, nor implemented on existing directories or the root of the file system. I implement tree quotas when I want to limit storage on a project basis, where multiple users are saving files to one directory.
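To make the interplay between soft limits, hard limits, and the grace period concrete, here is a minimal Python sketch of the enforcement logic the bullets above describe. It is a conceptual model only – the names (Quota, try_write) and the 7-day grace default are my own illustration, not VNX code or VNX defaults, and a real array tracks byte and file-count limits per user and per tree.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Quota:
    soft_limit: int                         # "preferred" limit: warn and start the grace clock
    hard_limit: int                         # "absolute" limit: deny the write outright
    grace_period: timedelta = timedelta(days=7)   # illustrative default, not the VNX one
    used: int = 0
    grace_started: Optional[datetime] = None

    def try_write(self, size: int, now: datetime) -> str:
        new_total = self.used + size
        if new_total > self.hard_limit:
            return "DENIED: hard quota exceeded (delete files or request an increase)"
        if new_total > self.soft_limit:
            if self.grace_started is None:
                self.grace_started = now          # first violation starts the grace period
            elif now - self.grace_started > self.grace_period:
                return "DENIED: grace period expired while still over the soft quota"
            self.used = new_total
            return "WARNING: soft quota exceeded, clean up before the grace period ends"
        self.used = new_total
        self.grace_started = None                 # back under the soft limit resets the clock
        return "OK"

q = Quota(soft_limit=8 * 2**30, hard_limit=10 * 2**30)   # 8 GB soft, 10 GB hard
print(q.try_write(9 * 2**30, datetime(2014, 1, 1)))      # WARNING: over soft, within grace
print(q.try_write(2 * 2**30, datetime(2014, 1, 2)))      # DENIED: would cross the hard limit
```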

Findings

In my environment, the majority of cases where quotas have been implemented were for end users' home drives, using the CIFS protocol. In a few cases, multiple users were saving files to the same directory – there, tree quotas were implemented. As seen in Figure 2, over the last few months, file system utilization for the home drives has remained stable. This is not a coincidence. Storage growth on the home drives file systems has been very stable from the moment I implemented file system quotas. Refer to Figure 3 and Figure 4 below – the two biggest file systems for my organization's home drive storage. What makes me happy is that the "Predict Full" values show that each file system "will never fill its current capacity".

Figure 3: File system properties – 1

Figure 4: File system properties – 2

File System Filtering

Another storage-saving opportunity that I noticed and later implemented was to deny specific files from being saved on shares. This not only helps with storage utilization, it also enables control over what kinds of files can be saved on the CIFS shares – which might be beneficial when your organization is going through an e-Audit. I like to think of this EMC VNX feature as a firewall mechanism for saving files on shares: a file firewall that can allow or block certain types of files from being stored on a share, based on the access control list (ACL) and file extension.

For example, I can create a share and block everyone except "marketing" group members from saving video files (.avi, .mpg, .mp4) and audio files (.mp3, .wav) on the newly created share, or I can create a share and allow only .pdf documents and Microsoft Word (.doc, .docx) documents to be saved there. There are some conditions that need to be considered when implementing file extension filtering: some applications – let's take Microsoft Word as an example – create different file extensions depending on:

- Application version (.doc, .docx, .rtf),
- Type of document – regular document (.doc, .docx), macro-enabled document (.docm), template (.dotx), macro-enabled template (.dotm),
- Temporary files that the application creates when a file is opened by an end user (.tmp),
- Whether the auto-recovery option is enabled in the application (.asd),
- Whether an OLE (Object Linking and Embedding) object (i.e. a link to a PowerPoint presentation) is embedded in a Word document, in which case the application creates a temporary file (.wmf).

All of the above need to be reviewed for just one file type – in my example, a Microsoft Word document. Research each file extension you're planning to filter on your own, and test it before implementing. This is a very powerful tool that, if used properly, has great space-saving potential.
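Filtering of this kind boils down to a small allow/deny decision per write attempt. The sketch below is my own Python illustration of the concept – the real feature is configured on the Data Mover through Unisphere, not with code, and the group and extension lists here come from the example above, not from any default:

```python
BLOCKED_EXTENSIONS = {".avi", ".mpg", ".mp4", ".mp3", ".wav"}   # video and audio, per the example
EXEMPT_GROUPS = {"marketing"}                                   # groups allowed to bypass the filter

def is_write_allowed(filename: str, user_groups: set) -> bool:
    """Return True if the file may be stored on the share."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    blocked = ext in BLOCKED_EXTENSIONS
    exempt = bool(user_groups & EXEMPT_GROUPS)
    return not blocked or exempt

print(is_write_allowed("promo.mp4", {"sales"}))       # False: blocked type, no exemption
print(is_write_allowed("promo.mp4", {"marketing"}))   # True: exempt group
print(is_write_allowed("budget.docx", {"sales"}))     # True: extension not filtered
```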

Deduplication

Concepts

EMC VNX is a capacity-optimized array that provides compression for block data and deduplication for file data. I'd like to focus on deduplication, which combines Secure Hash Algorithm (SHA-1) duplicate detection with data compression. Unlike another flagship EMC product, Data Domain, which deduplicates data inline as it is written, the VNX uses post-process deduplication – a low-impact, low-priority array task that compresses files based on their age. With the default policy, which is designed to filter out files that have substantial Input/Output (I/O) access, it doesn't affect the time required to access compressed files. Nonetheless, even though it sounds like it will not affect the performance of the array much, before I enable deduplication for a file system, I carefully inspect it and analyze how often and how heavily the data is being utilized. I do this because I don't want to deduplicate "hot" data – data which is accessed often. That would negatively impact end users, because every time they access deduplicated data, VNX has to decompress it, possibly affecting response time. Thankfully, to manage deduplication more efficiently and granularly, VNX provides an entire page of settings in the Unisphere software (Figure 5).

Figure 5: Unisphere default Deduplication settings page at the file system level

Considerations

Regarding the hot data I previously referred to – there are settings where I can configure deduplication not to process hot files on the file system, and at the same time avoid a performance penalty. I can define hot data by how recently an end user has modified or accessed a file. The nature of VNX deduplication is to process files that have aged and to avoid processing active files – newly created or modified. VNX scans a deduplication-enabled file system on a daily basis, but that frequency can be adjusted by the administrator, and a scan can also be initiated immediately. I suggest not modifying the default deduplication settings, which are configured for the most effective use. That way you will not affect the Input/Output (I/O) of files that haven't been processed by deduplication, since the system will not deduplicate active files.

Deduplication can be set either at the Data Mover level or at the file system level. I do not have deduplication enabled at the Data Mover level, since I do not want all the file systems residing on the array to be deduplicated. However, I do have deduplication configured for quite a few file systems, and I love it. It saves me lots of space – see Figure 6 below – up to 72% of the entire file system!

Figure 6: Impressive savings on the file system thanks to enabling deduplication
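The "process only cold data" policy can be pictured as a simple predicate over file metadata. Below is a conceptual Python sketch; the age and size thresholds are assumptions of mine for illustration, not VNX defaults – check the Unisphere deduplication settings page (Figure 5) for the real values on your system:

```python
import os
import time
from typing import Optional

MIN_AGE_DAYS = 15          # assumed: files touched more recently than this are "hot"
MIN_SIZE = 24 * 1024       # assumed: below this, the savings are not worth the work
MAX_SIZE = 200 * 2**30     # assumed: above this, the file is skipped as too large

def is_dedupe_candidate(path: str, now: Optional[float] = None) -> bool:
    """True if a file looks cold and is sized for post-process deduplication."""
    now = now or time.time()
    st = os.stat(path)
    cold = (now - max(st.st_atime, st.st_mtime)) > MIN_AGE_DAYS * 86400
    sized = MIN_SIZE <= st.st_size <= MAX_SIZE
    return cold and sized
```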

How it works

Let me explain how deduplication actually works on VNX. It is a background, asynchronous process that runs on files after data is written to the file system, and it increases storage efficiency by eliminating redundant data from files. It is smart enough not to affect file access service levels: it skips files where the space savings would be minimal, and it prevents files that are too big, too small, or too frequently accessed from being processed. Files identified as non-compressible are not processed and are stored unchanged.

Deduplication uses the SHA-1 algorithm to create a hash for the compressed file and to decide whether that content has been seen before. If not, it copies the file to a hidden space on the file system and updates the internal metadata of the original file with a reference to the hidden location. If the content has been seen before, there is no need to move the data, since it's already in the hidden location – just the internal metadata of the file is updated.

I find read access to deduplicated data very interesting – VNX uses its memory to decompress the data and then passes it to the requesting client, while the data on the disk array stays unchanged. If there is a request to read a file and only a fragment of it is compressed, only that portion is decompressed and presented to the client. Here is the interesting part: depending on the CPU load and the characteristics of the requested data, accessing a deduplicated file might take longer than accessing a non-deduplicated file, due to the decompression taking place. On the other hand, the exact opposite might also be the case – accessing a deduplicated file might be quicker, because reading more data from disk (accessing a non-deduplicated file) might take longer than decompressing the file (accessing a deduplicated file).

What if there are files or folders on the CIFS share that the default deduplication policy hasn't processed, but I would like them deduplicated? Or vice versa: what if there are deduplicated files or directories that I want re-duplicated? In such a case, I can manually enable or disable deduplication at the CIFS file or folder level, by going into the Advanced Attributes of the Properties of the file or folder and checking/unchecking the "Compress contents to save disk space" box (Figure 7).

Figure 7: Manually enabling compression at the file/folder level on CIFS
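To make the hash-and-reference flow described above concrete, here is a toy single-instance store in Python. Every name here is mine, and the real data path is far more involved (compression, partial-file handling, crash safety), but the core idea is the same: hash with SHA-1, store unseen content once in a hidden area, and point file metadata at it.

```python
import hashlib

hidden_store = {}    # SHA-1 digest -> content: the "hidden space" on the file system
file_metadata = {}   # file path    -> SHA-1 digest: the reference kept in file metadata

def dedupe_file(path: str, content: bytes) -> None:
    digest = hashlib.sha1(content).hexdigest()
    if digest not in hidden_store:
        hidden_store[digest] = content    # first sighting: copy into the hidden space
    file_metadata[path] = digest          # either way: repoint the file's metadata

def read_file(path: str) -> bytes:
    return hidden_store[file_metadata[path]]   # dereference (and decompress, on a real array)

dedupe_file("/home/alice/report.docx", b"quarterly numbers")
dedupe_file("/home/bob/report_copy.docx", b"quarterly numbers")
print(len(hidden_store))   # 1 -- identical content is stored once, referenced twice
```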

I can also easily tell which files are deduplicated, because Windows Explorer marks them by applying a different font color – blue by default.

Performance

As I mentioned before, deduplication is a scheduled process, designed to have a low impact on the Data Mover while remaining very efficient. Per the EMC Deduplication white paper [1], VNX File Deduplication at the Data Mover level can:

- Scan up to 3.6 billion files per week at an average rate of 6,000 files per second
- Process 1.8TB (at 3MB/s) to 14TB (at 25MB/s) of data
- Use only 5 percent of CPU processing power (approximately)
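As a quick sanity check, the first two scan figures are consistent with each other:

```python
# 6,000 files/second sustained around the clock for a week:
files_per_week = 6_000 * 60 * 60 * 24 * 7
print(f"{files_per_week:,}")   # 3,628,800,000 -- about 3.6 billion files per week
```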

VNX File System Replicator

Disaster Recovery Planning

All of the above VNX features sound like a great addition to already efficient arrays, and when thoughtfully implemented they make the VNX more storage efficient. However, a conscientious storage administrator knows that data is only as good as the disaster recovery and backup solution behind it. After all, it doesn't really matter that the data on the VNX file side is reduced by 72% thanks to VNX Deduplication and Compression when the entire site is lost, right? No one will pat me on the back and tell me, "We lost the primary site, but hey, good job on configuring quotas, checkpoints, and enabling deduplication – that saved us a lot of storage space. We lost all of it, but still – we did it in the most space-efficient way." In case of a storage system failure at the main production site, I'd have two options: either stop answering my phone and start polishing my resume, or take a deep breath and request tapes from the secondary site/offsite storage.

On the other hand, if I had a VNX with either the Remote Protection or Total Protection suite, my ability to handle these kinds of disastrous situations would dramatically improve – I could easily start transferring the CIFS and NFS responsibilities to the secondary/disaster recovery site. That is, if I had VNX Replicator configured for CIFS and NFS. Yes, some initial configuration is required, such as setting up the Data Mover interconnect over the WAN, the relationship between the VNX systems that will participate in VNX Replicator, and replication for each file system, but it really pays off. I can sleep better at night knowing that, in case the primary VNX system goes down, I can switch all the CIFS and NFS file systems I'm replicating over to the recovery site.

Figure 8: Remote File System Replication

Configuration

There are three configurations for VNX Replicator: Local, Loopback, and Remote Replication. Local Replication is replication within the same VNX array, using Data Mover interconnects; it could be used to keep a copy of a file system on another Data Mover. Loopback Replication also happens on the same VNX system, but replication occurs within the same Data Mover. This is best for those who want a copy of a file system sitting on the same Data Mover; while I think there are better, more space-efficient ways to keep a copy on the same Data Mover, it is the quick way to copy a file system locally. Last, Remote Replication is the most useful for me and my organization from the Disaster Recovery perspective. Remote Replication requires two storage arrays (one source and one destination), an interconnect between the local Data Mover and the remote Data Mover, and, obviously, file system(s) to replicate.
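The three session types differ only in where the source and destination Data Movers live, which can be captured in a few lines of Python. This is purely a conceptual model – the class and function names are mine, and actual sessions are created through Unisphere or the Control Station tools, not code like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataMover:
    array: str   # which VNX system the Data Mover belongs to
    name: str    # e.g. "server_2"

def session_type(source: DataMover, destination: DataMover) -> str:
    if source.array != destination.array:
        return "remote"     # two arrays over a WAN interconnect: the DR configuration
    if source.name != destination.name:
        return "local"      # same array, different Data Mover
    return "loopback"       # same Data Mover: the quick local copy

prod = DataMover("VNX-PROD", "server_2")
print(session_type(prod, DataMover("VNX-DR", "server_2")))    # remote
print(session_type(prod, DataMover("VNX-PROD", "server_3")))  # local
print(session_type(prod, prod))                               # loopback
```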
