2258 An Alternative To NDMP For Network-Attached Storage


2258 An Alternative to NDMP for Network-Attached Storage Backup
Joseph King, CTO, CAS Severn
Lars Henningsen, CTO, General Storage
2016 IBM Corporation #ibmedge

Presenters
Joseph King, CTO, VP Presales and Technical Services, CAS Severn, jking@cassevern.com, 443-668-0888
Lars Henningsen, CTO, General Storage Software GmbH, lars.henningsen@general-storage.com, 49 151 67 31 30 13

Agenda
History of NDMP
Challenges with NDMP
MAGS: An Alternative to NDMP

History of NDMP
Co-developed by PDC/Legato and Network Appliance in 1995
First specification submitted: October 1996
NDMP v4: approximately 2000/2001; allowed proprietary extensions
NDMP v5: can't find much past 2003-2006; mostly extensions (security, data movers, etc.)
Storage Networking Industry Association (SNIA) manages the specification

History of NDMP: Data Movement
Started with direct-attached tape
Progressed to three-way backup
Backup server received data or redirected data
Waning use of tape libraries: enter the VTL
Rise of VTL with deduplication

Questions
How often do you do full backups?
How long does it take?
Have you ever done a restore?
Have you ever done a restore of an entire system?
Have you given up on NAS backup?

Challenges with NDMP

File server backup
How are really big file servers backed up? (in ascending order of popularity)
4.) NDMP (Slow. Doesn't scale well. Requires regular full backups.)
3.) SnapMirror to Tape (NetApp only. Mostly faster than NDMP but still requires full backups.)
2.) SnapDiff (NetApp only. Needs a new baseline from time to time. Errors are difficult to sort out. Doesn't help with restore. The situation is similar for all forms of journal-based backup on other platforms.)
1.) Not at all.
*) Combinations of various kinds of mirroring and snapshot technologies often substitute for what is usually considered to be backup. Almost all these methods are proprietary (they can't simply be reused after migrating to another file server technology), require disk for all data, and get very expensive very fast (or even become unusable) when more historic data has to be kept.

File server backup
In an ideal world, you could simply use your backup tool (well, TSM, since it's still the only one really doing “incremental forever”) and its existing infrastructure and operational integration to back up file servers of any size with any number of objects, just like you do with any other file system in your environment.
You wouldn't have to worry when migrating your file services from NetApp to EMC to IBM to Microsoft to xyz and back again, because the backup method and existing backup data stay the same (i.e. “as seen by the user” and not “as seen by a specific file server”).

File server incremental backup scenario with TSM
dsmc incr \\myfileserver\mytopshare
(Diagram: TSM Server with its DB, TSM Client, and the NAS file server with its file system)

File server incremental backup scenario with TSM
TSM client / file server / file system: the client looks up directory and file information to compare with the metadata received from the server, and reads changed/new data (if/when it finds any). All of that one by one.
TSM client / server / DB: the metadata gets sent to the client for comparison, and the client sends changed/new data (if/when it finds any). All in bulk and rather efficient, to a point.
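To make the split of work concrete, here is a minimal sketch of the decision an incremental pass makes once the server's metadata has arrived in bulk. It is a simplification under stated assumptions: `Attrs` and `plan_incremental` are hypothetical names, only size and modification time are compared (the real TSM client considers more attributes than these two), and the expensive part from the slide, building `local_scan` one lookup at a time against the file server, is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attrs:
    """Hypothetical, simplified per-file attributes used for comparison."""
    size: int
    mtime: float


def plan_incremental(server_inventory: dict[str, Attrs],
                     local_scan: dict[str, Attrs]):
    """Return (new, changed, expired) paths for one incremental pass.

    server_inventory: metadata the server already holds (sent in bulk).
    local_scan: what the client found by looking up each object on the
    file server (the slow, one-by-one part).
    """
    new = sorted(p for p in local_scan if p not in server_inventory)
    changed = sorted(p for p, a in local_scan.items()
                     if p in server_inventory and server_inventory[p] != a)
    # Objects the server knows about but the file system no longer has.
    expired = sorted(p for p in server_inventory if p not in local_scan)
    return new, changed, expired


server = {"a.txt": Attrs(10, 1.0), "b.txt": Attrs(20, 2.0), "gone.txt": Attrs(5, 0.5)}
local = {"a.txt": Attrs(10, 1.0), "b.txt": Attrs(25, 3.0), "c.txt": Attrs(7, 4.0)}
print(plan_incremental(server, local))
# (['c.txt'], ['b.txt'], ['gone.txt'])
```

The comparison itself is cheap set logic; what dominates is how long it takes to fill `local_scan`.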

File server incremental backup scenario with TSM
Let's say the process of looking up an object in the file system and deciding whether or not to back it up takes 2 ms on average.

File server incremental backup scenario with TSM
...which means scanning 500 objects/second,
which means just 1,800,000 in an hour,
which means just 43,200,000 in a full day,
which means just 302,400,000 in a week,
which means most file servers simply cannot be backed up this way. Too many objects, not enough time.
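The arithmetic on this slide is easy to reproduce. A small sketch, assuming a purely latency-bound scan in which each stream handles one object at a time (`objects_scanned` is a hypothetical helper, not part of TSM):

```python
def objects_scanned(latency_ms: float, seconds: float, streams: int = 1) -> int:
    """Objects a latency-bound scan gets through in `seconds`.

    Assumes every lookup costs `latency_ms` and each stream works on
    exactly one object at a time, so throughput is streams / latency.
    """
    per_second = streams * 1000.0 / latency_ms
    return round(per_second * seconds)


HOUR, DAY, WEEK = 3600, 86400, 7 * 86400
for label, secs in [("second", 1), ("hour", HOUR), ("day", DAY), ("week", WEEK)]:
    print(f"per {label}: {objects_scanned(2.0, secs):,}")
# per second: 500
# per hour: 1,800,000
# per day: 43,200,000
# per week: 302,400,000
```

Halving the latency to 1 ms only doubles every figure (86,400,000 objects per day), which still means days per incremental for a file server holding several hundred million objects.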

MAGS – An Alternative to NDMP

MAGS
The major challenge you face when backing up file servers incrementally is latency.

MAGS
You could try to bring latency down a bit (use InfiniBand, keep metadata on SSD or in RAM, use faster disks, faster CPUs, faster everything), but that wouldn't really help. Speed it up by a factor of two (ambitious) and you still end up with something probably orders of magnitude too slow.

MAGS
However, using two threads rather than just one practically achieves the same as cutting latency in half.

MAGS
Using four threads basically equals the effect of cutting latency to 25%, and so on.
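Why does adding threads act like cutting latency? Each thread spends nearly all its time waiting, so the waits overlap. A small simulation illustrates this, assuming a fixed 2 ms per lookup simulated with `time.sleep`, which (like a thread blocked on network I/O) releases the GIL. `lookup` and `scan` are hypothetical stand-ins, not TSM or MAGS code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.002  # assumed 2 ms per lookup, the figure used on the slides


def lookup(path: str) -> str:
    """Stand-in for a remote metadata lookup: the thread mostly waits."""
    time.sleep(LATENCY_S)  # sleeping releases the GIL, so waits overlap
    return path


def scan(paths: list[str], threads: int) -> float:
    """Look up every path using `threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lookup, paths))
    return time.perf_counter() - start


paths = [f"/share/file{i}" for i in range(400)]
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): {scan(paths, n):.2f} s")
```

On a typical machine the elapsed time should drop roughly in proportion to the thread count, until something other than latency (CPU, the file server, the network) becomes the bottleneck.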

MAGS
So the solution is running many “incrementals” in parallel, which is what MAGS does automatically:
MAGS is a program which runs on the same machine as your (Windows) TSM client.
It splits a file system into hundreds or thousands of more or less equal chunks and scans them in as many parallel streams (TSM client runs) as you have licensed (20 streams per license package).
It makes sure there is no overlap and nothing is left out.
It works as a single, scheduler-friendly job with a beginning, an end, and a single return code.
It does not interfere with data at all. Only the regular TSM client handles files and directories; MAGS merely points the client at the right directories at the right time.
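The bookkeeping described above (roughly equal chunks, no overlap, nothing left out, one return code) can be sketched in a few lines. This is not how MAGS is implemented. It assumes each chunk is a list of directories that a client run would back up non-recursively, so every directory belongs to exactly one chunk, and it uses a hypothetical `run_chunk` callback in place of actually starting a TSM client session. The overall return code is the highest one seen, following the TSM client convention that larger return codes (0, 4, 8, 12) mean more serious problems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_chunks(dir_counts: dict[str, int], target: int) -> list[list[str]]:
    """Group directories into chunks of roughly `target` objects each.

    Assumes each directory is backed up non-recursively, so placing every
    directory in exactly one chunk means no overlap and nothing left out.
    """
    chunks, current, size = [], [], 0
    for d in sorted(dir_counts):
        current.append(d)
        size += dir_counts[d]
        if size >= target:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)
    return chunks


def run_job(chunks: list[list[str]], streams: int,
            run_chunk: Callable[[list[str]], int]) -> int:
    """Run all chunks on at most `streams` workers; return one return code.

    run_chunk is a hypothetical hook standing in for one client run.
    The most severe (highest) return code becomes the job's result.
    """
    with ThreadPoolExecutor(max_workers=streams) as pool:
        codes = list(pool.map(run_chunk, chunks))
    return max(codes, default=0)


dirs = {f"/share/d{i:03d}": (i % 7) * 1000 + 50 for i in range(300)}
chunks = make_chunks(dirs, target=20_000)
flat = [d for c in chunks for d in c]
assert sorted(flat) == sorted(dirs)  # nothing left out
assert len(flat) == len(set(flat))   # no overlap

# Pretend the chunk containing /share/d042 skipped some files (RC 4).
rc = run_job(chunks, streams=20,
             run_chunk=lambda c: 4 if "/share/d042" in c else 0)
print(f"{len(chunks)} chunks, overall return code {rc}")
```

In this sketch a single directory holding more than `target` objects still ends up as one oversized chunk; such hot spots would need some other way of being split.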

MAGS
So instead of this
(Diagram: TSM Server, DB, TSM Client, File Server, File System)

MAGS
...you get this
(Diagram: MAGS, TSM Server, DB, TSM Client, File Server, File System)

MAGS
Download and install MAGS on the Windows machine running your TSM client.
Log on to the MAGS web interface and configure which file servers and which shares to back up.
Run or schedule MAGS.

Deployments: Chemical / Germany
60,000,000 files (directory tree shown as \weird-application\\00\01\03\.)
(Diagram: MAGS, TSM Servers)

Deployments: Financial / Italy
110,000,000 files
(Diagram: MAGS, TSM Servers)

Deployments: Automotive / Germany
210,000,000 files
(Diagram: MAGS, TSM Servers)

Deployments: Automotive / Germany
400,000,000 files on 20 Isilon X nodes, 10 Isilon NL nodes and 10 Isilon HD nodes
650,000,000 files on 24 Isilon X nodes, 10 Isilon NL nodes and 10 Isilon HD nodes
(Diagram: MAGS on each Isilon cluster, backing up to a GSCC cluster of Flash / Linux TSM servers via IP-only dsmISI / parallel internal access)

Deployments: Automotive / Germany
(Diagram: two MAGS instances)

MAGS FAQ
So how fast is it?
It depends, of course, but in most cases it scales almost linearly with the number of streams, so 20 streams is practically 20 times as fast as the same TSM client without MAGS. From experience, the overhead of establishing new sessions with the TSM server has a negative impact with “smaller” file systems. So if you only have 30 million files and backup without MAGS takes 20 hours, you may expect two or three hours rather than just one. Rule of thumb: the bigger the file system, the more linear the scalability you can expect.
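The observation that smaller file systems scale less linearly can be illustrated with a toy model: divide the serial scan time by the number of streams and add a fixed overhead that does not shrink as streams are added. Both the model and the 1.5-hour overhead are assumptions, the latter picked only so that the slide's 20-hour, 20-stream example lands in the quoted two-to-three-hour range; they are not MAGS measurements, and `parallel_hours` and `speedup` are hypothetical helpers.

```python
def parallel_hours(serial_hours: float, streams: int, overhead_hours: float) -> float:
    """Toy model: perfectly divisible scan work plus a fixed overhead
    (session setup and the like) that does not shrink with more streams."""
    return serial_hours / streams + overhead_hours


def speedup(serial_hours: float, streams: int, overhead_hours: float) -> float:
    """How many times faster than a single stream, under the toy model."""
    return serial_hours / parallel_hours(serial_hours, streams, overhead_hours)


OVERHEAD_H = 1.5  # assumed, not measured
for serial in (20, 200, 2000):
    print(f"{serial:>5} h serial -> "
          f"{parallel_hours(serial, 20, OVERHEAD_H):6.1f} h with 20 streams "
          f"(speedup {speedup(serial, 20, OVERHEAD_H):.1f}x)")
```

Under these assumptions the speedup climbs from 8x for the 20-hour case toward the ideal 20x as the serial time grows, which is the shape of the rule of thumb above.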

MAGS FAQ
What about compatibility?
As already mentioned, MAGS doesn't handle the data as such. From a TSM client and server perspective, the data look exactly like they would if you had backed up without MAGS. You can back up incrementally without MAGS based on MAGS backups, back up incrementally with MAGS on the basis of backups originally done without MAGS, restore data backed up with MAGS without using MAGS for the restore, and (especially useful) restore data with MAGS which you haven't backed up with MAGS.

MAGS FAQ
Wait a minute... you said “restore”?
Yes. MAGS accelerates restores as much as it accelerates backups. Even if the data come from tape (provided you have more than one drive and more than one cartridge holding the data you want to restore), a restore is usually a lot faster than it would be without MAGS. The record so far is restoring half a petabyte (250 million files) of NetApp data to an Isilon in less than 6 days (about 1 GB/s).
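A quick sanity check of the restore record, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and taking the full 6 days, so the results are lower bounds:

```python
PB = 10**15                     # decimal petabyte (assumption)
data_bytes = 0.5 * PB           # "half a petabyte"
files = 250_000_000             # "250 million files"
seconds = 6 * 24 * 3600         # "less than 6 days": using 6 gives lower bounds

print(f"throughput    >= {data_bytes / seconds / 1e9:.2f} GB/s")  # 0.96
print(f"file rate     >= {files / seconds:,.0f} files/s")         # 482
print(f"avg file size ~  {data_bytes / files / 1e6:.1f} MB")      # 2.0
```

So “about 1 GB/s” holds, at an average of roughly 2 MB per file.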

MAGS FAQ
What about options x, y and z? Are they supported?
Yes. Every TSM client session started by MAGS uses the options you specified in your client option file (dsm.opt), cloptset, include-exclude list, etc.

MAGS FAQ
Are snapshots supported?
Yes. Backing up from snapshots is possible without compromising the name space. The snapshot can reside in the original file server or in a synchronized, secondary file server. To speed things up even further, MAGS can spread the load across multiple file servers if they hold copies of the same snapshot. With Isilon, it can also spread the load evenly across all nodes of one or even two clusters. Individual latencies are taken into consideration, so every additional source of data makes the entire process faster, even if that additional source is slower than the others.
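The claim that even a slower source helps depends on assigning work with each source's latency in mind. One way to do that, shown here as an assumption rather than MAGS's actual scheduler, is earliest-completion-time assignment: each chunk goes to whichever source would finish it first. `makespan` is a hypothetical helper that simulates identical chunks against sources with different per-chunk costs.

```python
import heapq


def makespan(chunk_count: int, seconds_per_chunk: list[float]) -> float:
    """Total time to finish `chunk_count` identical chunks when each chunk
    goes to the source that would complete it earliest, given each source's
    own per-chunk cost (earliest-completion-time assignment)."""
    # Heap entries: (time this source would finish one more chunk, source index)
    heap = [(cost, i) for i, cost in enumerate(seconds_per_chunk)]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(chunk_count):
        done_at, i = heapq.heappop(heap)  # earliest possible completion
        finish = max(finish, done_at)
        heapq.heappush(heap, (done_at + seconds_per_chunk[i], i))
    return finish


for sources in ([1.0], [1.0, 3.0], [1.0, 3.0, 10.0]):
    print(sources, makespan(1000, sources))
# [1.0] 1000.0
# [1.0, 3.0] 750.0
# [1.0, 3.0, 10.0] 699.0
```

With naive alternating assignment, a very slow source could stretch the job (two chunks split between a 1-second and a 10-second source finish after 10 seconds instead of 2); with completion-time-aware assignment, the slow source only picks up a chunk when it would finish it no later than any other source could.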

MAGS FAQ
Speaking of loads: doesn't the file server break down during backup?
No. Most file servers have no problem with hundreds of clients browsing through metadata. They're usually optimized for handling many, many requests at the same time rather than trying to speed up a single, big one (which is why they don't perform well with a single-threaded TSM backup).
The first, full backup is a different matter and may require some caution.

MAGS FAQ
What about our TSM server? Will that suffer?
There will certainly be more load than for the same job without MAGS, but for a shorter time. Keep an eye on settings such as maxnummp, the number of mount points in device classes, maxsessions, etc.
On the other hand, any locking issues you may have are usually a lot less persistent, because a typical MAGS session only lasts minutes rather than days.
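Before turning up the stream count, it may help to compare it against the server-side limits mentioned above. The sketch below is a hypothetical checklist, not an official sizing rule. It assumes each client run may open two server sessions (adjust `sessions_per_stream` to what you observe with your RESOURCEUTILIZATION setting) and needs one mount point per stream when writing directly to sequential-access (e.g. tape) storage. MAXSESSIONS, MAXNUMMP and MOUNTLIMIT are the TSM server option, node parameter and device class parameter, respectively; the values passed in are your own.

```python
def check_server_limits(streams: int, maxsessions: int, maxnummp: int,
                        mountlimit: int, sessions_per_stream: int = 2,
                        writes_to_tape: bool = True) -> list[str]:
    """Return warnings where the planned stream count may exceed a limit.

    sessions_per_stream=2 and one mount point per stream are assumptions
    for illustration; measure your own environment.
    """
    warnings = []
    needed_sessions = streams * sessions_per_stream
    if needed_sessions > maxsessions:
        warnings.append(f"MAXSESSIONS {maxsessions} < ~{needed_sessions} sessions")
    if writes_to_tape:
        if streams > maxnummp:
            warnings.append(f"node MAXNUMMP {maxnummp} < {streams} streams")
        if streams > mountlimit:
            warnings.append(f"device class MOUNTLIMIT {mountlimit} < {streams} streams")
    return warnings


for msg in check_server_limits(streams=40, maxsessions=50, maxnummp=20, mountlimit=32):
    print(msg)
```

If the streams land in a disk storage pool first, the mount-point checks matter less, which is why `writes_to_tape` can switch them off.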

MAGS FAQ
How many client machines do I need and how do I size them?
Probably fewer than you may think. During a regular backup, most of the time is wasted waiting for the NAS server(s) to respond. With MAGS, you can actually use all the resources you have. Typically, you can calculate somewhere between 4 and 12 streams per Xeon core, depending on latency. 2,000 files per second per core means 48,000 files per second for a 24-core machine, which means roughly 170 million files per hour. That is more than most file servers can provide, so your file server is more likely to be the limiting factor than the machine doing the backup. If in doubt (every environment is different), use the free trial period to test it.
In terms of RAM, you'll need a lot less than for a regular backup. Normally, the client requests metadata about big parts of a file system, which then builds up as a big chunk waiting to be worked through. With MAGS, there are smaller chunks which disappear as soon as the corresponding, smaller job has finished. A RAM limit setting in MAGS prevents swapping if memory gets too crammed. 64 to 128 GB overall should work nicely for most users.
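The sizing arithmetic above, as a small helper. The defaults (2,000 files per second per core, 4 to 12 streams per core) are the rule-of-thumb figures from this slide, not guarantees, and `client_capacity` is a hypothetical name.

```python
def client_capacity(cores: int, files_per_sec_per_core: float = 2000,
                    streams_per_core: tuple[int, int] = (4, 12)):
    """Rough backup-client capacity from the slide's rules of thumb.

    Returns (files per second, files per hour, (min streams, max streams)).
    """
    files_per_sec = cores * files_per_sec_per_core
    files_per_hour = files_per_sec * 3600
    lo, hi = streams_per_core
    return files_per_sec, files_per_hour, (cores * lo, cores * hi)


fps, fph, streams = client_capacity(24)
print(f"{fps:,.0f} files/s, {fph:,.0f} files/h, {streams[0]}-{streams[1]} streams")
# 48,000 files/s, 172,800,000 files/h, 96-288 streams
```

The 172.8 million files per hour is the exact product behind the slide's “roughly 170 million”.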

MAGS FAQ
What about NFS?
Not yet. MAGS currently supports only CIFS/Windows. NFS v3 and v4 support is coming with MAGS V1.2, which will be available in late Q3 or early Q4 2016. It requires ssh access to a Linux machine (any distribution supported by TSM) in addition to the Windows machine running MAGS. In mixed environments, NFS will require a separate TSM node name (so you can start today by backing up all CIFS data via MAGS and your NFS data via a regular TSM client, which can then be controlled by MAGS once the functionality becomes available).

MAGS FAQ
Nobody really understands TSM PVU licensing. How does backing up a file server affect that?
Neither do we. From our understanding, you'll have to license PVUs for the file server, NOT for the TSM client machine actually running the backup, unless you also back up parts of that client machine's disks.
Consider whether front-end licensing makes better sense.

MAGS FAQ
Where can I get MAGS?
You can try to write down this URL: uren/backup disaster recovery/tsm-isilon/dsmisi-mags/, or just google “tsm mags download”.

MAGS Wrap-up
There is an alternative to NDMP.
It eliminates the occasional full-backup headache.
NAS backups can finish on time.
We use proxies for VMware backup, so why not use them for NAS?
Thank you.

