Clustered File Systems: No Limits - Storage Networking Industry Association

Transcription

Clustered File Systems: No Limits
James Coomer, DDN
Jerry Lotto, Mellanox
John Kim, SNIA-ESF Chair, Mellanox
October 25, 2016

SNIA Legal Notice
The material contained in this presentation is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced in their entirety without modification
- The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

About SNIA

Today’s Presenters
- James Coomer, Technical Director, DDN
- Jerry Lotto, Director, HPC, Mellanox
- John Kim, SNIA-ESF Chair, Mellanox
- Alex McDonald, SNIA-ESF Vice Chair, NetApp

Agenda
- SNIA Background
- Overview
  - When should I consider a Clustered File System?
  - What are Clustered File Systems?
    - Scale-out NAS and parallel file systems
  - What choices are there and how do I choose?
  - What sort of performance can I expect?

Who Needs Clustered File Systems?
- Traditional storage is block (SAN), file (NAS), or object
  - SAN: high performance but no data sharing
  - NAS: shared access but limited scalability
  - Object: scalable but often slow, no file writes/edits
- What if you need it all?
  - More performance than one storage system can offer
  - Many hosts sharing read/write access to the same data
  - Expand performance and capacity simultaneously

What’s the Difference?
- Clustered file systems: multiple servers sharing the IO load
- Distributed file systems: no shared back-end storage
- Parallel file systems: native, intelligent client
- All can distribute data across multiple systems
  - Allow multiple clients to access data in parallel
  - Scale up to petabytes per cluster
  - Support high bandwidth

Clustered File System Types
[Diagram: distributed/clustered file systems arranged by scalability (size), covering SAN-based file systems, scale-out NAS (clustered NFS/CIFS exports), parallel file systems, and object stores (protocol exports).]

Scale-Out NAS vs. Parallel File Systems
- Both feature high availability and a shared namespace
  - Access is via special clients or NAS protocols
- They differ in how much file system responsibility is shared with the clients
  - Scale-out NAS: clients are relatively simple, using NFS/CIFS with some optimization. Client-side setup is easy, but all intelligence and scaling challenges must be handled in the servers.
  - Parallel file systems: client software must be installed on every client that needs high performance. More complex, but very large performance and scalability gains. The intelligence and scaling challenges are shared with the clients (see the sketch below).
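
A minimal Python sketch of that shared intelligence follows. It shows the layout arithmetic a parallel file system client performs: for a file striped round-robin across several storage servers, the client works out which server holds any byte range and can fetch it directly, whereas a scale-out NAS client sends every request for a file to the single server exporting it. The stripe size and server names are illustrative assumptions, not any product's defaults.

    # Sketch: how a parallel file system client can locate file data itself.
    # Assumes a simple round-robin striping layout with illustrative values.
    STRIPE_SIZE = 1 * 1024 * 1024                       # 1 MiB stripes (assumed)
    STORAGE_SERVERS = ["oss0", "oss1", "oss2", "oss3"]  # hypothetical data servers

    def locate(offset):
        """Return (server, offset within that server's object) for a file offset."""
        stripe_index = offset // STRIPE_SIZE
        server = STORAGE_SERVERS[stripe_index % len(STORAGE_SERVERS)]
        # Position inside the per-server object that backs this file:
        local = (stripe_index // len(STORAGE_SERVERS)) * STRIPE_SIZE + offset % STRIPE_SIZE
        return server, local

    # A 4 MiB read touches all four servers, so the client can issue the
    # requests in parallel rather than funnelling them through one NAS head.
    for off in range(0, 4 * 1024 * 1024, STRIPE_SIZE):
        print(off, locate(off))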

Parallel File System vs. Scale-Out NAS
[Diagram, built up over several slides, contrasting the two data paths:]
- Parallel file system: intelligence spans all members; clients retrieve data directly from the servers where it resides, over 10/40/50/100Gb Ethernet, OmniPath or InfiniBand.
- Scale-out NAS: intelligence sits primarily in the servers; a single client does IO to one server only, and each transaction involves server-side data movement and network IO.

So what are my choices?
- Parallel file systems include:
  - Lustre (Intel), Spectrum Scale/GPFS (IBM)
  - BeeGFS (Fraunhofer), OrangeFS/PVFS
  - Others: StorNext, HDFS, MooseFS, Gluster, Ceph, etc.
- They differ in data distribution, metadata, clients, licensing/cost, sharing/locks, data protection, etc.
- We shall concentrate on the most widely deployed filesystems today: Lustre and Spectrum Scale

Lustre and Spectrum Scale
- Both are benefitting from strong recent development efforts:
  - Spectrum Scale: Active File Management, High Availability Write Cache, Local Read Only Cache, Encryption, GPFS Native RAID
  - Lustre: QoS, JobStats, Security
- Lustre development is primarily at Intel, but significant features are developed by the wider community and other vendors

Super-High-Level Comparison
Both offer high out-of-the-box performance.
- Lustre
  - Optimized for large-scale performance
  - Flexible per-file/directory/filesystem striping policies (example below)
  - Strong QoS available
- Spectrum Scale
  - Optimized for small/medium-scale performance
  - Mature snapshots
  - Multi-protocol support
  - Data policy management built in
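
As a hedged illustration of Lustre's per-directory striping policies, the sketch below drives the standard lfs utility from Python (keeping all examples in one language): it sets a four-way, 1 MiB stripe layout on a directory and reads the resulting default layout back. The mount point and directory are assumptions; lfs setstripe/getstripe with -c (stripe count) and -S (stripe size) are standard Lustre client commands, but exact options and output vary by release, so treat this as a sketch rather than a recipe.

    # Sketch: applying a Lustre striping policy to a directory.
    # Assumes a Lustre client mount at /mnt/lustre with the lfs utility installed.
    import os
    import subprocess

    target = "/mnt/lustre/results"          # hypothetical directory on the Lustre mount
    os.makedirs(target, exist_ok=True)

    # New files created under this directory inherit a 4-way stripe with 1 MiB stripes.
    subprocess.run(["lfs", "setstripe", "-c", "4", "-S", "1M", target], check=True)

    # Report the default layout now attached to the directory.
    layout = subprocess.run(["lfs", "getstripe", "-d", target],
                            capture_output=True, text=True, check=True)
    print(layout.stdout)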

What about NFS/SMB?
- GPFS supports native Linux and Windows clients
- Lustre only supports native Linux clients
- Both support exporting the filesystem via clustered NFS/SMB
  - Combine extreme performance with native clients AND a range of other clients over NFS/SMB
  - GPFS can deliver an extremely strong scalable NAS

Multi-protocol Options
- In addition to native clients, both Lustre and GPFS support protocol gateways
- GPFS recently introduced a protocol abstraction layer (CES) that supports any or all of object, SMB and NFS simultaneously; it runs on dedicated protocol nodes
- Lustre supports re-exporting the filesystem, with clients acting as Samba (SMB) and/or NFS/pNFS servers
- Performance is typically less than native client access

Performance
- Both Lustre and GPFS can reach similar throughput with the same hardware
  - Pushing the bandwidth limits of the underlying storage devices
- But the devil is in the details:
  - IOPS requirements
  - Metadata performance
  - File system block-size choices
  - Application IO characteristics (small files, mmap, direct IO) – illustrated below
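
Application IO characteristics are easy to underestimate, so here is a deliberately simple, hedged illustration of why request size matters: it times writing the same amount of data with small versus large buffered POSIX writes. It is not a real file system benchmark (no direct IO, no parallel clients, and the page cache is involved); tools such as fio or IOR are the usual way to measure parallel file systems. The target path is an assumption.

    # Sketch: how request size changes effective write throughput (buffered IO only).
    # Not a real benchmark; use fio or IOR for serious measurements.
    import os
    import time

    PATH = "/mnt/parallel_fs/iosize_test.dat"   # hypothetical file on the clustered FS
    TOTAL = 256 * 1024 * 1024                   # write 256 MiB per test

    def timed_write(io_size):
        data = b"x" * io_size
        fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        start = time.perf_counter()
        written = 0
        while written < TOTAL:
            written += os.write(fd, data)
        os.fsync(fd)                            # flush so the timing includes real IO
        os.close(fd)
        return TOTAL / (time.perf_counter() - start) / 1e6   # MB/s

    for size in (4 * 1024, 128 * 1024, 4 * 1024 * 1024):
        print(f"{size >> 10:>6} KiB requests: {timed_write(size):8.1f} MB/s")
    os.remove(PATH)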

Parallel Filesystem Performance
[Chart: filesystem throughput in GB/s for a like-for-like comparison on a DDN GS14K with EDR InfiniBand and 400 NL-SAS drives. Both file systems show similar top-end throughput; GPFS performance depends on the choice of data allocation method.]

Optimizing Clustered Filesystems
- "I have never met a file system or data-intensive workload that didn't respond well to tuning"
- Two key considerations
  - Architecture
    - Choice and design of the file system and interconnect – both are important
  - Tuning
    - Parallel clustered filesystems have extensive tuning options
    - Spectrum Scale 4.2, for example, has more than 700 tuning parameters (see the sketch below)
      - Only dozens are aimed at end-user tuning
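
For Spectrum Scale, the explicitly set configuration (a small subset of those hundreds of parameters) can be listed with mmlsconfig. The hedged sketch below captures that listing from Python and turns the "name value" lines into a dictionary so parameters such as pagepool or maxMBpS can be inspected programmatically. It assumes a node where the GPFS administration commands are installed and on the PATH; mmlsconfig output also contains cluster headers and node-class sections, which this simple parser just skips.

    # Sketch: reading Spectrum Scale configuration parameters via mmlsconfig.
    # Assumes the GPFS admin commands are installed and on PATH on this node.
    import subprocess

    out = subprocess.run(["mmlsconfig"], capture_output=True, text=True, check=True)

    params = {}
    for line in out.stdout.splitlines():
        line = line.strip()
        # Skip blank lines and node-class headers such as "[common]".
        if not line or line.startswith("["):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            name, value = parts
            params[name] = value

    # A few well-known GPFS parameters; which ones appear depends on what has
    # actually been set on this cluster.
    for key in ("pagepool", "maxMBpS", "maxFilesToCache"):
        print(key, "=", params.get(key, "<default / not listed>"))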

Architecture
- Hardware – storage and server choices
  - Technology and capacity – data and metadata
    - Performance-limiting metric
    - Flash and archive tiers, from RAM to tape
  - Disk aggregation
    - Hardware RAID, ZFS, GNR (GPFS Native RAID)
    - Underlying filesystem(s)
  - Scaling capacity and performance – today and tomorrow?
    - Building-block approach

Architecture - continued
- Interconnect choices
  - Disk to servers
    - Internal vs. external: shared access (HA) or share-nothing (e.g. FPO)
    - PCIe/NVMe-oF, SAS, FC, InfiniBand, Ethernet
  - Servers to clients
    - InfiniBand, Ethernet, etc.
    - RDMA is supported by both Lustre and Spectrum Scale

Tuning
- Configuration files and parameters
  - Operating system kernel tuning
  - Spectrum Scale – mmlsconfig, mmchconfig
  - Lustre – /proc/fs/lustre and multiple configuration files (see the sketch below)
- Memory
  - Cache (L2) – storage controllers, server AND client
  - Other uses (daemons, tables, kernel, policy engines)
  - Controlled (directly and indirectly) by many parameters
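
On the Lustre side, much of the live tuning and accounting information is exposed under /proc/fs/lustre (newer releases move parts of it under /sys/fs/lustre or behind lctl get_param, which is the supported interface). The sketch below is an exploratory, hedged example that walks whatever stats files exist under that tree and prints the first few counters of each; exact paths and formats differ between Lustre versions.

    # Sketch: peeking at Lustre client counters under /proc/fs/lustre.
    # Paths and file formats vary by release; lctl get_param is the supported
    # interface, this is only for illustration.
    import glob

    for stats_file in sorted(glob.glob("/proc/fs/lustre/*/*/stats"))[:5]:
        print("==", stats_file)
        try:
            with open(stats_file) as fh:
                for line in fh.readlines()[:5]:     # first few counters only
                    print("  ", line.rstrip())
        except OSError as err:
            print("   (unreadable:", err, ")")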

Tuning - continued
- Flash cache (L3) – write and/or read
  - Integral vs. policy-driven placement and migration
  - Sub-LUN, underlying file system level (L2ARC), AFM
- File system blocksize(s) – sub-blocks too (see the arithmetic below)
- Communications protocols and fabrics
  - TCP/IP required; RDMA optional but important for performance
  - Protocol overhead and gateway architectures
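
The block size and sub-block choice mostly trade large-file streaming efficiency against space efficiency for small files. The hedged arithmetic below assumes allocation in sub-blocks with 32 sub-blocks per block, which matches the traditional GPFS layout; newer Spectrum Scale releases vary the sub-block count with block size, so treat the numbers as illustrative rather than a statement of any release's behaviour.

    # Sketch: space a small file consumes when allocation granularity is one sub-block.
    # 32 sub-blocks per block follows the traditional GPFS layout (assumption);
    # real behaviour depends on the file system version and creation options.
    import math

    def allocated_bytes(file_size, block_size, subblocks_per_block=32):
        subblock = block_size // subblocks_per_block
        return math.ceil(file_size / subblock) * subblock if file_size else 0

    file_size = 10 * 1024                           # a 10 KiB file
    for block_size in (256 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        used = allocated_bytes(file_size, block_size)
        print(f"block size {block_size >> 10:>6} KiB -> "
              f"{used >> 10:>4} KiB allocated ({used / file_size:.1f}x the data)")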

The Three “M”s
- Monitoring, Management, Maintenance
  - Monitoring and Management
    - Lustre – IEEL GUI
    - Spectrum Scale GUI – originally GSS, now all
      - Excellent – health and performance monitoring
      - Good – code management, troubleshooting
      - “In development” – deployment, configuration, tuning
    - Ganglia, Nagios and Splunk plugins (a minimal example follows)
    - Daemons and command line
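
As a hedged illustration of the Nagios-plugin idea, the sketch below is a minimal capacity check: it reads how full a mount point is with os.statvfs and exits with the conventional Nagios status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The mount point and thresholds are assumptions; a production plugin would also check daemon health, quorum and per-server counters.

    #!/usr/bin/env python3
    # Sketch: minimal Nagios-style capacity check for a clustered file system mount.
    # Exit codes follow the Nagios convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    import os
    import sys

    MOUNT = "/mnt/parallel_fs"      # hypothetical mount point
    WARN, CRIT = 80.0, 90.0         # percent-used thresholds (assumed policy)

    try:
        st = os.statvfs(MOUNT)
    except OSError as err:
        print(f"UNKNOWN - cannot stat {MOUNT}: {err}")
        sys.exit(3)

    total = st.f_blocks * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    used_pct = 100.0 * (total - avail) / total if total else 0.0

    if used_pct >= CRIT:
        print(f"CRITICAL - {MOUNT} is {used_pct:.1f}% used")
        sys.exit(2)
    if used_pct >= WARN:
        print(f"WARNING - {MOUNT} is {used_pct:.1f}% used")
        sys.exit(1)
    print(f"OK - {MOUNT} is {used_pct:.1f}% used")
    sys.exit(0)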

The Three “M”s
- Monitoring, Management, Maintenance
  - Maintenance
    - Lustre available as open source
    - Different cost models for support
    - Similar long-term costs for file system support
    - Client vs. server upgrades
      - Rolling vs. downtime
      - Prerequisites and dependencies

More Information
- Spectrum er
  - Search for “Spectrum Scale”
- oftware.html
- Vendor partners – leveraging expertise and service
  - For example: DDN, IBM, Intel, Seagate, etc.

After This Webcast
- Please rate this webcast and provide us with feedback
- This webcast and a PDF of the slides will be posted to the SNIA Ethernet Storage Forum (ESF) website and available s
- A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA-ESF blog: sniaesfblog.org
- Follow us on Twitter @SNIAESF
- Need help with all these terms? Download the 2016 SNIA

Thank You
