IBM Spectrum Archive


Khanh Ngo, Senior Technical Staff Member and Master Inventor; Tape Storage Test Architect, IBM Spectrum Archive Development

Store everywhere. Run anywhere. Remove data-related bottlenecks.
Challenge: managing data growth
– Lowering data costs
– Managing data retrieval and application support
– Protecting business data
Unified scale-out data lake (IBM Spectrum Scale)
– File in/out, object in/out; analytics on demand
– High-performance native protocols: POSIX, HDFS, NFS, SMB, Swift/S3
– Single management plane
– Cluster replication and global namespace
– Enterprise storage features across file, object, and HDFS
– Storage tiers: SSD, fast disk, slow disk, tape
Copyright IBM Corporation 2018

Agenda
– What is Linear Tape File System (LTFS)?
– What is IBM Spectrum Archive?
– Introduction to IBM Spectrum Archive Enterprise Edition (EE) and its features
– Common use cases
– HPC-specific implementations

What is Linear Tape File System (LTFS)?

LTFS is the standard data format and file system designed for long-term retention and media portability.
– Award-winning technology, invented and maintained by IBM
  o Reference implementation available as open source
  o Hosted on GitHub
– A broad ecosystem of supporting products
– Open international standard
  o ISO/IEC 20919:2016
  o Data structure on tape: two partitions, an index partition and a data partition
  o Industry collaboration: SNIA Technical Working Group
    – Version 2.4 approved in 2017
    – Now discussing version 2.5
  o Logo program (LTFS compatibility testing) by the LTO Consortium
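The two-partition layout is what makes an LTFS tape self-describing: file data is appended to the data partition, and a complete index (an XML document recording where each file's extents lie) is written to the separate index partition, so a reader can list the tape's contents without scanning the data. The following is a minimal toy model of that idea, not the LTFS format itself; the element and attribute names (`index`, `file`, `extent`, `startblock`, `bytecount`) and the 512 KiB block size are simplifications assumed for illustration, and the real schema is defined in the SNIA/ISO specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

BLOCK_SIZE = 512 * 1024  # assumed fixed block size for this toy model


@dataclass
class ToyLtfsTape:
    """Toy two-partition tape: file data goes to the data partition, and each
    index generation (an XML document listing every file's extent) goes to the
    index partition. Names and layout are simplified, not the LTFS schema."""
    data_partition: list = field(default_factory=list)   # blocks of file data
    index_partition: list = field(default_factory=list)  # index generations (XML)
    extents: dict = field(default_factory=dict)          # name -> (start_block, byte_count)

    def write_file(self, name: str, payload: bytes) -> None:
        start = len(self.data_partition)
        # Append the payload to the data partition in fixed-size blocks.
        for off in range(0, max(len(payload), 1), BLOCK_SIZE):
            self.data_partition.append(payload[off:off + BLOCK_SIZE])
        self.extents[name] = (start, len(payload))

    def sync_index(self) -> str:
        # Write a new, complete index generation describing all files.
        root = ET.Element("index", generation=str(len(self.index_partition) + 1))
        for name, (start, count) in sorted(self.extents.items()):
            f = ET.SubElement(root, "file", name=name)
            ET.SubElement(f, "extent", partition="data",
                          startblock=str(start), bytecount=str(count))
        xml = ET.tostring(root, encoding="unicode")
        self.index_partition.append(xml)
        return xml

    @staticmethod
    def list_files(index_xml: str) -> list:
        # A reader needs only the index to obtain the full file listing.
        return [f.get("name") for f in ET.fromstring(index_xml).iter("file")]

    def read_file(self, name: str) -> bytes:
        start, count = self.extents[name]
        nblocks = max(1, -(-count // BLOCK_SIZE))  # ceiling division
        return b"".join(self.data_partition[start:start + nblocks])[:count]


tape = ToyLtfsTape()
tape.write_file("a.txt", b"hello")
tape.write_file("big.bin", b"x" * 1_000_000)
print(ToyLtfsTape.list_files(tape.sync_index()))  # ['a.txt', 'big.bin']
```

Keeping every index generation, rather than overwriting one, mirrors the append-only nature of tape: older generations remain readable even after newer ones are written.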

Why Does the Data Format Matter?
Three typical uses of tape storage:
– A backup application, with its own database, writing to tape
– The POSIX API or the tar command writing directly to tape
– LTFS tape
What are the requirements of archival storage?
– Where and how is the metadata (information about the tape contents) stored?
– Is the tape portable across different locations or different applications?
– Is the metadata centralized or scattered?
– Can the files be accessed directly from end-user applications, or only indirectly?
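The "centralized or scattered" question is easy to see with tar: each archive member is a 512-byte header immediately followed by that member's data, and there is no table of contents, so listing a tar stream read sequentially from tape means walking past every header and every data block. The sketch below uses Python's standard `tarfile` module in streaming mode (`"r|"`) to show this; the file names and sizes are arbitrary examples.

```python
import io
import tarfile


def build_tar(files: dict) -> bytes:
    """Write a ustar stream: each member is a 512-byte header followed by its
    data, padded to a 512-byte boundary. There is no central table of contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_sequentially(stream: bytes) -> list:
    """List members in streaming mode ('r|'), as a tape reader must: walk
    header, data, header, data ... through the archive to find every name."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r|") as tar:
        return [member.name for member in tar]


archive = build_tar({"a.txt": b"alpha", "b.txt": b"beta" * 1000})
print(list_sequentially(archive))  # ['a.txt', 'b.txt']
```

Here the second header sits at byte offset 1024 (one header plus one padded data block), so its metadata is only reachable after passing the first member; an LTFS index instead gathers all of this metadata in one place.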

What is IBM Spectrum Archive?

IBM Spectrum Archive: LTFS-based software-defined storage (SDS) for data archiving
– Member of the IBM Spectrum Storage family
– Three editions: Enterprise, Library, and Single Drive
– Available as standalone software or (EE only) as part of the IBM Spectrum Storage Suite

IBM Spectrum Archive Editions
– Enterprise Edition (EE) 1.3: licensed software, integrated solution
  o Integration of GPFS and LTFS
  o Multi-node scale-out capability
  o Policy-based data management
– Library Edition (LE) 2.4: licensed / free
  o Integrates support for tape automation
  o Scalable storage space, from the 1U TS2900 to the TS4500, or Dell libraries
– Single Drive Edition (SDE) 2.4: free download; open source on GitHub
  o Supports IBM LTO and enterprise TS11xx tape drives
  o Supports Linux, Mac, and Windows
  o Bundled with OEM tape drives
Related offerings: SwiftHLM, LTFS Storage Manager, IBM Archive & Essence Manager (GBS asset), LTFS DM (Research), and the storage toolkit for tape appliances/devices from IHVs (free).

Introduction to IBM Spectrum Archive Enterprise Edition (EE) and its features

IBM Spectrum Archive Enterprise Edition (EE)
– Persistent view of the data: tape storage under a single namespace
– Policy-based data placement for cold/idle data
– Recall of data from tape on demand
– Multiple protocol support for client applications (e.g., CIO, Finance, Engineering)
– Single namespace with an integrated tape tier: IBM Spectrum Scale flash (gold pool, tier 1), disk (silver pool, tier 2), and IBM Spectrum Archive EE tape in LTFS format (tier 3), with up to 2 tape libraries and up to 500 PB (with TS1155 tape drives)
– Up to 3 data replicas
– Data encryption with the IBM SKLM server (library-managed encryption, LME)
– WORM tape for anti-tampering
– Offline tapes, to store the media in an isolated environment ("air gap") for greater protection of sensitive corporate data, or to extend the storage capacity beyond the library limit
– Automated tape validation, available with the TS4500
– Export of LTFS tapes for data exchange: remove data from the Scale namespace and export the tapes for use in other applications
– Runs on Linux; orderable from AAS or PPA; trial version available from the IBM web site

ESS with IBM Spectrum Archive
– Clients use file and object services (NFS, SMB, …)
– IBM Elastic Storage Server (ESS) building blocks with SSD or disk, running IBM Spectrum Scale (Data Management Edition for ESS), connected over the Spectrum Scale cluster network (InfiniBand or 100/40/10 GbE)
– Tape gateway node (per-node license) running IBM Spectrum Archive EE (client license)
– FC or SAS attachment to tape

EE: Building Block Options for Scale-Out
Access through NFS, CIFS, and object protocols.
1. Tape gateway servers: x86 or POWER Little Endian CPU; per-node license
2. Disk storage
3. Tape drives and tape media
4. Tape library
Components: application servers and IBM Spectrum Scale servers with NSDs; IBM Spectrum Archive gateway node(s) attached to the tape library; IBM Spectrum Archive monitoring node.

Functional Overview
A global namespace spans the user file system on IBM Spectrum Scale nodes 1 and 2 and IBM Spectrum Archive nodes 1 and 2, which share a tape library divided into pool 1 and pool 2.
– Migration, with optional copy to another tape
– Recall, with an option for bulk recall
– Rebuild file system
– Import (only creates stubs in GPFS)
– Export, with an option to keep the stub in GPFS
– Tape management: reclamation (free space) and reconciliation (synchronize)
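Migration and recall can be pictured as transitions between three file states: resident (data only on disk), premigrated (data on disk and on tape), and migrated (only a stub on disk, data on tape). The toy model below sketches that life cycle under stated assumptions; it is not the EE implementation. The class, method, and tape names (`ToyArchive`, `TAPE01`, ...) are hypothetical, and the choice that a recall leaves the file premigrated (still on tape, so it can be re-stubbed without rewriting) is an assumption of this sketch.

```python
from enum import Enum


class State(Enum):
    RESIDENT = "resident"        # data only on disk
    PREMIGRATED = "premigrated"  # data on disk and on tape
    MIGRATED = "migrated"        # stub on disk, data only on tape


class ToyArchive:
    """Toy model of migrate / recall in a single namespace (hypothetical names)."""

    def __init__(self) -> None:
        self.tapes = {}   # tape ID -> {file name: data}
        self.disk = {}    # file name -> data, or None for "stub only"
        self.state = {}   # file name -> State
        self.copies = {}  # file name -> list of tape IDs holding a copy

    def create(self, name: str, payload: bytes) -> None:
        self.disk[name] = payload
        self.state[name] = State.RESIDENT
        self.copies[name] = []

    def migrate(self, name: str, tape_ids: list) -> None:
        """Copy the file to one or more tapes (extra IDs = extra copies), then
        release the disk data and keep only a stub in the namespace."""
        if self.state[name] is State.RESIDENT:
            if not tape_ids:
                raise ValueError("a resident file needs at least one tape")
            for tid in tape_ids:
                self.tapes.setdefault(tid, {})[name] = self.disk[name]
            self.copies[name] = list(tape_ids)
        self.disk[name] = None
        self.state[name] = State.MIGRATED

    def read(self, name: str) -> bytes:
        """Transparent recall: reading a stub pulls the data back from tape."""
        if self.state[name] is State.MIGRATED:
            first_copy = self.copies[name][0]
            self.disk[name] = self.tapes[first_copy][name]
            self.state[name] = State.PREMIGRATED  # now on disk and still on tape
        return self.disk[name]


archive = ToyArchive()
archive.create("run42.dat", b"payload")
archive.migrate("run42.dat", ["TAPE01", "TAPE02"])  # two copies on two tapes
print(archive.state["run42.dat"])                   # State.MIGRATED
print(archive.read("run42.dat"))                    # b'payload'
```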

IBM Spectrum Archive Update Sequence
v1.2
– Multiple tape library attachment (up to 2) to a single IBM Spectrum Scale cluster
– Data recording on WORM tape cartridges (TS1100 only)
– Expanded storage capacity with LTO7 support
– Performance improvement for large-scale systems
– Flexibility in pool-based data management, including transparent recall retries
v1.2.2 and v1.2.4
– New -E option for removing tapes with no file references
– Support of IBM Spectrum Scale Active File Management (AFM) Independent Writer (IW) mode
– Improved performance of administrative commands for reconcile and import/export
– Automated recovery process for tapes with write failures
– Improved method for recovering tapes with read failures
– RESTful API
– Control node failover
– Monitoring dashboard
– TS1155 support
– IBM SwiftHLM support
v1.2.5/v1.2.5.1
– LTO8/M8 support
– Library replacement procedure, phase one (conversion method)
1.2.6 updates
– Library replacement procedure, phase two (translation method)
– Assisted tape technology upgrade for in-pool and pool-to-pool data migration
– POWER Little Endian with Linux (RHEL) version 7.4 or later
– New datamigrate command for technology upgrade

December 2018 release: Enterprise Edition (EE) 1.3.0.0
– User task control and reporting: usability enhancements with a new command-line interface (CLI), with additional support for monitoring the progress and results of user operations and for tape maintenance
  o Listing of active and completed tasks, including detailed information and command output
  o Task results, including file state transition results
  o Ability to run a command in the background with the --async option
– Supports the Storage Networking Industry Association (SNIA) LTFS format specification 2.4
– Expanded storage capacity with the TS1160 tape drive
– Supports the IBM Spectrum Scale backup function (mmbackup) for the same file system managed by IBM Spectrum Archive
– Bundles the open-source packages for external monitoring of Spectrum Archive through a GUI/dashboard
– Use of /dev/sgX devices
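The task-based CLI model (every command becomes a tracked task; foreground commands wait, background commands return immediately, and active or completed tasks can be listed with their results) can be sketched with a thread pool. This is only a conceptual toy, assuming nothing about how EE implements its task manager: `ToyTaskManager`, `run`, `list_tasks`, and the `background` keyword are hypothetical names, not the product's commands or options.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor, wait


class ToyTaskManager:
    """Toy task tracker: every command becomes a task with an ID; foreground
    commands wait for their task, background commands return at once."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks = {}  # task ID -> (command name, Future)

    def run(self, command: str, func, *args, background: bool = False) -> str:
        task_id = str(uuid.uuid4())
        future = self._pool.submit(func, *args)
        self._tasks[task_id] = (command, future)
        if not background:
            future.result()  # foreground: block until the task finishes
        return task_id

    def list_tasks(self, completed: bool) -> list:
        """Task IDs that are still active (completed=False) or finished (True)."""
        return [tid for tid, (_, fut) in self._tasks.items() if fut.done() == completed]

    def result(self, task_id: str):
        """Output of a task (blocks if it is still running)."""
        return self._tasks[task_id][1].result()

    def wait_all(self) -> None:
        wait([fut for _, fut in self._tasks.values()])


mgr = ToyTaskManager()
tid = mgr.run("migrate", lambda files: {f: "migrated" for f in files},
              ["a", "b"], background=True)
mgr.wait_all()
print(mgr.result(tid))  # {'a': 'migrated', 'b': 'migrated'}
```

Returning a task ID instead of the result is what lets a background command exit immediately while its progress and per-file outcomes stay queryable later.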

REST API
– 7 GET endpoints returning JSON-formatted output, e.g. http://localhost:7100/ibmsa/v1/tasks
– Common GET parameters: pretty, fields, sort
– Sample response: [{"id": "d9dcb712-2cc3-4a10-b6ac-bb54c520cb5d", "model": "03584L22", "name": "TS4500", "serial": "0000078AA0040405"}]
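A client for these endpoints only needs to build a GET URL with the optional query parameters and decode the JSON body. The sketch below uses Python's standard library; the base URL comes from the slide, but the parameter value formats (`pretty=true`, a comma-separated `fields` list, `sort=<field>`) are assumptions, so check the product's REST API reference before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:7100/ibmsa/v1"  # from the slide


def build_url(resource: str, pretty: bool = False, fields=None, sort=None) -> str:
    """Assemble a GET URL with the common parameters. The value formats
    (pretty=true, comma-separated fields, sort=<field>) are assumptions."""
    params = {}
    if pretty:
        params["pretty"] = "true"
    if fields:
        params["fields"] = ",".join(fields)
    if sort:
        params["sort"] = sort
    query = urlencode(params)
    return f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")


def get_json(resource: str, **params):
    """Issue the GET request and decode the JSON body (needs a running server)."""
    with urlopen(build_url(resource, **params), timeout=10) as resp:
        return json.load(resp)


# Decoding the sample response shown on the slide:
sample = ('[{"id": "d9dcb712-2cc3-4a10-b6ac-bb54c520cb5d", "model": "03584L22", '
          '"name": "TS4500", "serial": "0000078AA0040405"}]')
print(build_url("tasks", fields=["id", "name"], sort="name"))
print([entry["name"] for entry in json.loads(sample)])  # ['TS4500']
```

Note that `urlencode` percent-encodes the comma in the `fields` value (`id%2Cname`); servers normally decode this transparently.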

Dashboard/GUI
IBM Spectrum Archive supports a dashboard to monitor system performance, statistics, and configuration, based on:
– Logstash, to collect the data
– Elasticsearch, to store the data
– Grafana, to visualize it
Link: …ikis/home?lang…trum%20Archive

IBM Spectrum Archive EE Dashboard
Views: System Health, Storage, Resource, Performance, and Task.

Common Use Cases

Active Archive
Production and archival data share a global namespace on hybrid storage (flash/disk and tape), spanning hot, warm, and cold data.
– Resiliency: up to 3 copies on tape; up to 2 libraries
– Data collocation: a global pool, or separate pools per application and/or user
– Recall options: transparent recall; bulk recall/prefetch by command
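Bulk recall pays off because tape is sequential: recalling a list of files in arbitrary order can remount cartridges and seek back and forth repeatedly. A common technique is to group requests by tape and sort each group by position on the tape, so each cartridge is mounted once and read front to back. The sketch below shows that general technique only; it is not the product's scheduler, and the tape IDs, paths, and `start_block` positions are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class RecallRequest(NamedTuple):
    path: str         # file in the namespace (hypothetical)
    tape_id: str      # tape holding the file (hypothetical)
    start_block: int  # position of the file's first block on that tape


def plan_bulk_recall(requests) -> dict:
    """Group requests by tape, then order each group by position on tape, so each
    cartridge is mounted once and read front to back instead of seeking around."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)
    return {tape: [r.path for r in sorted(reqs, key=lambda r: r.start_block)]
            for tape, reqs in sorted(by_tape.items())}


reqs = [
    RecallRequest("/gpfs/fs1/a", "TAPE02", 500),
    RecallRequest("/gpfs/fs1/b", "TAPE01", 900),
    RecallRequest("/gpfs/fs1/c", "TAPE02", 10),
    RecallRequest("/gpfs/fs1/d", "TAPE01", 5),
]
print(plan_bulk_recall(reqs))
```

For the four requests above, the plan reads `d` then `b` from TAPE01, and `c` then `a` from TAPE02.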

IBM Spectrum Archive EE managing continuous data growth
[Figure: used capacity as a percentage of total storage capacity over time, showing total storage and online storage, hot and cold data, and automatic migration with thresholds (high threshold 80%, low threshold 60%). Caption: automatic migration to reduce online storage space.]
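Threshold-driven migration works like hysteresis: nothing happens while online usage stays at or below the high threshold; once usage exceeds it, the coldest files are migrated until usage falls to the low threshold, which avoids migrating again at every small write. The sketch below assumes "coldest" means least recently accessed and that a migrated file's stub takes negligible space; both are simplifying assumptions, and the function name is hypothetical.

```python
def migrate_to_low_threshold(files: dict, capacity: float,
                             high: float = 0.80, low: float = 0.60) -> list:
    """files maps name -> (size, last_access_time). If usage exceeds the high
    threshold, return the least recently accessed files to migrate until usage
    drops to the low threshold (stub size assumed to be negligible)."""
    used = sum(size for size, _ in files.values())
    if used <= high * capacity:
        return []  # below the high threshold: no migration needed
    to_migrate = []
    # Coldest (oldest last access) first.
    for name, (size, _) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= low * capacity:
            break
        to_migrate.append(name)
        used -= size
    return to_migrate


# 85 of 100 units used (> 80%): migrate "a" (oldest), leaving 55 (<= 60%).
print(migrate_to_low_threshold(
    {"a": (30, 1), "b": (25, 2), "c": (20, 3), "d": (10, 4)}, capacity=100))  # ['a']
```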

Operational Storage
Production data on NFS, SMB, IBM Spectrum Scale, or an external filer is archived through a disk cache; all archival data is migrated to tape.
– Landing zone for archive/retrieval: a folder per application/user, with or without quota; may use the immutability flag
– WORM or non-WORM tapes
– Up to 3 copies
– Up to 2 libraries
– Global pool or pool per application/user

HPC-specific implementations

Archive of Genomics Data
Researchers' workflow: biopsy, sequencing, analysis. FastQ files (150 GB, compressed), BAM files (150 GB), and VCF files (100-200 MB) are migrated to tape and recalled on demand; the diagram shows two data sets, Set A and Set B.

Repository area for long-term archive of important files
– Shared storage area across all HPC systems for backup of home directories and users' data for long-term archive; file systems FS1, FS2, and FS3 (each with home and scratch/work areas) feed the shared storage, which is backed by tape
– Files are initially backed up using IBM Spectrum Protect
– Files are then migrated to tape after certain conditions are met, such as being older than 3 months and/or larger than 50-100 MB
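The migration conditions above amount to a small predicate over file attributes. The sketch below is a hypothetical helper, not an IBM Spectrum Scale policy rule: it approximates "older than 3 months" as 90 days, uses 50 MiB as the size threshold (the low end of the slide's 50-100 MB range), requires the file to have been backed up first, and exposes the slide's "and/or" as a `require_both` switch.

```python
DAY = 86400          # seconds per day
MIB = 1024 ** 2      # bytes per MiB


def should_migrate(size_bytes: int, mtime: float, now: float, backed_up: bool, *,
                   min_age_days: int = 90, min_size: int = 50 * MIB,
                   require_both: bool = True) -> bool:
    """Hypothetical migration test: only files already backed up qualify, then
    age and size are combined with AND (require_both) or OR."""
    if not backed_up:
        return False
    old_enough = now - mtime >= min_age_days * DAY
    big_enough = size_bytes >= min_size
    return (old_enough and big_enough) if require_both else (old_enough or big_enough)


now = 1_000_000_000
print(should_migrate(60 * MIB, now - 100 * DAY, now, backed_up=True))  # True
print(should_migrate(60 * MIB, now - 10 * DAY, now, backed_up=True))   # False
```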

Customer Environment
– Internal backup: a fileset on the IBM Spectrum Scale nodes hosts backup files from 7 different large file systems, with the larger backup images kept on tape
– Archive of a 3rd copy of high-priority data from another replicated IBM Spectrum Scale system (ingesting multiple PBs across hundreds of millions of files)
Hardware: customer-supplied x86 servers and SAN; IB switches; FC switches; IBM Spectrum Archive software; IBM Spectrum Scale Elastic Storage Server GL6S; 8 TS1155 tape drives; IBM TS4500 tape library.
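Ingesting "multiple PBs" through 8 drives is a throughput question, and a back-of-the-envelope estimate helps size it. The sketch below assumes a native (uncompressed) TS1155 data rate of 360 MB/s per drive, all drives streaming in parallel, and 1 PB (10^15 bytes) as an example volume; the `efficiency` factor is a hypothetical knob for mount, seek, and disk-read overheads, which in practice lower the figure.

```python
def ingest_days(data_bytes: float, drives: int = 8,
                drive_mb_per_s: float = 360.0, efficiency: float = 1.0) -> float:
    """Days to write data_bytes to tape with all drives streaming in parallel.
    drive_mb_per_s is an assumed native rate (MB = 10**6 bytes)."""
    aggregate = drives * drive_mb_per_s * 1e6 * efficiency  # bytes per second
    return data_bytes / aggregate / 86400


# 8 drives x 360 MB/s = 2.88 GB/s; 1 PB / 2.88 GB/s is about 347,000 s.
print(f"{ingest_days(1e15):.2f} days")  # 4.02 days
```

At an assumed 70% efficiency the same petabyte would take roughly 5.7 days, which is why sustained drive streaming matters as much as drive count.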

IBM Spectrum Archive Features
– Lower TCO by leveraging cost-effective tape storage
– Seamless data access in a continuous namespace
– Automated, policy-based movement from disk to tape
– Tape-optimized recall to accelerate retrieval
– Standardized LTFS format facilitates data exchange
– Support for transparent tape encryption
– Data protection through multiple copies on tape
– Support for immutable files on WORM tapes
– Two-site replication by stretch cluster or AFM IW
– Media export/import for data sharing and/or off-site storage
– Media health check with the TS4500
– Easy administration and management

Questions?
