NVMe: What You Need To Know For Next Year - NVM Express

Transcription

Architected for Performance
NVMe: What you need to know for next year
Sponsored by the NVM Express organization, the owner of the NVMe, NVMe-oF, and NVMe-MI standards

Speakers
- Janene Ellefson (@jamminjanene)
- David Allen
- J Metz (@drjmetz)

NVMe Agenda
- Intro & 2-day agenda
- Market outlook
- NVMe roadmap
- NVMe-oF
- Q&A

NVM Express Sponsored Track for Flash Memory Summit 2018
Track codes: NVMe-101-1, NVMe-102-1, NVMe-201-1, NVMe-202-1
8/7/18, 8:30-9:35
  Title: NVM Express: NVM Express roadmaps and market data for NVMe, NVMe-oF, and NVMe-MI - what you need to know for the next year.
  Speakers: Janene Ellefson, Micron; J Metz, Cisco; Amber Huffman, Intel; David Allen, Seagate
8/7/18, 9:45-10:50
  Title: NVMe architectures in hyperscale data centers, enterprise data centers, and in the client and laptop space.
  Speakers: Janene Ellefson, Micron; Chris Peterson, Facebook; Chander Chadha, Toshiba; Jonmichael Hands, Intel
8/7/18, 3:40-4:45
  Title: NVMe Drivers and Software: This session will cover the software and drivers required for NVMe-MI, NVMe, and NVMe-oF, and support from the top operating systems.
  Speakers: Uma Parepalli, Cavium; Austin Bolen, Dell EMC; Myron Loewen, Intel; Lee Prewitt, Microsoft; Suds Jain, VMware; David Minturn, Intel; James Harris, Intel
8/7/18, 4:55-6:00
  Title: NVMe-oF Transports: We will cover NVMe over Fibre Channel, NVMe over RDMA, and NVMe over TCP.
  Speakers: Brandon Hoff, Emulex; Fazil Osman, Broadcom; J Metz, Cisco; Curt Beckmann, Brocade; Praveen Midha, Marvell
8/8/18, 8:30-9:35
  Title: NVMe-oF Enterprise Arrays: NVMe-oF and NVMe are improving the performance of classic storage arrays, a multi-billion dollar market.
  Speakers: Brandon Hoff, Emulex; Michael Peppers, NetApp; Clod Barrera, IBM; Fred Knight, NetApp; Brent Yardley, IBM
8/8/18, 9:45-10:50
  Title: NVMe-oF Appliances: We will discuss solutions that deliver high-performance and low-latency NVMe storage to automated, orchestration-managed clouds.
  Speakers: Jeremy Werner, Toshiba; Manoj Wadekar, eBay; Kamal Hyder, Toshiba; Nishant Lodha, Marvell; Yaniv Romem, CTO, Excelero
8/8/18, 3:20-4:25
  Title: NVMe-oF JBOFs: Replacing DAS storage with Composable Infrastructure (disaggregated storage), based on JBOFs as the storage target.
  Speakers: Bryan Cowger, Kazan Networks; Praveen Midha, Marvell; Fazil Osman, Broadcom
8/8/18, 4:40-6:45
  Title: Testing and Interoperability: This session will cover testing for conformance, interoperability, and resilience/error injection to ensure interoperable solutions based on NVM Express.
  Speakers: Brandon Hoff, Emulex; Tim Sheehan, IOL; Mark Jones, FCIA; Jason Rusch, Viavi; Nick Kriczky, Teledyne

Follow NVMe: nvmexpress.org

About NVM Express
NVM Express (NVMe) is an open collection of standards and information to fully expose the benefits of non-volatile memory in all types of computing environments, from mobile to data center. NVMe is designed from the ground up to deliver high-bandwidth, low-latency storage access for current and future NVM technologies.
NVM Express Base Specification
The register interface and command set for PCI Express attached storage, with industry standard software available for numerous operating systems. NVMe is widely considered the de facto industry standard for PCIe SSDs.
NVM Express Management Interface (NVMe-MI) Specification
The command set and architecture for out-of-band management of NVM Express storage (i.e., discovering, monitoring, and updating NVMe devices using a BMC).
NVM Express Over Fabrics (NVMe-oF) Specification
The extension to NVM Express that enables tunneling the NVM Express command set over additional transports beyond PCIe. NVMe over Fabrics extends the benefits of efficient storage architecture at scale in the world's largest data centers by allowing the same protocol to extend over various networked interfaces.

NVMe Market Landscape

[Figure: two charts, "SSD Share of Revenue (M)" and "SSD Share of Units TAM", broken out by interface (SATA, SAS, PCIe) on a 50%-100% share axis, with years through 2023. Source: Micron]

NVMe Feature Roadmap (timeline, 2014-2019)
NVMe Base
- NVMe 1.2 (Nov '14): Namespace Management, Controller Memory Buffer, Host Memory Buffer, Live Firmware Update
- NVMe 1.2.1 (May '16)
- NVMe 1.3: Sanitize, Streams, Virtualization
- NVMe 1.4*: IO Determinism, Persistent Memory Region, Multipathing
NVMe over Fabrics
- NVMe-oF 1.0 (May '16): Transport and protocol, RDMA binding
- NVMe-oF 1.1*: Enhanced Discovery, TCP Transport Binding
NVMe-MI
- NVMe-MI 1.0 (Nov '15): Out-of-band management, Device discovery, Health & temp monitoring, Firmware Update
- NVMe-MI 1.1: SES Based Enclosure Management, NVMe-MI In-band, Storage Device Enhancements
Legend: released NVMe specification; planned release. * Subject to change

Ever-Advancing Performance and Features
NVMe 1.4*: I/O Determinism, Persistent Memory Region, Multipathing
- Data latency improvement: I/O Determinism (IOD)
- High-performance non-volatile data improvement: Persistent Memory Region
- Ease of data sharing improvement: multipath access

Management Needs
NVMe-MI 1.1: SES Based Enclosure Management, NVMe-MI In-band, Storage Device Enhancements
- Standardized management for ease of adoption
  - Industry standard tools and compliance
- Improvements and updates to managing the subsystems and end devices
  - Event logging
  - Incorporating robust, industry-adopted enclosure management
  - Diverse connections to end devices (SSDs)
  - Additional in-band mechanisms

Enterprise Networking Needs
NVMe-oF 1.1*: Enhanced Discovery, TCP Transport Binding
- Robustness in networking topologies
  - Congestion management
- New and interesting transport capabilities
  - TCP bindings for NVMe-oF
- Improvements in automation
  - Discovery
- Security enhancements
  - In-band authentication

NVMe 1.4
Projected completion: 2019

What is NVMe I/O Determinism?
- Service isolation region
- Increases read IOPS and reduces max latency
- Provides a strict QoS profile
- Significantly improves P99 and P9999 for a well-behaved host
[Figure: "No I/O Determinism" shows Workloads A, B, C, and D sharing one 4TB device; "With I/O Determinism" shows Workloads A, B, C, and D each in its own 1TB region]
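
P99 and P9999 on this slide are read here as the 99th and 99.99th latency percentiles. The following is a minimal sketch of how such tail figures can be computed from latency samples with the nearest-rank method. The function name is hypothetical and the sample values are synthetic, chosen only to show how a couple of slow reads dominate the far tail; they are not measurements from the talk.

```python
def percentile_nearest_rank(samples, parts_per_10k):
    """Return the nearest-rank percentile of samples.

    parts_per_10k expresses the percentile as an integer out of 10,000
    (P99 -> 9900, P99.99 -> 9999) so no floating-point rounding is involved.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < parts_per_10k <= 10000:
        raise ValueError("parts_per_10k must be in 1..10000")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank r with r/n >= p (ceiling division).
    rank = -(-len(ordered) * parts_per_10k // 10000)
    return ordered[rank - 1]


# Synthetic read latencies in microseconds (illustrative only):
# 9,998 fast reads plus 2 slow outliers.
latencies_us = [100] * 9998 + [5000] * 2

p99 = percentile_nearest_rank(latencies_us, 9900)
p9999 = percentile_nearest_rank(latencies_us, 9999)
print(p99, p9999)  # 100 5000
```

In this synthetic sample the P99 hides the outliers entirely while the P9999 exposes them, which is why both figures are quoted when describing worst-case latency.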

Persistent Memory Region (PMR)
Controller Memory Buffer (CMB), introduced in NVMe 1.2
- PCI memory space exposed to host
- May be used to store commands and command data
- Contents do not persist across power cycles and resets
Persistent Memory Region (PMR)
- PCI memory space exposed to host
- May be used to store command data
- Contents persist across power cycles and resets
[Figure: NVMe controller memory space containing the Controller Memory Buffer (CMB) and the Persistent Memory Region (PMR)]

NVMe Multipathing and Namespace Sharing
Technical term: Asymmetric Namespace Access (ANA)
- NVMe multipath I/O refers to two or more completely independent PCI Express paths between a single host and a namespace
- Namespace sharing enables two or more hosts to access a common shared namespace using different NVM Express controllers
- Both multipath I/O and namespace sharing require that the NVM subsystem contain two or more controllers
[Figure: "NVMe Multipathing" shows one host reaching a namespace through multiple ports and NVMe controllers; "Namespace Sharing" shows Hosts A, B, and C reaching a shared namespace through separate NVMe controllers]
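
A minimal sketch of how a host might choose among several paths to one namespace when access is asymmetric. The slide names only the term ANA, so the state names below (optimized, non-optimized, inaccessible), the `Path` type, and the controller labels are assumptions made for illustration, not definitions taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class AnaState(Enum):
    # State names are assumptions for illustration; the slide only names ANA.
    OPTIMIZED = 1
    NON_OPTIMIZED = 2
    INACCESSIBLE = 3


@dataclass(frozen=True)
class Path:
    controller: str  # hypothetical controller label, e.g. "ctrl1"
    port: str        # hypothetical port label, e.g. "A"
    state: AnaState


def select_path(paths):
    """Prefer an optimized path, fall back to non-optimized, else None."""
    for wanted in (AnaState.OPTIMIZED, AnaState.NON_OPTIMIZED):
        for path in paths:
            if path.state is wanted:
                return path
    return None


paths = [
    Path("ctrl1", "A", AnaState.NON_OPTIMIZED),
    Path("ctrl2", "B", AnaState.OPTIMIZED),
]
print(select_path(paths).controller)  # ctrl2

# If the optimized path becomes inaccessible, I/O fails over to ctrl1.
failed_over = [paths[0], Path("ctrl2", "B", AnaState.INACCESSIBLE)]
print(select_path(failed_over).controller)  # ctrl1
```

The sketch needs at least two controllers to have anything to choose between, which matches the slide's requirement that the NVM subsystem contain two or more controllers.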

NVMe 1.4 Well Underway (timeline, 2017-2019)
NVMe 1.3 (May '17)
- Device Self Test
- Set Timestamp
- Boot Partitions
- Sanitize
- Error Log Updates
- Globally Unique NGUID/EUI64
- SGL Dword Simplify
- Streams Directive
- Device Telemetry
- Virtualization
- Host Controlled Thermal Mgmt
- NVMe-MI Tunneling
- Grab Bag (incl. Strict Mode)
- Para-virtualized Dev Support
NVMe 1.4*
- Persistent Memory Region
- HMB Enhancements
- IO Determinism
- ANA Base Protocol
- Namespace Write Protect
- Transport SGL Descriptor
- NVM Sets & Read Recovery Levels
- Transport Error Codes
NVMe 1.4 Development Status
- 8 TPs ratified already
- Ratified TPs are available publicly
[Chart: TP counts of 14, 10, 6, and 8 across Phase 1, Phase 2, Phase 3, and Ratified]
Legend: released NVMe specification; planned release. * Subject to change

NVMe Management Interface (NVMe-MI) 1.1
Projected completion: Early 2018

NVMe-MI 1.1 Key Work Items
NVMe-MI 1.1: SES Based Enclosure Management, NVMe-MI In-band, Storage Device Enhancements
- SCSI Enclosure Services (SES) Based Enclosure Management
  - Draft completed, NVMe-MI working through final technical items
  - Comprehensive enclosure management
- Support for In-Band NVMe-MI
  - Draft complete and in workgroup review
- NVMe Storage Device Enhancement
  - In work
- Native PCIe Enclosure Management (NPEM)
  - Transport-specific basic enclosure management
  - Approved by PCI-SIG on August 10, 2017

NVMe-MI Out-of-Band Management
- Out-of-band management: management that operates with hardware and components that are independent of operating system control
- NVMe out-of-band management interfaces
  - SMBus/I2C
  - PCIe Vendor Defined Messages (VDM)
  - IPMI FRU Data (VPD) accessed over SMBus/I2C
[Figure: a host processor (host operating system, applications, NVMe driver, PCIe root port) and a management controller/BMC (BMC operating system, applications, NVMe-MI driver) connected to the NVMe NVM subsystem over the PCIe bus, PCIe VDM, and SMBus/I2C]

In-Band Management and NVMe-MI
- In-band mechanism allows an application to tunnel NVMe-MI commands through the NVMe driver
  - Two new NVMe Admin commands: NVMe-MI Send and NVMe-MI Receive
- Benefits
  - Provides management capabilities not available in-band via NVMe commands
    - Efficient NVM subsystem health status reporting
    - Ability to manage NVMe at a FRU level
    - Vital Product Data (VPD) access
    - Enclosure management
[Figure: the same host processor and management controller (BMC) block diagram as the out-of-band slide, with the application reaching the NVMe NVM subsystem through the NVMe driver]

NVMe-oF

NVMe/TCP
Title: TP-8000 NVMe-oF TCP Transport Binding
Abstract: Provides extensions for defining an NVMe transport binding ("Fabrics") for non-RDMA "vanilla" networks
Status: Phase 3

NVMe/TCP
- NVMe block storage protocol over standard TCP/IP transport
- Enables disaggregation of NVMe SSDs without compromising latency and without requiring changes to networking infrastructure
- Independently scale storage & compute to maximize resource utilization and optimize for specific workload requirements
- Maintains NVMe model: sub-systems, controllers, namespaces, admin queues, data queues
[Figure: NVMe host software over a host-side transport abstraction; transports shown are Fibre Channel, InfiniBand*, RoCE, iWARP, TCP, and next-gen fabrics; controller-side transport abstraction below]

NVMe/TCP in a Nutshell
- NVMe-oF commands sent over standard TCP/IP sockets
- Each NVMe queue pair mapped to a TCP connection
- TCP provides a reliable transport layer for the NVMe queueing model
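
The queue-pair-to-connection mapping described on this slide can be pictured with a small model. This is a sketch under stated assumptions, not the TP-8000 wire protocol: `TcpConnection` and `Host` are hypothetical names, no real socket is opened, and commands are plain strings standing in for NVMe-oF capsules.

```python
from dataclasses import dataclass, field


@dataclass
class TcpConnection:
    """Stand-in for one TCP connection (no real socket is opened here)."""
    conn_id: int
    sent: list = field(default_factory=list)

    def send(self, capsule):
        # TCP delivers bytes reliably and in order; a list models that here.
        self.sent.append(capsule)


class Host:
    """Hypothetical host model: one connection per NVMe queue pair."""

    def __init__(self, io_queue_count):
        # Queue ID 0 is the admin queue; IDs 1..N are I/O queues.
        self.connections = {
            qid: TcpConnection(conn_id=qid)
            for qid in range(io_queue_count + 1)
        }

    def submit(self, qid, command):
        # A command submitted on a queue travels only on that queue's connection.
        self.connections[qid].send(command)


host = Host(io_queue_count=4)
host.submit(0, "identify")
host.submit(1, "read lba=0")
host.submit(1, "read lba=8")
host.submit(2, "write lba=16")

print(len(host.connections))     # 5 (1 admin + 4 I/O)
print(host.connections[1].sent)  # ['read lba=0', 'read lba=8']
```

Keeping each queue pair on its own connection means ordering only has to hold within a queue, which is what a single TCP byte stream already provides.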

NVMe/TCP Data Path Usage
- Enables NVMe-oF I/O operations in existing IP datacenter environments
  - Software-only NVMe host driver with NVMe-TCP transport
- Provides an NVMe-oF alternative to iSCSI for storage systems with PCIe NVMe SSDs
  - More efficient end-to-end NVMe operations by eliminating SCSI to NVMe translations
- Co-exists with other NVMe-oF transports
  - Transport selection may be based on h/w support and/or policy

NVMe/TCP Control Path Usage
- Enables use of NVMe-oF on control path networks (example: 1g Ethernet)
- Discovery service usage
  - Discovery controllers residing on a common control network that is separate from data-path networks
- NVMe-MI usage
  - NVMe-MI endpoints on control processors (BMC, ...) with simple IP network stacks
  - NVMe-MI on separate control network (1g Ethernet)
Source: Dave Minturn (Intel)

NVMe/TCP Standardization
Expect the NVMe over TCP standard to be ratified in 2H 2018

Discovery
- A host connects to a DISCOVERY controller to find out what NVMe stuff is "out there"
  - The discovery controller has a list of available devices (available NVMe subsystems, NVMe ports)
  - The host can then connect to the things it has discovered and find namespaces to access
  - One discovery service can point to other discovery services (nesting)
- The "root" of discovery must be manually configured
- A discovery service can't tell a host if something changes
  - Like if a new device shows up; or
  - If a new port shows up; or
  - If a completely new discovery service shows up
Special Thanks: Fred Knight, NetApp
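
The nesting described on this slide amounts to a graph walk that starts at the manually configured root. A minimal sketch follows, assuming a hypothetical in-memory directory in place of real discovery controllers; the service names, subsystem names, and the `discover` function are all invented for illustration.

```python
from collections import deque

# Hypothetical in-memory directory: each discovery service lists the
# subsystems it knows about and referrals to other discovery services.
DISCOVERY_SERVICES = {
    "root": {"subsystems": ["subsys-a"], "referrals": ["rack1", "rack2"]},
    "rack1": {"subsystems": ["subsys-b", "subsys-c"], "referrals": []},
    "rack2": {"subsystems": ["subsys-d"], "referrals": ["root"]},  # loops back
}


def discover(root, services):
    """Walk discovery services breadth-first from a manually configured root."""
    found = []
    visited = set()
    pending = deque([root])
    while pending:
        name = pending.popleft()
        if name in visited:
            continue  # do not follow referral loops
        visited.add(name)
        entry = services[name]
        found.extend(entry["subsystems"])
        pending.extend(entry["referrals"])
    return found


print(discover("root", DISCOVERY_SERVICES))
# ['subsys-a', 'subsys-b', 'subsys-c', 'subsys-d']
```

Because the discovery service cannot tell the host when something changes, a host in this model would have to repeat the whole walk to notice a new device, port, or discovery service.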

Enhanced Discovery
How do I connect storage consumers to storage suppliers?
- Specification enhancement for efficient, dynamic resource management
- Fabric-transport specific mechanisms to determine where to get provisioning information from
- Allows the fabric to tell hosts when something changes
  - Allows hosts to perform dynamic discovery of new stuff; or
  - Adapt to removal of stuff from the NVMe-oF environment
  - Dynamically find new paths; or know when old paths go away
- Now can be done over RDMA and TCP as well as FC
Special Thanks to Phil Cayton, Intel
Special Thanks: Fred Knight, NetApp

Issues with NVMe-oF Discovery and Management
The current NVMe-oF specification and Linux implementation lack:
- Dynamic resource discovery and enumeration of remote resources
- Clear definition for methods of how to discover the proper discovery controller defining remote storage resource provisioning
To support large-scale deployment of NVMe-oF, more is needed:
- Specification enhancement for efficient, dynamic resource management
- Fabric-transport specific mechanisms to determine where to get provisioning information from
- Linux kernel driver stack changes as the specification evolves
- Management tools to enable NVMe-oF management and scale-out
Remaining gaps:
- Finding the discovery root is still missing (manually configured)
- Discovery is still very weak on multiple fabric installations (no FABRIC ID in the discovery service, so while you have a name and a port, you don't know which fabric to use to connect to it, IF you happen to be connected to multiple fabrics)
- Discovery is also still just discovery, NOT about management of the configuration or provisioning of anything
Special Thanks: Fred Knight, NetApp

Summary - The Future of NVMe
NVMe 1.4
- IO Determinism
- Persistent Controller Mem Buffer and Event Log
- Multipathing (ANA)
NVMe-MI 1.1
- SCSI Enclosure Services (SES)
- NVMe-MI In-band
- Native Enclosure Management
NVMe-oF 1.1
- Enhanced Discovery
- TCP Transport Binding

Track Plan for FMS
Track codes: NVMe-101-1, NVMe-102-1, NVMe-201-1, NVMe-202-1
8/7/18, 8:30-9:35
  Title: NVM Express: NVM Express roadmaps and market data for NVMe, NVMe-oF, and NVMe-MI - what you need to know for the next year.
  Chair: Janene Ellefson, Micron
  Speakers (proposed): J Metz, Cisco; Amber Huffman, Intel; David Allen, Seagate
8/7/18, 9:45-10:50
  Title: NVMe architectures in hyperscale data centers, enterprise data centers, and in the client and laptop space.
  Chair: Janene Ellefson, Micron
  Speakers (proposed): Chris Peterson, Facebook; Chander Chadha, Toshiba; Jonmichael Hands, Intel
8/7/18, 3:40-4:45
  Title: NVMe Drivers and Software: This session will cover the software and drivers required for NVMe-MI, NVMe, and NVMe-oF, and support from the top operating systems, such as NVMe-oF with Linux, Red Hat, SUSE, Oracle, Microsoft, and VMware, as well as NVMe and NVMe-oF for SPDK.
  Chair: Uma Parepalli, Western Digital
  Speakers (proposed): Austin Bolen, Dell EMC; Myron Loewen, Intel; Lee Prewitt, Microsoft; Suds Jain, VMware; David Minturn, Intel; James Harris, Intel
8/7/18, 4:55-6:00
  Title: NVMe-oF Transports: NVMe over Fabrics is designed to be transport agnostic, with all transports being created equal from the perspective of NVM Express. We will cover NVMe over Fibre Channel, NVMe over RDMA, and NVMe over TCP.
  Chair: Brandon Hoff, Broadcom
  Speakers (proposed): Fazil Osman, Broadcom; J Metz, Cisco; Curt Beckmann, Broadcom; Praveen Midha, Marvell
8/8/18, 8:30-9:35
  Title: NVMe-oF Enterprise Arrays: NVMe-oF and NVMe are improving the performance of classic storage arrays, a multi-billion dollar market. This session will cover NVMe and NVMe-oF for Enterprise All Flash Arrays (AFAs), including SPDK with NVMe-oF.
  Chair: Brandon Hoff, Broadcom
  Speakers (proposed): Michael Peppers, NetApp; Clod Barrera, IBM
8/8/18, 9:45-10:50
  Title: NVMe-oF Appliances: These solutions differ from Enterprise Arrays because the targets are more like JBOFs than Enterprise AFAs. We will discuss solutions that deliver high-performance and low-latency NVMe storage to automated, orchestration-managed clouds.
  Chair: Jeremy Werner, Toshiba
  Speakers (proposed): Manoj Wadekar, eBay; Kamal Hyder, Toshiba; Nishant Lodha, Marvell; Lior Gal, Excelero
8/8/18, 3:20-4:25
  Title: NVMe-oF JBOFs: By replacing DAS storage with Composable Infrastructure (disaggregated storage), based on JBOFs as the storage target, end users benefit in terms of business agility, ease of hardware upgrades, and lowering of both CAPEX and OPEX.
  Chair: Bryan Cowger, Kazan Networks
  Speakers (proposed): Praveen Midha, Marvell; Fazil Osman, Broadcom
8/8/18, 4:40-6:45
  Title: Testing and Interoperability: There are at least 9 different standards that NVMe solutions leverage, from PCIe to NVMe to transports for NVMe-oF. This session will cover testing for conformance, interoperability, and resilience/error injection to ensure interoperable solutions.
  Chair: Brandon Hoff, Broadcom
  Speakers (proposed): Tim Sheehan, IOL; Mark Jones, FCIA
