Network Services In A Multivendor

Transcription

Executive Track
Network Services in a Multivendor Enterprise
Saikrishna Kotha
Director of Global Network Engineering and Operations, PayPal

Agenda
- PayPal’s Core Network Services
- Automation Architecture Pillars
- AZ2.0 - Leaf/Spine Network
- Monitoring fabric
- Infrastructure as Code (IaC)
- Network Services – a pragmatic approach
2019 PayPal Inc. Confidential and proprietary.

Saikrishna M. Kotha
Roles/Responsibility @ PayPal:
- Director, Global Network Engineering & Operations
- The role of PayPal’s Global Payments Network is to provide secure, resilient and efficient global connectivity to PayPal customers, merchants, partners and business units.
Educational & Overall Industry Experience:
- 15 years of industry experience; worked for LinkedIn, Xilinx, Dell, Nortel, Ciena, CDOT
- CAB Member: Cumulus, BigSwitch, Aporeto
- Web-scale datacenter network architecture, design, delivery & operations
- Systems strategy for both datacenter and enterprise networks
- Total of 25 patents (issued/pending) focused on cloud networking, SDN and NFV
- B.E (ECE); M.B.A
https://www.linkedin.com/in/saikrishnakotha/

PayPal’s Core Network Services (CNS)
Provides a secure, reliable and efficient payments network to enable hybrid cloud deployments.
Current Network fleet:
- Network-as-a-Service: 1000’s of devices; multi-generation, evolved
- Global backbone: POP locations, MPLS network
- HybridCloud, Extranet, Secure connectivity
- Network Security: Least privilege access & regularity policy audits
- Operating Systems/Tech: 1G/10G/25G/40G/100G & MACSec encryption
- Love all traffic, Serve all Apps
Drivers for Network Strategy:
- Always on & secure: create strong foundation
- Growth/modernization: Network-as-a-Service
- Zero-touch-operations: Autonomous fleet
Figure labels:
- Create Strong Foundation: Innovation centered standardization; Drift avoidance; Predictable reliability; Enhanced enterprise tools integration
- Autonomous Fleet: Business Intent; LCM automation/APIs; Operational SLA; HybridCloud enablement; Programmability; Anomaly driven telemetry; Remediation factory; Self-service enablement

Network Designs Over The Years
It is a journey. The network stack evolved over the years.
Mid 2000’s:
- Border connectivity: eBay BB/ISP
- Network: core routers (FWs and LBs); DC network layer, 1G/10G, Layer 2
- Network in the compute layer: physical world
2013-2017:
- WAN connectivity: backbone ISP; public cloud
- Network: infra switching (FWs and LBs); core network layer, 10G/40G (ToR and spine)
- Network in the compute layer: physical, virtual world
2017-2019:
- WAN connectivity: backbone ISP; multi-region public cloud
- Network: leaf/spine DC networks (FWs and LBs), 25G/100G
- Network in the compute layer: physical, virtual, container world
2019-2021:
- WAN connectivity: backbone ISP; multi cloud
- Network: leaf/spine DC networks (LBs, distributed FW), 25G/100G/400G?
- Network in the compute layer: physical, container world

Core Network Services (CNS) – Architecture Pillars
Disaggregation, Secure Global Network, Zero Touch Everything
Disaggregation Centered Networking Fleet:
- Disaggregation of network HW and SW innovation
- Cloud scale economics by leveraging white-box innovation
- Single SKU for DC network: context-based network functions
Network Security as a Service (NSaaS):
- Security as a service through programmability
- Security policy visibility/automation
- Distributed Firewall (DFW) for application level security
Global Payments Backbone as a Service:
- Template based PoP designs; Extranet-as-a-Service
- Flow level visibility; low-latency global customer connectivity
- Multi-cloud/multi-region enablement
Telemetry Driven Core Network Services (CNS):
- Zero Touch Provisioning (ZTP): deployment agility
- Zero Touch Operations (ZTO): self-healing networks
- Anomaly driven telemetry: HealthChecks & DVR for CNS
Figure labels: Telemetry; Disaggregation; Payments Backbone; NSaaS. Powered by: CNS Shell# programmability & visibility

PayPal Environment: Transforming DC Networks
Software Defined DC Network
Value Propositions:
- Reduce number of HW device types & spares
- Reduce network designs
- Common HW: leverage SW innovations from various vendors
- Leverage ASIC evolution through 1RU form factor
Transformation approach:
- Adopt common HW SKU/network design (leaf/spine)
- Adopt 3rd party optics and cables to work with all vendors
- Build life cycle management (LCM) for entire fleet
- Monitoring fabric comes standard
- Qualify dual vendors for each component:
  - Do not mix vendors in a given environment, for interop/stability reasons
  - Avoid supply chain shortages
Figure label: Financial Savings

AZ2.0 Network requirements
Next generation network build out
HW simplification:
- Single SKU: 32x100G switch device
- Flex port speed: 100G/50G/25G/10G network
- White-box switching OS powered
- Optics & cable consolidation: 3rd party optics/cables
Design goals:
- Extensible leaf/spine architecture: Bubble concept
- Eliminate IP subnet depletion: Bubble level IP addressing
- Compute morphing: VM mobility within the bubble
- Container support: Container mobility and address flexibility
- Layer2 adjacencies within the bubble: VxLAN overlay tunnel
Network Automation:
- Day0 automation: Zero Touch Provisioning (ZTP)
- Automated network build-out
- Plug and play rack on-boarding
Figure labels: 32x100G flex port platform; 100G CWDM4 3rd party optics; single mode fiber; 40G to 4x10G breakout; 100G to 4x25G breakout; Day0 automation & service APIs
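The breakout figures on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the `BREAKOUTS` table and both function names are assumptions made for this example, not part of any PayPal tooling. It shows how many access ports a single 32x100G device yields once every port is broken out.

```python
# Hypothetical breakout table: parent port speed (Gbps) -> (lanes, lane speed in Gbps).
# The two entries mirror the breakouts named on the slide.
BREAKOUTS = {100: (4, 25), 40: (4, 10)}


def breakout_ports(parent_ports: int, parent_speed: int) -> tuple[int, int]:
    """Return (number of child ports, child speed) after breaking out every parent port."""
    lanes, lane_speed = BREAKOUTS[parent_speed]
    return parent_ports * lanes, lane_speed


def total_capacity_gbps(ports: int, speed: int) -> int:
    """Aggregate one-direction capacity of a set of equal-speed ports."""
    return ports * speed


if __name__ == "__main__":
    count, speed = breakout_ports(32, 100)
    # A fully broken-out 32x100G device exposes 128 x 25G ports;
    # the aggregate capacity is unchanged at 3200 Gbps.
    print(count, speed, total_capacity_gbps(count, speed))
```

Breaking out changes port count and per-port speed, not total capacity, which is what lets one SKU serve both 100G and 25G roles.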

AZ2.0 Network Design – Lessons learned
Lessons learned:
- 15 racks per POD
- Infra rack IP addressing at POD level: helps with IP depletion issues in the POD
- VxLAN design & interop tests with FW/LB/compute nodes
- Tune alerts to catch HW resource utilization
- Do comprehensive failure & scale tests beforehand. Convergence times vary based on failure scenario.
- Single vendor VxLAN domain
Figure: POD connectivity concept diagram
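One way to picture POD-level addressing is to carve a single POD prefix into per-rack subnets up front. This is a minimal sketch using Python's standard `ipaddress` module; the prefix `10.20.0.0/20`, the /24 rack size and the function name `rack_subnets` are assumptions chosen for illustration, and only the figure of 15 racks per POD comes from the slide.

```python
import ipaddress


def rack_subnets(pod_prefix: str, racks: int, rack_prefixlen: int) -> dict:
    """Assign one subnet of size rack_prefixlen to each rack, numbered from 1."""
    pod = ipaddress.ip_network(pod_prefix)
    candidates = pod.subnets(new_prefix=rack_prefixlen)
    plan = {}
    for rack in range(1, racks + 1):
        try:
            plan[rack] = next(candidates)
        except StopIteration:
            # The POD prefix is too small for the requested number of racks.
            raise ValueError(f"{pod_prefix} cannot hold {racks} /{rack_prefixlen} subnets") from None
    return plan


if __name__ == "__main__":
    # Hypothetical POD: a /20 holds sixteen /24s, enough for 15 racks with one spare.
    plan = rack_subnets("10.20.0.0/20", racks=15, rack_prefixlen=24)
    for rack, subnet in plan.items():
        print(f"rack {rack:2d}: {subnet}")
```

Sizing the POD prefix for the full rack count at design time is what keeps a rack from running out of addresses later.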

Common Monitoring Fabric
Monitoring Fabric for US-West
Salient points:
- White-box hardware
- 100G fabric between Filter/Core/Delivery
- Delivery layer provides 10/25/40/50/100G
- Firewalls will be tapped
- Switches will use SPAN
- NPB (Network Packet Broker):
  - Deduplication
  - NetFlow generation
  - Packet slicing
  - Header stripping (and more)
- Plug and play fabric
- Automatic traffic routing on failure
- Additional services:
  - Analytics
  - Packet recorders
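Two of the packet broker functions listed above, deduplication and packet slicing, are easy to illustrate in software. The sketch below shows the general idea only and does not describe how any particular NPB product works; the function names and the digest-based window are assumptions for this example.

```python
import hashlib


def slice_packet(packet: bytes, keep: int) -> bytes:
    """Packet slicing: keep only the first `keep` bytes (typically the headers)."""
    return packet[:keep]


def deduplicate(packets, window: int = 1024):
    """Drop packets whose content was already seen within the last `window` unique packets."""
    seen = {}  # digest -> None; dicts preserve insertion order, so the oldest entry is first
    unique = []
    for packet in packets:
        digest = hashlib.sha256(packet).digest()
        if digest in seen:
            continue  # duplicate copy, e.g. the same frame captured at two tap points
        seen[digest] = None
        if len(seen) > window:
            seen.pop(next(iter(seen)))  # forget the oldest digest
        unique.append(packet)
    return unique


if __name__ == "__main__":
    captured = [b"frame-A", b"frame-B", b"frame-A"]
    print(deduplicate(captured))
    print(slice_packet(b"headerpayload", 6))
```

A bounded window matters because the same traffic tapped at several points arrives close together in time, so only recent digests need to be remembered.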

100G/White-box Adoption: Roadmap
Path to transformation
Figure labels:
- White-box OS on ODM1 (32x100G device) and ODM2 (32x100G device)
- OEM1 and OEM2 on common white-box ODM HW
- 3rd party optics and cables: 100G CWDM4 (Vendor#1, V#2); 100G (4x25G) (Vendor#1, V#2)
- Legend: ied/not-deployed; Qualified/Deployed
** ODM – Original Design Manufacturer
** OEM – Original Equipment Manufacturer
- White-box HW for both the core network and the common monitoring fabric
- Leverage common HW and personalize it with specific white-box/OEM SW
- Helps to avoid mixing vendors in a given environment
- Helps to streamline spare inventory

Multivendor Network Automation Journey
Journey stages:
- #1 Manually Managed: CLI & SNMP
- #2 Network Automation
- #3 Network State Insights
- #4 Autonomous
Building blocks:
- Configlets/playbooks & templates
- Applications (bots)
- Self-service enablement
- Foundation: Life Cycle Managers/APIs; central source of truth (inventory, configs); workflow integrations
- Static & run time state
- Streaming telemetry
- Self healing
- Closed loop processes
- Network analytics
- Alert management
Actors:
- Infra and Cloud
- NetEng/NetSec
- Slack BOTs
- Private Cloud
- Infra Services
- Self-service commands
Use cases:
- Service Fabric Enablement
- Fluid capacity with security
- Network Operations
- Zero Touch Provisioning
- Declarative operations: WISB/WIRI
- Frictionless server move
- Delightful security provisioning/Audit trail
- Reduction of manual changes
- Increased visibility
- Reduction in MTTD/MTTR
* WISB – What It Should Be
* WIRI – What It Really Is
* MTTD – Mean Time To Detect
* MTTR – Mean Time To Resolution

Infrastructure as Code (IaC)
Core Benefits:
- Remove human in-between
- Determinism and greater network insight
- Increased business agility and productivity
- Lower operation drifts and costs
The Journey:
- Day0: Device Deployment
- Day1: Composable Services
- Day2: Fleet-wide Monitoring
- Day n: Zero Touch Operations (Actual State vs. Expected State)
Figure labels: CNS APIs; Global Search; Self-service
Projects associated with each phase:
Day 0: Zero Touch Provisioning (ZTP)
- Device deployment
- Day0 config push
- SOR - Inventory management
Day 1: Composable services
- Common framework development
- Service packs development
- Orchestration layers integration
Day 2: Fleet wide monitoring
- Device static and dynamic state monitoring
- NetFlow/SNMP based device counter collections
- Syslog monitoring
Day n: Zero Touch Operations (ZTO)
- Device failure handling/upgrades
- WISB vs. WIRI: anomaly detection
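The WISB vs. WIRI comparison (expected state vs. actual state) amounts to a diff between two views of a device. The sketch below is a minimal illustration assuming both views are flat dictionaries of settings; the key names, the `ABSENT` marker and the `detect_drift` function are hypothetical and not taken from the CNS tooling.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def detect_drift(wisb: dict, wiri: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting where WISB and WIRI disagree."""
    drift = {}
    for key in sorted(wisb.keys() | wiri.keys()):
        expected = wisb.get(key, ABSENT)
        actual = wiri.get(key, ABSENT)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical device state, for illustration only.
    wisb = {"hostname": "leaf-01", "mtu": 9216, "ntp": "10.0.0.1"}
    wiri = {"hostname": "leaf-01", "mtu": 1500, "snmp": "enabled"}
    for setting, (expected, actual) in detect_drift(wisb, wiri).items():
        print(f"{setting}: expected {expected}, found {actual}")
```

An empty result means the device matches its intended state; anything else is a candidate for alerting or automated remediation.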

Spike: Network Services Common Framework
Enablement to develop composable services
Value Propositions:
- Uniform service layer interface to all CNS capabilities
- Self service capability enablement
- Integration with PayPal Cloud Services layer for Network APIs
Spike Services:
- FWaaS
- DCNet
- Backbone
- CNS State DB
‘Spike’ Framework (CNS composable services enablement):
- LDAP integration
- Key Manager integration
- IPAM integration
- ServiceNow integration
- Workflow engine
- CMDB integration
- Maintain correlation IDs
- Built-in database
- Device layer to interact with LCMs
- REST based APIs
- Admin UI

Spike Framework - Architecture
Write once, leverage it for all CNS IaC initiatives
Value propositions:
- Repeatable Day0 tasks are enabled through CRUD operations
- Eliminate drift
- Reduce human touch/reduce manual changes
- Network APIs for IaaS orchestration layers
- Brownfield environment: vendor variety & device variety
Design Principles:
- Service layer for heterogeneous network environment
- Reduce manual changes and provide APIs for workflow integrations
- Modular design to accommodate plug-n-play sub-components
- Config consistency and availability
- No operational state persistency in data store
- Asynchronous execution for all APIs
- Auto re-try to handle downstream failures
Figure labels: Work-flow integrations (Cloud Services, ServiceNow); Network Service APIs; Spike framework; Ansible Tower; LCMs (LCM1, LCM2, LCM3, LCM4)
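The last two design principles, asynchronous execution and automatic retry on downstream failures, can be sketched with Python's standard `asyncio` module. This is not Spike's implementation; `DownstreamError`, `call_with_retry` and `make_flaky` are hypothetical names, and the exponential backoff is one common choice among several.

```python
import asyncio


class DownstreamError(Exception):
    """Raised when a downstream system (for example an LCM) fails a request."""


async def call_with_retry(operation, attempts: int = 3, base_delay: float = 0.01):
    """Await operation(); on DownstreamError retry with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except DownstreamError:
            if attempt == attempts:
                raise  # retries exhausted, surface the failure to the caller
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures: int):
    """Build a stand-in downstream call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise DownstreamError("downstream unavailable")
        return {"status": "applied", "calls": state["calls"]}

    return operation


if __name__ == "__main__":
    # Two simulated failures followed by a success on the third attempt.
    print(asyncio.run(call_with_retry(make_flaky(2))))
```

Because the call is a coroutine, an API handler can schedule it and return immediately, which keeps slow or flaky devices from blocking other requests.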

Network Self-Service Enablement
Network Blackbox to Self-service capabilities
Value propositions:
- Telemetry and visibility at global level
- Dynamic network map
- Enable self service capabilities
- Drift detection and auto remediation factory
Use Cases:
- Network global ‘path’ search is enabled through ‘Slackbot’
- Network state DB overlaid with SNMP logs
- Integrate with ping-pong service for latency measurements
Figure labels: bot#; Graph search; CNS Shell#; Dashboards; Domain health; CRUD APIs; Auto Remediation; Fleet Inventory (CMDB); Static & Run time state; Logs/Alerts DB; Flow data & SNMP counters DB
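A global path search over a network state database is, at its core, a graph search. The sketch below uses a breadth-first search over an adjacency map; the topology, device names and `find_path` function are invented for illustration, and the slide does not say which algorithm the Slackbot actually uses.

```python
from collections import deque


def find_path(topology: dict, src: str, dst: str):
    """Return a shortest hop-count path from src to dst, or None if unreachable.

    Both endpoints must appear as keys in the topology map.
    """
    if src not in topology or dst not in topology:
        return None
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in topology.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    # Hypothetical two-leaf, two-spine fabric.
    fabric = {
        "leaf1": ["spine1", "spine2"],
        "leaf2": ["spine1", "spine2"],
        "spine1": ["leaf1", "leaf2"],
        "spine2": ["leaf1", "leaf2"],
    }
    print(find_path(fabric, "leaf1", "leaf2"))
```

In a leaf/spine fabric every leaf-to-leaf path found this way is two hops, which is part of what makes the topology predictable to operate.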

Final Thoughts
- Build an organization with a balanced skillset across SW dev/NetEng/NetOps
- Standardize designs/reduce HW SKUs
- Take advantage of leaf/spine network architectures for DC – It works!
- Be pragmatic in your automation journey; leverage vendor tools where available
- Collaborate with like-minded industry partners
Happy to rishnakotha/ #WeAreHiring
