Microsoft's Demon

Transcription

Microsoft's Demon
Datacenter Scale Distributed Ethernet Monitoring Appliance

Rich Groves, Principal Architect, Microsoft GNS
Bill Benetti, Senior Service Engineer, Microsoft MSIT

Before We Begin
- We are Network Engineers.
- This isn't a Microsoft product.
- We are here to share methods and knowledge.
- Hopefully we can all foster evolution in the industry.

Microsoft is a great place to work!
- We need experts like you.
- We have larger-than-life problems to solve.
- Networking is important and well funded.
- Washington is beautiful.

The Microsoft Demon Technical Team
- Rich Groves
- Bill Benetti
- Dylan Greene
- Justin Scott
- Ken Hollis
- Tanya Ollick
- Eric Chou

About Rich Groves
- Microsoft's Global Network Services – NOUS (Network of Unusual Scale)
- Microsoft IT – EOUS (Enterprise of Unusual Scale)
- Time Warner Cable
- Endace – made cards, systems, and software for "Snifferguys"
- AOL – "Snifferguy"
- MCI
[Slide image captioned "Artist's Approximation"]

The Traditional Network
- Hierarchical tree structure, optimized for north/south traffic
- Firewalls, load balancers, and WAN optimizers
- Not much cross-datacenter traffic
- Lots of traffic localized in the top of rack

Analyzing the Traditional Network
- Insert taps within the aggregation
- Port mirror at the top of rack
- Capture packets at the load balancer
- Well understood, but costly at scale

The Cloud Datacenter
- Tuned for massive cross-datacenter traffic
- Appliances removed in favor of software equivalents

Can you tap this cost effectively?
- 8, 16, and 32x10G uplinks
- Tapping 32x10G ports requires 64 ports to aggregate. (Who can afford buying current systems for that?)
- ERSPAN could be used, but it impacts production traffic.
- Even port mirrors are a difficult task at this scale.

Many attempts at making this work
- Capturenet
  - complex to manage
  - purpose-built aggregation devices were far too expensive at scale
  - resulted in lots of gear gathering dust
- PMA – "Passive Measurement Architecture"
  - failed due to boring name
  - rebranded as PUMA by outside marketing consultant (Rich's eldest daughter)
- PUMA
  - lower cost than Capturenet
  - extremely feature rich
  - too costly at scale
- Pretty Pink PUMA
  - attempt at rebranding by Rich's youngest daughter
  - rejected by the team

Solution 1: Off the Shelf
- Used 100% purpose-built aggregation gear
- Supported many higher-end features (timestamping, slicing, etc.)
- Price per port is far too high
- Not dense enough (doesn't even terminate one tap strip)
- High cost made tool purchases impossible; no point without tools

Solution 2: Cascading Port Mirrors
How:
- Mirror all attached monitor ports to the next layer
- Pre-filter by only mirroring interfaces you wish to see
The Upside:
- Cost effective
- Uses familiar equipment
- Can be done using standard CLI commands in a config
The Downside:
- Control traffic removed by some switches
- Assumes you know where to find the data
- Lack of granular control
- Uses different pathways in the switch
- Quantity of port mirror targets is limited
[Slide diagram: cascaded switches and a host; one switch that "heard packets 1,2,3,4" is "not allowed to tell anyone about packet 2", so the next layer only hears packets 1, 3, 4]

Solution 3: Making a Big Hub
How:
- Turn off learning
- Flood on all ports
- Unique outer VLAN tag per port using QinQ
- Pre-filter based on ingress port through VLAN pruning
Upside:
- Cost effective
Downside:
- Control traffic is still intercepted by the switch.
- Performance is non-deterministic.
- Some switches need SDK scripts to make this work.
- Data quality suffers.

The End
Well, not really, but it felt like it.

Core Aggregator Functions
Let's solve 80 percent of the problem.

Core aggregator functions:
- terminates links
- 5-tuple pre-filters
- duplication
- forwarding without modification
- low latency
- zero loss
- time stamps
- frame slicing

Doable in merchant silicon switch chips:
- terminates links
- 5-tuple pre-filters
- duplication
- forwarding without modification
- low latency
- zero loss

Not doable:
- time stamps and frame slicing – costly due to lack of demand outside of the aggregator space

Reversing the Aggregator
The Basic Logical Components:
- terminate links of all types, and a lot of them
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device

What do these platforms have in common?
[Slide images of several switch platforms: "Can you spot the commercial aggregator?"]

Introducing Merchant Silicon Switches
Advantages of merchant silicon chips:
- more ports per chip (64x10G currently)
- much lower latency (due to fewer chip crossings)
- consume less power
- more reliable than traditional ASIC-based multi-chip designs

Merchant Silicon Evolution
[Slide chart: 10G port density per chip by year (2007, 2011, 2013, 2015) as process geometry shrinks toward 28nm]
- Interface speed evolution: 40G, 100G, 400G(?), 1Tbps
- This is a single chip. Amazingly dense switches are created using multiple chips.

Reversing the Aggregator
The Basic Logical Components:
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device

Port to Port Characteristics of Merchant Silicon
[Slide chart: port-to-port latency within the chip]
- Loss within the aggregator isn't acceptable.
- Such deterministic behavior makes a single-chip system ideal as an aggregator.

Reversing the Aggregator
The Basic Logical Components:
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device

Duplication and Filtering
Duplication:
- line-rate duplication in hardware to all ports
- facilitates 1:N, N:1, N:N duplication and aggregation
Filtering:
- line-rate L2/L3/L4 filtering on all ports
- thousands of filters depending on the chip type

Reversing the Aggregator
The Basic Logical Components:
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device

Openflow as a Control Plane
What is Openflow?
- remote API for control
- allows an external controller to manage L2/L3 forwarding and some header manipulation
- runs as an agent on the switch
- developed at Stanford 2007-2010
- now managed by the Open Networking Foundation

Common Network Device
[Slide diagram: a supervisor (control plane) drives the data plane over a proprietary control bus]

Controller Programs Switch's "Flow Tables"
[Slide diagram: an OpenFlow controller programs the flow tables of two switches over the control bus; each switch runs an OpenFlow agent on its supervisor]

Example flow table 1:
Priority | Match           | Action List
300      | TCP.dst 80      | Fwd: port 5
100      | IP.dst 192.8/16 | Queue: 2
40       | *               | DROP

Example flow table 2:
Priority | Match           | Action List
500      | TCP.dst 22      | TTL--, Fwd: port 3
200      | IP.dst 128.8/16 | Queue: 4
100      | *               | DROP
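Conceptually, a flow table like the ones above is just a priority-ordered list of match/action entries in which the highest-priority match wins. A minimal, hedged sketch of that model in Python follows; FlowEntry, FlowTable, and the field names are illustrative only and not part of any real OpenFlow library.

```python
# Illustrative model of an OpenFlow-style flow table (not a real OpenFlow library).
from dataclasses import dataclass, field

@dataclass
class FlowEntry:
    priority: int   # higher-priority entries are checked first
    match: dict     # exact-match fields for simplicity; real chips also do masked/prefix matches
    actions: list   # e.g. ["fwd:port5"], ["queue:2"], ["drop"]

@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)
        # Keep highest-priority entries first, like TCAM lookup order.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: dict) -> list:
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return []  # table miss

# The first example table from the slide:
table = FlowTable()
table.add(FlowEntry(300, {"tcp_dst": 80}, ["fwd:port5"]))
table.add(FlowEntry(100, {"ip_dst": "192.8.0.0/16"}, ["queue:2"]))
table.add(FlowEntry(40, {}, ["drop"]))  # wildcard (*) catch-all

print(table.lookup({"tcp_dst": 80}))    # ['fwd:port5']
print(table.lookup({"tcp_dst": 443}))   # ['drop']
```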

Proactive Flow Entry Creation
[Slide diagram: the controller proactively pushes entries for host 10.0.1.2 to two switches – "match xyz, rewrite VLAN, forward to port 15" on one and "match xyz, rewrite VLAN, forward to port 42" on the other]

Openflow 1.0 Match Primitives (Demon Related)
Match Types:
- ingress port
- src/dst MAC
- src/dst IP
- ethertype
- protocol
- src/dst port
- TOS
- VLAN ID
- VLAN Priority
Action Types:
- mod VLAN ID
- drop
- output
- controller

Flow Table Entries: "if, then, else"
- if "ingress port 24 and ethertype 2048 (IP) and dest IP 10.1.1.1"
  then "dest mac 00:11:22:33:44:55 and output port 1"
- if "ethertype 2054 (ARP) and src IP 10.1.1.1"
  then "output port 10"
- if "ethertype 2048 (IP) and protocol 1 (ICMP)"
  then "controller"
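Using the illustrative FlowEntry model sketched earlier (assumed to be in scope), the three if/then examples could be encoded roughly as below. The priorities and field names are assumptions for the sketch; the ethertypes and protocol number are the ones on the slide.

```python
# Rough encoding of the slide's three if/then entries using the illustrative model above.
rules = [
    FlowEntry(priority=300,
              match={"in_port": 24, "eth_type": 0x0800, "ip_dst": "10.1.1.1"},  # 0x0800 = 2048 (IP)
              actions=["set_dst_mac:00:11:22:33:44:55", "output:1"]),
    FlowEntry(priority=200,
              match={"eth_type": 0x0806, "ip_src": "10.1.1.1"},                 # 0x0806 = 2054 (ARP)
              actions=["output:10"]),
    FlowEntry(priority=100,
              match={"eth_type": 0x0800, "ip_proto": 1},                        # protocol 1 = ICMP
              actions=["controller"]),                                          # punt to the controller
]
```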

Openflow 1.0 Limitations
- lack of QinQ support
- lack of basic IPv6 support
- no deep IPv6 match support – can only redirect based on protocol number (ethertype)
- no layer 4 support beyond port number
- cannot match on TCP flags or payloads

Multi-Tenant Distributed Ethernet Monitoring Appliance
Enabling Packet Capture and Analysis at Datacenter Scale
[Slide diagram of the Demon appliance: monitor ports feed filter switches, which feed a mux layer connected to service nodes, delivery switches, and tooling]
- 4.8 Tbps of filtering capacity – find the needle in the haystack
- more than 20X cheaper than "off the shelf" solutions
- based on low-cost merchant silicon, leveraging Openflow for modular scale and granular control
- industry standard CLI
- self serve using a RESTful API
- save valuable router resources using the Demon packet sampling offload
- filter and deliver to any "Demonized" datacenter, even to hopboxes and Azure

Filter Layer
- terminates inputs from 1, 10, and 40G monitor ports
- filter switches have 60 filter interfaces facing monitor ports
- filter interfaces allow only inbound traffic through the use of high-priority flow entries
- initially drops all inbound traffic
- 4x10G infrastructure interfaces are used as egress toward the mux
- approximately 1000 L3/L4 flows per switch
- performs longest-match filters
- high-rate sFlow sampling with no "production impact"

Mux Layer
- terminates the 4x10G infrastructure ports from each filter switch
- used to aggregate all filter switches
- performs shortest-match filters
- provides both service node and delivery connectivity
- introduces pre-service and post-service ports
- directs traffic to either service node or delivery interfaces
- duplicates flows downstream if needed

Services Nodes
- connected to the mux switch through pre-service and post-service ports
- perform optional functions that Openflow and merchant silicon cannot currently provide
- leverage higher-end features on a smaller set of ports
- possible uses:
  - deeper filtering
  - time stamping
  - frame slicing
  - encapsulation removal for tunnel inspection
  - configurable logging
  - higher resolution sampling
  - encryption removal
  - payload removal for compliance
  - encapsulation of output for location independence

Delivery Layer
- introduces delivery interfaces, which connect tools to Demon
- 1:N and N:1 duplication
- further filtering if needed
- data delivery to tooling
- can optionally fold into the mux switch depending on tool quantity and location

Advanced Controller Actions
- receives packet and octet counters of all flows created
- the above is used as a rough trigger for automated packet captures
- duplicates LLDP, CDP, and ARP traffic to the controller at low priority to collect topology information
- sources "Tracer" documentation packets to describe the trace
[Slide diagram: the Demon application drives the controller through its API; users reach the Demon application via CLI or API]
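In the same illustrative flow-entry notation used earlier, the topology-collection idea might look like the entries below: low-priority entries that send discovery traffic to the controller. The LLDP and ARP ethertypes and the CDP multicast MAC are real values; the priorities and action encoding are assumptions.

```python
# Low-priority entries that deliver discovery traffic to the controller for topology mapping.
# In practice the slide describes duplicating this traffic rather than stealing it from the data path.
topology_entries = [
    FlowEntry(priority=10, match={"eth_type": 0x88CC}, actions=["controller"]),              # LLDP
    FlowEntry(priority=10, match={"eth_type": 0x0806}, actions=["controller"]),              # ARP
    FlowEntry(priority=10, match={"eth_dst": "01:00:0c:cc:cc:cc"}, actions=["controller"]),  # CDP multicast MAC
]
```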

Location Aware Demon Policy
- policy created using CLI or API: "forward all traffic matching tcp dest 80 on port 1 of filter1 to port 1 of delivery1"
- Demon app creates flows through the controller API
- controller pushes a flow entry to filter1, mux, and delivery to output using available downstream links
- traffic gets to the Wireshark system
[Slide diagram: monitor ports drop by default; a high-priority flow is created on filter1 and the traffic is steered through the mux to delivery]
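The slides show policies being created through a CLI or a RESTful API. Below is a hedged sketch of what such a self-serve API call might look like; the endpoint URL, JSON schema, and field names are hypothetical, not the actual Demon API.

```python
# Hypothetical self-serve policy creation against a Demon-style REST API.
# The URL and JSON schema are illustrative assumptions, not the real interface.
import requests

policy = {
    "name": "web-traffic-to-wireshark",
    "match": {"tcp_dst": 80},
    "ingress": {"switch": "filter1", "port": 1},
    "egress": {"switch": "delivery1", "port": 1},
}

resp = requests.post("https://demon-controller.example.net/api/v1/policies",
                     json=policy, timeout=10)
resp.raise_for_status()
print("created policy:", resp.json().get("id"))
```

The Demon application would then translate a request like this into the per-switch flow entries described on the slide.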

Location Independent Demon Policy
- policy created using CLI or API: if TCP dst port 80 on any ingress port on any filter switch, then add location metadata and deliver to delivery1
- a high-priority flow is created on all filter switches (traffic drops by default on all ingress interfaces)
- the ingress VLAN tag is rewritten to add substrate locale info and uniqueness to duplicate packets
- traffic gets to Wireshark
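A sketch of the VLAN-rewrite idea in the illustrative flow-entry notation from earlier: each filter switch stamps a locale-specific VLAN ID so duplicate packets stay distinguishable downstream. The VLAN numbering, switch names, and priority are invented for the sketch.

```python
# One entry per filter switch; the rewritten VLAN ID encodes which switch saw the packet.
LOCALE_VLAN = {"filter1": 101, "filter2": 102, "filter3": 103}  # invented locale-to-VLAN mapping

def location_independent_entries(tcp_dst: int = 80) -> list:
    entries = []
    for switch, vlan in LOCALE_VLAN.items():
        entries.append((switch, FlowEntry(
            priority=32000,
            match={"tcp_dst": tcp_dst},                           # any ingress port on the switch
            actions=[f"mod_vlan_vid:{vlan}", "output:uplink"])))  # rewrite VLAN, send toward the mux
    return entries
```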

Inserting a Service Node
- policy created using CLI or API: "forward all traffic matching tcp dest 80 on port 1 of filter1 to port 1 of delivery1 and use service node 'timestamping'"
- flows are created per policy on the filter and mux to use the service node as egress
- the timestamp is added to the frame, which is sent back toward the mux
- the mux sends service-node-sourced traffic to the delivery switch
- traffic gets to Wireshark
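Continuing the hypothetical API sketch from the location-aware example, a service node could plausibly be requested as one more field on the policy; the field name and value below are invented for illustration.

```python
# Same illustrative policy schema as before, with a hypothetical "service" field
# asking the mux to steer traffic through a service node before delivery.
policy = {
    "name": "web-traffic-timestamped",
    "match": {"tcp_dst": 80},
    "ingress": {"switch": "filter1", "port": 1},
    "egress": {"switch": "delivery1", "port": 1},
    "service": "timestamping",
}
```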

Advanced Use Case 1: Closed Loop Data Collection
- sFlow samples sourced from the filter switches are exported to a collector
- problem subnets are observed through behavioral analysis
- the sFlow collector executes a Demon policy via the API to send all traffic from these subnets to a capture device
- tracer packets are fired toward the capture device describing the reason and ticket number of the event
- only meaningful captures are taken
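A hedged sketch of the closed loop: a hook that an sFlow collector could call when behavioral analysis flags a subnet, creating a Demon policy that steers that subnet's traffic to a capture device. The collector hook, API endpoint, and schema are all assumptions.

```python
# Illustrative closed-loop trigger, called by an sFlow collector when a subnet misbehaves.
import requests

DEMON_API = "https://demon-controller.example.net/api/v1/policies"  # hypothetical endpoint

def capture_problem_subnet(subnet: str, ticket: str) -> None:
    """Steer all traffic from a flagged subnet to the capture device, tagged with a ticket number."""
    policy = {
        "name": f"auto-capture-{ticket}",
        "match": {"ip_src": subnet},
        "egress": {"switch": "delivery1", "port": "capture1"},
        "tracer_note": f"behavioral anomaly, ticket {ticket}",  # reason carried in tracer packets
    }
    requests.post(DEMON_API, json=policy, timeout=10).raise_for_status()

# Example: behavioral analysis flags 10.20.30.0/24 under ticket INC-1234.
# capture_problem_subnet("10.20.30.0/24", "INC-1234")
```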

Advanced Use Case 2: Infrastructure Cries for Help
- A script is written for the load balancer describing a fail state, DDoS signature, or other performance degradation.
- The load balancer executes an HTTP sideband connection creating a Demon policy based on the scripted condition.
- Tracer packets are fired at the capture server detailing the reason for the capture.
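The same pattern seen from the load balancer's side: a minimal sketch of the sideband HTTP call a fail-state or DDoS script might make. The function, endpoint, and schema are hypothetical.

```python
# Hypothetical sideband call made by a load-balancer script when a scripted condition fires.
import requests

def request_capture(vip: str, condition: str) -> None:
    policy = {
        "name": f"lb-help-{condition}",
        "match": {"ip_dst": vip},
        "egress": {"switch": "delivery1", "port": "capture1"},
        "tracer_note": f"load balancer reported '{condition}' on {vip}",
    }
    requests.post("https://demon-controller.example.net/api/v1/policies",
                  json=policy, timeout=10).raise_for_status()
```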

Summary
- The use of single-chip merchant silicon switches and Openflow can be an adequate replacement for basic tap/mirror aggregation at a fraction of the cost.
- An open API allows for the use of different tools for different tasks.
- Use of an Openflow controller enables new functionality that the industry has never had in a commercial solution.

Thanks / Q&A
Thanks for attending!
