1430-1500 Barroso Developing Control Plane - NANOG

Transcription

DEVELOPING AND EVOLVING YOUR OWN CONTROL PLANE
David Barroso

Timeline: 2013 Origins / 2015 Evolution / 2017 Migration

Requirements?

Requirements
- Multiple transits: each with different full routing tables, required for redundancy and performance
- Multiple hosts: provide horizontal scalability, heavily optimized for the application
- Cost: reduce the cost of traffic between hosts; must be cost-effective in order to scale the number of POPs

Routers
- lots of bells and whistles
- limited to "best-path" forwarding or PBR
- port density not very good
- power hungry
- expensive

Switches
- limited FIB
- only care about IP and Ethernet
- Linux and programmable!
- great port density
- cheap

The Team (2013)
- 2 network engineers: responsible for the entire infrastructure; too busy on-call to care about dealing with vendors
- 2 software engineers (one was the CEO, so he doesn't count): open source background, hate network appliances, determined to push control to the application

The Epiphany
- Routers are expensive, don't quite do what we want anyway, and their port density is terrible in terms of size and power consumption
- Switches are cheap and their port density is great in terms of size and power consumption, but they don't do what we want and have plenty of hardware limitations
- Programmable switches running Linux! And we have software engineers on the team

Silverton
LET'S BUILD OUR OWN CONTROL PLANE (how hard can it be?[1])
[1] astly-network-part-1-fighting-fib

Architecture*
(diagram, built up across several slides: a host attached to a switch)
- The host attaches to the switch over L3 segments.
- Switch FIB: connected 192.168.1.0/24, connected 10.0.0.0/31, connected 10.0.1.0/31. Switch ARP: 192.168.1.2 -> AA:BB:CC:DD:EE:11.
- Host FIB: connected 192.168.1.0/24. Host ARP: 192.168.1.1 -> AA:BB:CC:DD:EE:44.
- The switch speaks iBGP (AS54113) to the host; the host learns BGP 172.20.0.0/24 via 10.0.0.1 and via 10.0.1.1.
- Problem: for a packet with dst ip 172.20.0.1 the host cannot fill in the dst mac, because the BGP next-hops (10.0.0.1, 10.0.1.1) are not on any of its connected segments.
- Solution: stretch an L2 domain across the links so the next-hops resolve via ARP (10.0.0.1 -> AA:BB:CC:DD:EE:00, 10.0.1.x -> AA:BB:CC:DD:EE:11); the host can now send packets for 172.20.0.1 with the dst mac of whichever next-hop the flow hashes to.

* Section of the network; the full architecture usually consists of 4-way ECMP and may have more than one tier.
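A concrete way to see the problem the L2 hack works around: a Linux host rejects a route whose gateway is not on a connected segment. A minimal sketch using the hypothetical addresses from the diagram (the onlink flag is a modern aside, not what this design used):

$ sudo ip route add 172.20.0.0/24 via 10.0.0.1
RTNETLINK answers: Network is unreachable
# 10.0.0.1 is not on any connected segment, so the kernel refuses the route.
# With the stretched L2 domain, ARP for 10.0.0.1 resolves (to AA:BB:CC:DD:EE:00)
# and the route can be installed. On modern kernels one could instead force it:
$ sudo ip route add 172.20.0.0/24 via 10.0.0.1 dev eth0 onlink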


2015: Evolution

New Requirements
Address the shortcomings of the previous architecture:
- L2 hacks don't scale well
- Sharing L2 with transits means we had to filter broadcast traffic
- No clear separation between L2 and L3 tables confuses people
- Support for IXPs: sharing L2 with IXPs is a strong NO
- FIB limitations were starting to hurt

The State of Networking (2015)
- Routers: lots of bells and whistles; limited to "best-path" forwarding or PBR; early days of Segment Routing/BGP-LU; expensive
- Switches: cheap; limited FIB; only care about IP and Ethernet; MPLS and other encapsulation protocols now supported

The Team (2015)
- 6 network engineers: responsible for the entire infrastructure; too busy on-call to care about dealing with vendors
- 6 software engineers: focused on many different products; load balancing, distributed health checking, routing architecture, automation, kernel development, etc.

New Architecture
(diagram: host -> leaf -> edge (AS54113) -> transits/PNIs/peers AS X, AS Y, AS Z, over p-t-p links)
- eBGP everywhere, point-to-point links
- An MPLS label per exit (transits, PNIs, peers): LABEL A, LABEL B, ...
- MPLS starts on the host: the host tags the packets with a label depending on the desired path (e.g. dst ip 172.20.0.1 with MPLS label A, or with MPLS label B)

How does it work?
1. The provisioning system assigns a unique label to each exit point (e.g. 172.20.0.0/24 reachable via exit A or exit B).
2. Silverton announces liveness of each LSP down the network (edge -> leaf -> host).
3. Incoming prefixes from each transit are tagged with an extended community that identifies the LSP (172.20.0.0/24 with ext comm A -> MPLS label A, ext comm B -> MPLS label B).
4. On the host, BGP extracts the community and assigns labels to routes.
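The net effect of step 4 on the host is an ECMP route whose next-hops push the per-exit labels. A minimal iproute2 sketch of that end state (labels, next-hops and VLAN interfaces are the sample values from the slides that follow; in production Silverton/BIRD programs this, nobody types it by hand):

$ sudo modprobe mpls_iptunnel
$ sudo ip route replace 172.20.0.0/24 \
      nexthop encap mpls 1020 via 172.18.128.1 dev vlan100 weight 1 \
      nexthop encap mpls 1024 via 172.18.130.1 dev vlan200 weight 1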

The network couldn't be simpler

switch-cmh8801# show mpls lfib route
Codes: S - Static MPLS Route, I A - ISIS-SR Adjacency Segment,
       I P - ISIS-SR Prefix Segment, L - LDP,
       I-L - ISIS-SR Segment to LDP, L-I - LDP to ISIS-SR
(four static entries follow: Source FEC "S", Egress-Acl "Apply")
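For intuition only: the Linux equivalent of a static LFIB entry like the ones above is a single MPLS route that pops the label and forwards toward the exit. A sketch with hypothetical interfaces and next-hop (this is not the vendor CLI from the slide):

$ sudo modprobe mpls_router
$ sudo sysctl -w net.mpls.conf.eth1.input=1        # accept labelled packets on eth1
$ sudo sysctl -w net.mpls.platform_labels=1048575  # size the label table
$ sudo ip -f mpls route add 1020 via inet 172.18.0.1 dev eth2   # pop 1020, forward to exit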

Extended communities map to labels

dbarroso@cache-cmh8820: sudo birdc show route 223.255.251.0/24 all
BIRD 1.6.3 ready.
223.255.251.0/24   via 172.18.128.1 (MPLS label 1020) [switch_cmh8801] * (100) [AS63199i]
        Type: BGP unicast univ
        BGP.as_path: 4210088000 174 3491 63199
        BGP.next_hop: 172.18.128.1
        BGP.med: 0
        BGP.local_pref: 50
        BGP.community: (174,21000) (174,22013)
        BGP.ext_community: (ro, 2, 3) (ro, 3, 1) (generic, 0x80860000, 1020)
                   via 172.18.130.1 (MPLS label 1024) [switch_cmh8802] (100) [AS63199i]
        Type: BGP unicast univ
        BGP.as_path: 4210088000 174 3491 63199
        BGP.next_hop: 172.18.130.1
        BGP.med: 0
        BGP.local_pref: 50
        BGP.community: (174,21000) (174,22013)
        BGP.ext_community: (ro, 2, 3) (ro, 3, 1) (generic, 0x80860000, 1024)


Routing decision is made on the host

dbarroso@cache-cmh8820: ip rule
174:    from all fwmark 0xae lookup 174
1299:   from all fwmark 0x513 lookup 1299

dbarroso@cache-cmh8820: ip route show table 174
default proto static src 199.27.79.20 mtu 1500 advmss 1460
        nexthop encap mpls 1020 via 172.18.128.1 dev vlan100 weight 1
        nexthop encap mpls 1024 via 172.18.130.1 dev vlan200 weight 1

dbarroso@cache-cmh8820: ip route show table 1299
default proto static src 199.27.79.20 mtu 1500 advmss 1460
        nexthop encap mpls 1022 via 172.18.128.1 dev vlan100 weight 1
        nexthop encap mpls 1026 via 172.18.130.1 dev vlan200 weight 1

dbarroso@cache-cmh8820: ip route show table main | head
1.0.4.0/24 proto bird src 199.27.79.20 mtu 1500 advmss 1460
        nexthop encap mpls 1020 via 172.18.128.1 dev vlan100 weight 1
        nexthop encap mpls 1024 via 172.18.130.1 dev vlan200 weight 1
1.0.4.0/22 proto bird src 199.27.79.20 mtu 1500 advmss 1460
        nexthop encap mpls 1020 via 172.18.128.1 dev vlan100 weight 1
        nexthop encap mpls 1024 via 172.18.130.1 dev vlan200 weight 1

- We have a routing table per transit/peer.
- The per-peer/transit routing tables only contain a default route indicating the MPLS label required to use that exit.
- ip rules allow the application to force traffic into specific routing tables: if the application sets fwmark 0xae, traffic is forced into table 174.
- The main (default) routing table delegates the decision to the traditional best-path selection algorithm.
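To make the fwmark mechanism concrete: anything that marks packets gets steered by those rules. A small illustrative sketch (the mangle rule and the "probes" user are hypothetical):

# mark all traffic from a hypothetical "probes" user so it exits via transit table 174
$ sudo iptables -t mangle -A OUTPUT -m owner --uid-owner probes -j MARK --set-mark 0xae
# ask the kernel which next-hop a marked packet would take (no packet is sent)
$ ip route get 1.0.4.1 mark 0xae
# applications can also set the mark per socket with setsockopt(fd, SOL_SOCKET, SO_MARK, ...)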

Retrospective
- Same nice features as the previous iteration (per-flow routing)
- Shortcomings were addressed: no more broken assumptions about how networks work or are operated, no more L2 hacks
- New architectural changes worked as expected

2017: Migration

Objectives
- Change of protocol: the old architecture used iBGP while the new one uses eBGP
- Must minimize concurrent architectures: twice the code, twice the tooling, twice the knowledge, twice the things that can go wrong
- Must minimize the migration period: traditional migrations can take years for a large infrastructure; scheduling, customer notification and executing the work must be reduced to the bare minimum

iBGP to eBGP challenges
1. Changing the ASN requires synchronizing changes across multiple devices (switch-1, switch-2, switch-3 and the host), which comes with some automation challenges.

2. ECMP: while we have the same protocol everywhere, all paths are eligible; during the migration eBGP wins over iBGP, so all traffic goes via a single link, potentially congesting it.

How do we solve those problems?
"That's impossible, and I could cite 5 RFCs and reference 20 vendor white-papers explaining why." -- Jimmy, Network Architect, CCIE, JNCIE, Naysayer
"I have no respect for my elders." -- Lisa, Software Developer, Nihilist, Hacker

ASN migration

Starting config (switch-1 AS54113, switch-2 AS54113):

    on switch-1:                          on the host:
    protocol bgp host-1 {                 protocol bgp switch-1 {
        local as 54113;                       local as 54113;
        neighbor <host ip> as 54113;          neighbor <switch-1 ip> as 54113;
    }                                     }

We allow peers to connect with either ASN:

    on switch-1:                          on the host:
    protocol bgp host-1 {                 protocol bgp switch-1 {
        local as 54113;                       local as 54113;
        neighbor <host ip>                    neighbor <switch-1 ip>
            as 54113 as 65100;                    as 54113 as 65000;
    }                                     }

Now we can change the local ASN independently on each device (switch-1 becomes AS65000, switch-2 stays AS54113):

    on switch-1:                          on the host (unchanged):
    protocol bgp host-1 {                 protocol bgp switch-1 {
        local as 65000;                       local as 54113;
        neighbor <host ip>                    neighbor <switch-1 ip>
            as 54113 as 65100;                    as 54113 as 65000;
    }                                     }

iBGP/eBGP prefix compatibility

How do we make "eBGP prefixes" compatible with "iBGP prefixes"? In the mixed state, the host receives the same prefix twice: from switch-1 (AS65000) over eBGP, with AS PATH "65000 <AS PATH>" and local-pref 0, and from switch-2 (AS54113) over iBGP, with AS PATH "<AS PATH>" and local-pref 100. By default eBGP != iBGP, so three session options make the paths comparable:

- compare as ibgp: applied to an eBGP session, its prefixes will be compared against other candidates as if they were coming from an iBGP session.
- skip private as path prefix: since len(65000 <AS PATH>) != len(<AS PATH>), do not count leading private ASNs when computing the length of the AS path.
- allow bgp local pref: allow local-pref between eBGP speakers, so the eBGP path carries local-pref 100 just like the iBGP one.

Migration process (overview)

Step 1. Enable the new options:

    protocol bgp cache-cmh8820 from tpl bgp {
        local as 54113;
    -   neighbor <cache-cmh8820 ip> as 54113;
    +   neighbor <cache-cmh8820 ip> as 54113 as 4210088001;
    +   compare as ibgp;
    +   skip private as path prefix;
    +   allow bgp local pref;
    }

    Impact: none.

Step 2. Flip "local as":

    protocol bgp cache-cmh8820 from tpl bgp {
    -   local as 54113;
    +   local as 4210088000;
        neighbor <cache-cmh8820 ip> as 54113 as 4210088001;
        ...
    }

    Impact: only a BGP session flap, thanks to the BGP "hacks". A progressive rollout is enough.
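A hedged sketch of how each device could be verified after the flip, using stock birdc commands (the protocol names come from the slides; the grep patterns and expectations are illustrative):

$ sudo birdc show protocols all cache-cmh8820 | egrep 'BGP state|Neighbor AS'
# expect: BGP state: Established, and the Neighbor AS the peer is currently using
$ sudo birdc show route export injector | head
# expect: prefixes still exported as multipath routes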

Example (eBGP path vs iBGP path)

bird> show route 2.2.2.2/32 all
2.2.2.2/32      via 10.0.0.2 on eth0 [switch_1] * (110) [AS123i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 123
        BGP.next_hop: 10.0.0.2
        BGP.local_pref: 110
        BGP.originator_id: 10.0.0.51
        BGP.cluster_list: 10.0.0.101
                via 10.0.0.1 on eth0 [switch_0] (110) [AS123i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65100 65000 65000 65000 65000 123
        BGP.next_hop: 10.0.0.1
        BGP.local_pref: 110

bird> show route 2.2.2.2/32 export injector
2.2.2.2/32      multipath [switch_1 16:03:32 from 10.0.0.101] (100) [AS123i]
        via 10.0.0.1 on eth0 weight 1
        via 10.0.0.2 on eth0 weight 1

- compare as ibgp lets the eBGP path (via 10.0.0.1) compete with the iBGP path (via 10.0.0.2).
- skip private as path prefix ignores the leading "65100 65000 65000 65000 65000", so both AS paths count as length 1.
- allow bgp local pref gives the eBGP path the same local-pref (110) as the iBGP one.
- The result is an iBGP/eBGP "compatible" prefix: both paths are exported to the injector as one multipath route.

2013 Origins / 2015 Evolution / 2017 Migration

Did you consider

Did you consider OpenFlow?
- No comments
- No, seriously, no comments

Did you consider Segment Routing?
- Core ideas are great: we borrowed some and will probably borrow more
- Complexity: many protocols doing many things; the complexity is better placed in the host
- Solves half the problem: how does it integrate with the application/host?
- No open-source option; not even sure vendors support it yet

Did you consider BGP-LU?
- Egress Peer Engineering using BGP-LU (draft-gredler-idr-bgplu-epe-09): similar idea, different implementation
- Solves half the problem: how does it integrate with the application/host?
- No OSS implementation: it requires a new AF and requires dealing with race conditions
- Using BGP-LU is probably the right thing to do; to revisit in the future

Building your own control plane
- No viable alternatives in 2013: existing solutions were expensive and didn't quite do what we wanted; hardware constraints limited our options
- A custom control plane solution was critical: it saved money and time, and attracted talent
- Revisiting the problem space: we addressed the issues in the previous architecture with fewer hardware constraints, hacked our way towards a seamless migration, and open source was critical

Questions?

