BGP Scaling Techniques

Transcription

BGP Scaling Techniques
ISP Workshops
These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license
Last updated 26th April 2022

Acknowledgements
- This material originated from the Cisco ISP/IXP Workshop Programme developed by Philip Smith & Barry Greene
- Use of these materials is encouraged as long as the source is fully acknowledged and this notice remains in place
- Bug fixes and improvements are welcomed
  - Please email workshop (at) bgp4all.com
Philip Smith

BGP Videos
- NSRC has made a video recording of this presentation, as part of a library of BGP videos for the whole community to use:
  - https://learn.nsrc.org/bgp#bgp scaling techniques

BGP Scaling Techniques
- Original BGP specification and implementation was fine for the Internet of the early 1990s
  - But didn't scale
- Issues as the Internet grew included:
  - Scaling the IBGP mesh beyond a few peers?
  - Implement new policy without causing flaps and route churning?
  - Keep the network stable, scalable, as well as simple?

BGP Scaling Techniques
- BGP Configuration Scaling
  - Grouping BGP peers
- Industry Best Practice Scaling Techniques
  - Route Refresh
  - Route Reflectors
- Historical Scaling Techniques
  - Soft Reconfiguration
  - Confederations
  - Route Flap Damping

BGP Configuration Scaling
Cisco's peer-groups & Juniper's BGP groups

Grouping similar BGP peers
- What are they for?
  - Lets operators group peers with the same outbound policy
  - Makes configuration easier
  - Makes configuration less prone to error
  - Makes configuration more readable
  - Members can have different inbound policy
  - Can be used for EBGP neighbours too!

Grouping similar BGP peers
- Cisco:
  - peer-groups
    - Originally designed to speed IBGP convergence; now used for scaling BGP configuration management
  - Internal code optimisation called update-groups
    - Speeds IBGP convergence; update only calculated once for neighbours with the same outbound policy
- Juniper:
  - BGP groups

Configuring a Peer Group in IOS
    router bgp 64500
     address-family ipv4
      neighbor IBGP peer-group
      neighbor IBGP remote-as 64500
      neighbor IBGP update-source loopback 0
      neighbor IBGP send-community
      neighbor IBGP route-map outfilter out
      neighbor 100.64.0.1 peer-group IBGP
      neighbor 100.64.0.2 peer-group IBGP
      neighbor 100.64.0.2 route-map infilter in
      neighbor 100.64.0.3 peer-group IBGP
    !
- Note how 100.64.0.2 has an additional inbound filter over the peer group

Configuring a Peer Group in IOS
    router bgp 64500
     address-family ipv4
      neighbor EBGP peer-group
      neighbor EBGP send-community
      neighbor EBGP route-map set-metric out
      neighbor 100.89.1.2 remote-as 64502
      neighbor 100.89.1.2 peer-group EBGP
      neighbor 100.89.1.4 remote-as 64503
      neighbor 100.89.1.4 peer-group EBGP
      neighbor 100.89.1.6 remote-as 64504
      neighbor 100.89.1.6 peer-group EBGP
      neighbor 100.89.1.6 filter-list infilter in
    !
- Can be used for EBGP as well

Peer Groups
- Peer-groups are considered obsolete by Cisco:
  - Replaced by update-groups (internal coding, not configurable)
  - But are still considered best practice by many network operators
- Cisco introduced peer-templates
  - A much enhanced version of peer-groups, allowing more complex constructs

Cisco's update-groups (1)
- Update-groups is an internal IOS coding, taking over the performance gains introduced by peer-groups
    Router1#sh ip bgp 10.0.0.0/26
    BGP routing table entry for 10.0.0.0/26, version 2
    Paths: (1 available, best #1, table default)
      Advertised to update-groups:
         1
      Refresh Epoch 1
      Local
        0.0.0.0 from 0.0.0.0 (10.0.15.241)
          Origin IGP, metric 0, localpref 100, weight 32768, valid.
- The "show" command indicates the prefix is handled by update-group #1

Cisco's update-groups (2)
- The update group itself lists all the peers which get the same (identical) update:
    Router1#sh ip bgp update-group 1
    BGP version 4 update-group 1, internal, Address Family: IPv4 Unicast
      BGP Update version : 16/0, messages 0
      Topology: global, highest version: 16, tail marker: 16
      Format state: Current working (OK, last not in list)
                    Refresh blocked (not in list, last not in list)
      Update messages formatted 11, replicated 13, current 0, refresh 0, limit 1000
      Number of NLRIs in the update sent: max 2, min 0
      Minimum time between advertisement runs is 0 seconds
      Has 13 members
- And this group has 13 members

Peer Groups
- Always configure peer-groups for IBGP
  - Even if there are only a few IBGP peers
  - Easier to scale network in the future
  - Makes configuration easier to read
- Consider using peer-groups for EBGP
  - Especially useful for multiple BGP customers using same AS (RFC2270)
  - Also useful at Exchange Points:
    - Where ISP policy is generally the same to each peer
    - For Route Server where all peers receive the same routing updates

Juniper BGP groups
- JunOS has very similar configuration concept
  - Simply known as bgp groups, for example:
    protocols {
        bgp {
            group ibgp {
                type internal;
                local-address 10.0.15.241;
                family inet {
                    unicast;
                }
                export export-ibgp;
                peer-as 10;
                neighbor 10.0.15.242 {
                    description "Router 2";
                }
                neighbor 10.0.15.243 {
                    description "Router 3";
                }
                .etc.
            }
        }
    }

Dynamic Reconfiguration
Non-destructive policy changes

Route Refresh: History
- Historically, routers only stored prefixes which were accepted by incoming policy
  - Those rejected by policy were discarded
  - No storage of discarded prefixes
- If a change of incoming policy was required:
  - The EBGP session had to be shut down, and then brought up again
  - Destructive change: EBGP session down means lost connectivity to that peer, and potentially the rest of the Internet (outage of many minutes!)
- Changes in BGP policy usually had to be carried out during published scheduled maintenance timeslots
  - To minimise impact on end-users

Route Refresh: Step One
- First step at solving this problem was by Cisco with the "soft reconfiguration" concept
  - Router keeps a record of all prefixes received before any policy is applied (known as Adj-RIB-In)
  - Needed extra memory (highly problematic in early routers and modern routers with limited memory)
    - Full BGP table with policy change could require double the control plane memory for BGP
  - Policy changes applied to the stored received prefixes
  - No shutdown and restart of the BGP session needed when implementing policy changes

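Cisco's Soft Reconfiguration
[Diagram: prefixes received from the peer. Normal: received prefixes go straight to the BGP in process. Soft: received prefixes are first stored in the BGP in table (Adj-RIB-In). The BGP in process passes accepted prefixes (received and used) to the BGP table, which feeds the BGP out process; rejected prefixes are discarded.]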

Route Refresh: Step Two
- Second step at solving this problem was the introduction of "route refresh"
  - A BGP Capability: RFC2918
  - Peering remains active
  - Impacts only those prefixes affected by the policy change
  - No configuration needed
    - Automatically negotiated at peer establishment
    - No extra memory needed (no need for Adj-RIB-In)
  - Tell peer to resend full BGP announcement
      clear ip bgp x.x.x.x [soft] in
  - Resend full BGP announcement to peer
      clear ip bgp x.x.x.x [soft] out

Route Refresh
- Use Route Refresh capability, not hard reset
  - Supported on virtually all BGP implementations
  - Find out from "show ip bgp neighbor"
  - Non-disruptive, "Good For the Internet"
- Only hard-reset a BGP peering as a last resort
  - Consider the impact to be equivalent to a router reboot

Route Refresh: Route Origin Validation
- Route Origin Validation means checking if the prefix received has a valid ROA
  - Route Origin Authorisation: digital object indicating the origin AS for the prefix (and subnet size) using RPKI
  - Valid ROA means that the prefix (and subnet) is being originated from the correct origin AS
  - See the "BGP Origin Validation" presentation for more in-depth content
- Routers implementing ROV apply the validation results via the existing policy language & process
  - Valid: allow; Invalid: drop; NotFound: allow (at lower preference?)
- Problem: how is incoming policy applied on routers today?
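The Valid / Invalid / NotFound handling on this slide can be sketched as a small policy function. This is only an illustration, not any vendor's policy language: the names `RovState` and `apply_rov_policy`, the base local preference of 100 and the reduction of 10 for NotFound are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class RovState(Enum):
    """Validation outcomes named on the slide."""
    VALID = "Valid"
    INVALID = "Invalid"
    NOT_FOUND = "NotFound"


def apply_rov_policy(state: RovState,
                     base_local_pref: int = 100,
                     not_found_reduction: int = 10) -> Optional[int]:
    """Return the local preference to assign, or None if the prefix is dropped.

    base_local_pref and not_found_reduction are hypothetical values chosen
    for illustration; an operator would pick their own.
    """
    if state is RovState.INVALID:
        return None                                   # Invalid: drop
    if state is RovState.NOT_FOUND:
        return base_local_pref - not_found_reduction  # allow at lower preference
    return base_local_pref                            # Valid: allow


if __name__ == "__main__":
    for s in RovState:
        print(s.value, "->", apply_rov_policy(s))
```

Because this is ordinary inbound policy, any change in a prefix's validation state means the inbound policy has to be re-run against the received routes, which is the problem the following slides discuss.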

Route Refresh: Route Origin Validation
- Routers which maintain the Adj-RIB-In:
  - Apply the ROV policy to the stored received BGP table
  - Updates are applied "automatically" to the BGP table and therefore the FIB
  - No impact on any BGP peers (Route Refresh not needed)

Route Refresh: Route Origin Validation
- Routers which do NOT maintain the Adj-RIB-In:
  - Apply the ROV policy by sending a Route Refresh to peers
  - When there are a large number of ROAs (April 2022 sees over 275k IPv4 and 167k IPv6), and frequent changes or updates of ROAs:
    - Routers are sending frequent Route Refresh requests to peers (typically every few minutes)
    - Peers are being "bombarded" by Route Refresh requests: significant resource burden when they send the full or a large portion of the BGP table
    - Severe control plane CPU impact on the peer router (effectively a Denial of Service on the peer router)
  - As more and more ROAs are created and altered globally, this problem becomes significantly more serious!

Route Refresh: Route Origin Validation
- JunOS implements Adj-RIB-In by default
  - ROA updates do not cause a problem when operating ROV
- Cisco does not implement Adj-RIB-In by default:
  - Applies to all of Cisco IOS/IOS-XE/IOS-XR
  - MUST turn on soft-reconfiguration if running ROV on the router
  - Soft-reconfiguration is a similar concept to Adj-RIB-In

Enabling Cisco's Soft Reconfiguration
    router bgp 64510
     address-family ipv4
      neighbor 100.64.1.1 remote-as 64511
      neighbor 100.64.1.1 route-map infilter in
      neighbor 100.64.1.1 soft-reconfiguration inbound
- When the policy needs to be changed:
    clear ip bgp 100.64.1.1 soft in
- Note:
  - When "soft-reconfiguration" is enabled, there is no access to the route refresh capability CLI
  - clear ip bgp 100.64.1.1 in also does a soft refresh

Using Cisco's Soft-Reconfiguration
- Strongly recommended when deploying Route Origin Validation
- Operators will also use soft-reconfiguration when troubleshooting EBGP peer problems
  - Soft reconfiguration enabled on an EBGP session means that the operator can see which prefixes were sent by a neighbour before any policy is applied
  - This helps save arguments between operators about whose BGP filters may have configuration errors!

Route Reflectors
Scaling the IBGP mesh

Scaling the IBGP mesh
- Avoid ½n(n-1) IBGP mesh
  - n = 1000 ⇒ nearly half a million IBGP sessions!
  [Diagram: 14 routers = 91 IBGP sessions]
- Two solutions
  - Route reflector: simpler to deploy and run
  - BGP Confederation: more complex, has corner case advantages
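The session counts quoted on this slide follow directly from the ½n(n-1) formula. A minimal sketch to check them (the function name `ibgp_full_mesh_sessions` is just a label chosen for this example):

```python
def ibgp_full_mesh_sessions(n: int) -> int:
    """Number of IBGP sessions needed to fully mesh n routers: n(n-1)/2."""
    if n < 0:
        raise ValueError("router count must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    for routers in (14, 1000):
        print(routers, "routers need", ibgp_full_mesh_sessions(routers), "IBGP sessions")
```

For 14 routers this gives 91 sessions, and for 1000 routers 499500, which is the "nearly half a million" on the slide.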

Route Reflector: Principle
[Diagram: routers A, B and C in AS 64500]

Route Reflector: Principle
[Diagram: routers A, B and C in AS 64500, with the Route Reflector marked]

Route Reflector: Rules
- Reflector receives path from clients and non-clients
- Selects best path
- If best path is from client, reflect to other clients and non-clients
- If best path is from non-client, reflect to clients only
- Non-meshed clients
- Described in RFC4456
[Diagram: routers A, B and C in AS 64500, labelled as Reflectors and Clients]
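The two reflection rules on this slide can be sketched as a small function. This is an illustrative model only, not router code: the names `PeerType` and `reflect_targets` and the peer names in the usage example are invented here, and EBGP peers are deliberately left out.

```python
from enum import Enum
from typing import Dict, List


class PeerType(Enum):
    CLIENT = "client"
    NON_CLIENT = "non-client"   # an ordinary IBGP peer of the reflector


def reflect_targets(learned_from: str,
                    source_type: PeerType,
                    ibgp_peers: Dict[str, PeerType]) -> List[str]:
    """Return the IBGP peers to which the reflector passes on its best path.

    Best path from a client     -> other clients and non-clients.
    Best path from a non-client -> clients only.
    """
    targets = []
    for name, peer_type in ibgp_peers.items():
        if name == learned_from:
            continue  # never send the path back to the peer it came from
        if source_type is PeerType.CLIENT or peer_type is PeerType.CLIENT:
            targets.append(name)
    return sorted(targets)


if __name__ == "__main__":
    # Hypothetical reflector with two clients (B, C) and one non-client (X).
    peers = {"B": PeerType.CLIENT, "C": PeerType.CLIENT, "X": PeerType.NON_CLIENT}
    print(reflect_targets("B", PeerType.CLIENT, peers))      # path learned from client B
    print(reflect_targets("X", PeerType.NON_CLIENT, peers))  # path learned from non-client X
```

A path learned from client B goes to C and X; a path learned from non-client X goes only to the clients B and C.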

Route Reflector: Topology
- Divide the backbone into multiple clusters
- Provision at least one Route Reflector (RR) and few clients per cluster
- Route reflectors are fully meshed
- Clients in a cluster could be fully meshed
- Single IGP still carries next-hop and any local routes

Route Reflector: Loop Avoidance
- Originator ID attribute
  - Carries the RID of the originator of the route in the local AS (created by the RR)
- Cluster list attribute
  - The local cluster-id is added when the update is sent by the RR
  - Cluster-id is router-id by default (usually the address of loopback interface)
  - Do NOT use bgp cluster-id x.x.x.x unless the two route reflectors are physically/directly connected
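How the two attributes stop loops can be sketched as follows. This is a simplified model based on the behaviour described in RFC4456 (a router ignores a route carrying its own router-id as Originator ID, and a reflector ignores a route whose Cluster list already contains its own cluster-id). The function names and the addresses in the usage example are hypothetical.

```python
from typing import List, Optional, Tuple


def reflect(cluster_id: str,
            source_router_id: str,
            originator_id: Optional[str],
            cluster_list: List[str]) -> Tuple[str, List[str]]:
    """Attributes as a reflector would pass them on.

    Originator ID is set only if not already present; the local
    cluster-id is prepended to the Cluster list.
    """
    new_originator = originator_id if originator_id is not None else source_router_id
    return new_originator, [cluster_id] + list(cluster_list)


def accepts(router_id: str,
            cluster_id: Optional[str],
            originator_id: Optional[str],
            cluster_list: List[str]) -> bool:
    """Loop check on receipt. cluster_id is None for a router that is not an RR."""
    if originator_id == router_id:
        return False      # our own route has come back to us
    if cluster_id is not None and cluster_id in cluster_list:
        return False      # update has already passed through this cluster
    return True


if __name__ == "__main__":
    # Hypothetical: client 10.0.0.9 sends a route to RR 10.0.0.1
    # (cluster-id defaults to the RR's router-id).
    orig, clist = reflect("10.0.0.1", "10.0.0.9", None, [])
    print(orig, clist)
    print(accepts("10.0.0.2", "10.0.0.2", orig, clist))  # a second RR
    print(accepts("10.0.0.1", "10.0.0.1", orig, clist))  # the first RR again
    print(accepts("10.0.0.9", None, orig, clist))        # the originating client
```

The second reflector accepts the update, while the first reflector and the originating client both reject it if it ever comes back to them.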

Route Reflector: Redundancy
- Multiple RRs can be configured in the same cluster: not advised!
  - All RRs in the cluster must have the same cluster-id (otherwise it is a different cluster)
- A router may be a client of RRs in different clusters
  - Common today in ISP networks to overlay two clusters; redundancy achieved that way
  - Each client has two RRs = redundancy

Route Reflector: Redundancy
[Diagram: AS 64500 with PoP1, PoP2 and PoP3, overlaid by Cluster One and Cluster Two]

Route Reflector: Benefits
- Solves IBGP mesh problem
- Packet forwarding is not affected
- Normal BGP speakers co-exist
- Multiple reflectors for redundancy
- Easy migration
- Multiple levels of route reflectors

Route Reflector: Deployment
- Where to place the route reflectors?
  - Always follow the physical topology!
  - This will guarantee that the packet forwarding won't be affected
- Typical Service Provider network:
  - PoP has two core routers
  - Core routers are RR for the PoP
  - Two overlaid clusters

Route Reflector: Migration
- Typical ISP network:
  - Core routers have fully meshed IBGP
  - Create further hierarchy if core mesh too big
    - Split backbone into regions
- Configure one cluster pair at a time
  - Eliminate redundant IBGP sessions
  - Place maximum one RR per cluster
  - Easy migration, multiple levels

Route Reflector: Migration
[Diagram: AS 64500 containing routers A to G, with connections to AS 64501 and AS 64502]
- Migrate small parts of the network, one part at a time.

Route Reflector: Cisco IOS Configuration
- Router D configuration:
    router bgp 64500
     address-family ipv4
      .
      neighbor 100.64.3.4 remote-as 64500
      neighbor 100.64.3.4 route-reflector-client
      neighbor 100.64.3.5 remote-as 64500
      neighbor 100.64.3.5 route-reflector-client
      neighbor 100.64.3.6 remote-as 64500
      neighbor 100.64.3.6 route-reflector-client
      .

BGP Scaling Techniques
- These two standards-based techniques must be designed in from the beginning for all network operator infrastructure
  1. Route Refresh
  2. Route Reflectors

BGP Confederations

Confederations
- Divide the AS into sub-AS
  - EBGP between sub-AS, but some IBGP information is kept
    - Preserve NEXT_HOP across the sub-AS (IGP carries this information)
    - Preserve LOCAL_PREF and MED
- Usually a single IGP
- Described in RFC5065

Confederations
- Visible to outside world as single AS: the "Confederation Identifier"
  - Each sub-AS uses a number from the private space (64512-65534)
- IBGP speakers in sub-AS are fully meshed
  - The total number of neighbors is reduced by limiting the full mesh requirement to only the peers in the sub-AS
  - Can also use Route-Reflector within sub-AS

Confederations
[Diagram: AS 200 containing Sub-AS 65530 (router A), Sub-AS 65531 (router B) and Sub-AS 65532 (router C)]
- Configuration (Router C):
    router bgp 65532
     bgp confederation identifier 200
     bgp confederation peers 65530 65531
     neighbor 141.153.12.1 remote-as 65530
     neighbor 141.153.17.2 remote-as 65531

Confederations: Next
[Diagram: routers B, C, D and E; Sub-AS 65001; AS 200; Confederation 100]

Confederations: Principle
- Local preference and MED influence path selection
- Preserve local preference and MED across sub-AS boundary
- Sub-AS EBGP path administrative distance

Confederations: Loop Avoidance
- Sub-AS traversed are carried as part of AS-path
- AS-sequence and AS path length
- Confederation boundary
- AS-sequence should be skipped during MED comparison

Confederations:
[Diagram: prefix 180.10.0.0/16 shown with AS paths "(65004 65002) 200" and "(65002)"; router B; Sub-AS 65001; Confederation 100; AS 200]

Route Propagation Decisions
- Same as with "normal" BGP:
  - From peer in same sub-AS only to external peers
  - From external peers to all neighbors
- "External peers" refers to
  - Peers outside the confederation
  - Peers in a different sub-AS
    - Preserve LOCAL_PREF, MED and NEXT_HOP

Confederations (cont.)
- Example (cont.):
    BGP table version is 78, local router ID is 141.153.17.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
    Origin codes: i - IGP, e - EGP, ? - incomplete
      Network        Next Hop       Metric LocPrf Weight Path
    * 10.0.0.0       141.153.14.3        0    100      0 (65531) 1 i
    * 141.153.0.0    141.153.30.2        0    100      0 (65530) i
    * 144.10.0.0     141.153.12.1        0    100      0 (65530) i
    * 199.10.10.0    141.153.29.2        0    100      0 (65530) 1 i

More points about confederations
- Can ease "absorbing" other ISPs into your ISP
  - e.g., if one ISP buys another
  - (can use local-as feature to do a similar thing)
- You can use route-reflectors with confederation sub-AS to reduce the sub-AS IBGP mesh

Confederations: Benefits
- Solves IBGP mesh problem
- Packet forwarding not affected
- Can be used with route reflectors
- Policies could be applied to route traffic between sub-ASes if required

Confederations: Caveats
- Minimal number of sub-AS
- Sub-AS hierarchy
- Minimal inter-connectivity between sub-ASes
- Path diversity
- Difficult migration
  - BGP reconfigured into sub-AS
  - Must be applied across the network

RRs or Confederations
    Confederations:    Anywhere in the network | Yes | Yes | Medium    | Medium to High
    Route Reflectors:  Anywhere in the network | Yes | Yes | Very High | Very Low
- New network operators deploy Route Reflectors from Day One

Route Flap Damping
Network Stability for the 1990s
Network Instability for the 21st Century!

Route Flap Damping
- For many years, Route Flap Damping was a strongly recommended practice
- Now it is strongly discouraged as it causes far greater network instability than it cures
- But first, the theory

Route Flap Damping
- Route flap
  - Going up and down of path or change in attribute
    - BGP WITHDRAW followed by UPDATE = 1 flap
    - EBGP neighbour going down/up is NOT a flap
  - Ripples through the entire Internet
  - Wastes CPU
- Damping aims to reduce scope of route flap propagation

Route Flap Damping (continued)
- Requirements
  - Fast convergence for normal route changes
  - History predicts future behaviour
  - Suppress oscillating routes
  - Advertise stable routes
- Implementation described in RFC 2439

Operation
- Add penalty (1000) for each flap
  - Change in attribute gets penalty of 500
- Exponentially decay penalty
  - Half life determines decay rate
- Penalty above suppress-limit
  - Do not advertise route to BGP peers
- Penalty decayed below reuse-limit
  - Re-advertise route to BGP peers
  - Penalty reset to zero when it is half of reuse-limit
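The penalty mechanics on this slide can be sketched as a minute-by-minute simulation. It is a simplified model for illustration only: the function names are invented, the one-minute time step is an assumption, and the half-life of 15 minutes, reuse-limit of 750 and suppress-limit of 2000 are the default values quoted elsewhere in these slides.

```python
from typing import Iterable, List, Tuple

FLAP_PENALTY = 1000  # penalty added per flap, as on the slide


def decayed(penalty: float, minutes: float, half_life: float = 15.0) -> float:
    """Exponential decay: the penalty halves every half_life minutes."""
    return penalty * 0.5 ** (minutes / half_life)


def simulate(flap_minutes: Iterable[int], end_minute: int,
             half_life: float = 15.0, reuse: float = 750.0,
             suppress: float = 2000.0) -> List[Tuple[int, float, bool]]:
    """Return (minute, penalty, suppressed) for each minute from 0 to end_minute."""
    flaps = set(flap_minutes)
    penalty, suppressed, history = 0.0, False, []
    for minute in range(end_minute + 1):
        if minute > 0:
            penalty = decayed(penalty, 1, half_life)  # one minute of decay
        if minute in flaps:
            penalty += FLAP_PENALTY
        if penalty > suppress:
            suppressed = True          # stop advertising the route
        elif penalty < reuse:
            suppressed = False         # re-advertise the route
        if penalty < reuse / 2:
            penalty = 0.0              # forget the history entirely
        history.append((minute, penalty, suppressed))
    return history


if __name__ == "__main__":
    # Three flaps in three consecutive minutes, then an hour of quiet.
    for minute, penalty, suppressed in simulate([0, 1, 2], 60):
        if minute % 5 == 0 or minute < 3:
            print(f"t={minute:2d}  penalty={penalty:7.1f}  suppressed={suppressed}")
```

With these defaults, two flaps in quick succession stay just under the suppress-limit; the third flap pushes the penalty over it, and the route stays suppressed until the penalty has decayed below the reuse-limit.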

Operation
[Graph: penalty (0 to 4000) plotted against time (0 to 25), with the suppress limit and reuse limit marked; the network is announced, then not announced, then re-announced]

Operation
- Only applied to inbound announcements from EBGP peers
- Alternate paths still usable
- Controlled by:
  - Half-life (default 15 minutes)
  - reuse-limit (default 750)
  - suppress-limit (default 2000)
  - maximum suppress time (default 60 minutes)

Configuration
- Fixed damping
    router bgp 100
     bgp dampening [half-life reuse-value suppress-penalty max-suppress-time]
- Selective and variable damping
    bgp dampening [route-map name]
    route-map name permit 10
     match ip address prefix-list FLAP-LIST
     set dampening [half-life reuse-value suppress-penalty max-suppress-time]
    ip prefix-list FLAP-LIST permit 192.0.2.0/24 le 32

Operation
- Care required when setting parameters
- Penalty must be less than reuse-limit at the maximum suppress time
- Maximum suppress time and half life must allow penalty to be larger than suppress limit

Configuration
- Examples: ✗
  - bgp dampening 15 500 2500 30
    - reuse-limit of 500 means maximum possible penalty is 2000; no prefixes suppressed as penalty cannot exceed suppress-limit
- Examples: ✓
  - bgp dampening 15 750 3000 45
    - reuse-limit of 750 means maximum possible penalty is 6000; suppress limit is easily reached

Maths!
- Maximum value of penalty is
    max-penalty = reuse-limit × 2^(max-suppress-time / half-life)
- Always make sure that suppress-limit is LESS than max penalty, otherwise there will be no route damping
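The two configuration examples on the previous slide (maximum penalties of 2000 and 6000) are consistent with max penalty = reuse-limit × 2^(max-suppress-time / half-life), which is the relationship this sketch assumes. The function names are chosen for this example only; the argument order follows the `bgp dampening` examples.

```python
def max_penalty(half_life: float, reuse_limit: float, max_suppress_time: float) -> float:
    """Largest penalty a prefix can hold: reuse_limit * 2^(max_suppress_time / half_life)."""
    return reuse_limit * 2 ** (max_suppress_time / half_life)


def damping_can_suppress(half_life: float, reuse_limit: float,
                         suppress_limit: float, max_suppress_time: float) -> bool:
    """True only if the suppress-limit is below the maximum possible penalty."""
    return suppress_limit < max_penalty(half_life, reuse_limit, max_suppress_time)


if __name__ == "__main__":
    # Arguments in the same order as "bgp dampening 15 500 2500 30".
    for params in ((15, 500, 2500, 30), (15, 750, 3000, 45)):
        half_life, reuse, suppress, max_time = params
        print(params,
              "max penalty =", max_penalty(half_life, reuse, max_time),
              "can suppress =", damping_can_suppress(*params))
```

The first parameter set can never suppress anything (2500 is above the ceiling of 2000); the second can (3000 is well below 6000).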

Route Flap Damping History
- First implementations on the Internet by 1995
- Vendor defaults too severe
  - RIPE Routing Working Group recommendations in ripe-178, ripe-210, and ripe-229
  - http://www.ripe.net/ripe/docs
  - But many ISPs simply switched on the vendors' default values without thinking

Serious Problems:
- "Route Flap Damping Exacerbates Internet Routing Convergence"
  - Zhuoqing Morley Mao, Ramesh Govindan, George Varghese & Randy H. Katz, August 2002
- "What is the sound of one route flapping?"
  - Tim Griffin, June 2002
- Various work on routing convergence by Craig Labovitz and Abha Ahuja a few years ago
- "Happy Packets"
  - Closely related work by Randy Bush et al

Problem 1:
- One path flaps:
  - BGP speakers pick next best path, announce to all peers, flap counter incremented
  - Those peers see change in best path, flap counter incremented
  - After a few hops, peers see multiple changes simply caused by a single flap, so the prefix is suppressed

Problem 2:
- Different BGP implementations have different transit time for prefixes
  - Some hold onto prefix for some time before advertising
  - Others advertise immediately
- Race to the finish line causes appearance of flapping, caused by a simple announcement or path change, so the prefix is suppressed

Solution:
- Misconfigured Route Flap Damping will seriously impact access to:
  - Your network and
  - The ...
- More background contained in RIPE Routing Working Group document:
- ...ions now in:
  - www.rfc-editor.org/rfc/rfc7196.txt and www.ripe.net/ripe/docs/ripe-580

