### Transcription

Network Routing and Network ProtocolsArvind KrishnamurthySpring 2004RoutingnnnRouting algorithms view thenetwork as a graphProblem: find lowest costpath between two nodesFactors:nnnnTwo main approaches:nnAStatic topologyDynamic loadPolicyLink state protocoln Each node builds a localcopy of the entire networkDistance-vector protocol61321BEF4C91D1

Distributed Bellman-Ford Start Conditions: Each router starts with a vector of distances to all directly attachednetworks Send step: Each router advertises its current vector to all neighboring routers Receive step: For every network X, router finds shortest distance to X Considers current distance to X Then takes into account distance to X from its neighbors Router updates its cost to X After doing this for all X, router goes to send stepExample - Initial Distances1BC78A122EDDistance to NodeInfo atNodeABCDEA07 1B701 8C 102 D 202E18 202

E receives D’s routes; Updates Costs1BC78A122DEDistance to NodeInfo atNodeABCDEA07 1B701 8C 102 D 202E18420Final Distances1BC78A122EDDistance to NodeInfo atNodeABCDEA06531B60135C51024D33202E154203

ComplexitynHow many steps does it take to converge?nWhat is the message complexity of the algorithm?nHow does this compare to link state routing protocol?The Bouncing EffectdestBCcost12dest cost1XABAC11125Cdest costAB214

C Sends Routes to BdestBCcost12dest costABAC 1125Cdest costAB21B Updates Distance to AdestBCcost12dest costABAC31125Cdest costAB215

B Sends Routes to CdestBCdest costcost12ABAC31125Cdest costAB41C Sends Routes to BdestBCcost12dest costABAC51125Cdest costAB416

SolutionsnProblems arise:nnn“Solutions”:nnnnWhen metric increasesImplicit path has loopsIf metric increases, delay propagating informationn Adversely affects convergenceSplit horizon: C does not advertise route to BPoisoned reverse: C advertises route to B with infinite distanceWorks for two node loopsnDoes not work for loops with more nodesExample Where Split Horizon Failsn1nBAWhen link breaks, C marks D asunreachable and reports that to A and BSuppose A learns it firstnn11CX1nnnA now thinks best path to D is through BA reports D unreachable to B and aroute of cost 3 to CC thinks D is reachable through A at cost4 and reports that to BB reports a cost 5 to A who reports newcost to Cetc.D7

Solution: Enhanced Distance VectornnEach routing update carries the entire pathLoops are detected as follows:nnWhen node gets route check if node is already in pathn If yes, reject routen If no, add self and (possibly) advertise route furtherAdvantage:nMetrics are local - node chooses path, protocol ensures no loopsBorder Gateway Protocol (BGP)nnnnnnDesigned for scalabilityGranularity is at the level of “autonomous systems” (ASs)Usual BGP table has a few thousand entriesEach entries contains the entire AS-path for getting to adestinationUses simple hop-count metric – does not propagateinformation about bandwidth or congestion in the systemSome problems:nASes do not necessarily convey packets through shortest pathsn Some adopt “early exit” strategy – get rid of packet as soon aspossiblen Some send packets only through other ASes with which theyhave contractual agreements8

Networking Software GoalsnSimplenScalabilitynPredict what will happen in the future: everything will have a networkaddressnHeterogeneity (not a goal – but have to support it)nRobustness: failure, structural changesnnSomething is changing; not a clean rebootPerformance:nnnLatency: minimum cost (or amount of work to get nothing done!)n Measured in timeBandwidth: incremental cost; measured in bytes/secondLatency more important than bandwidthn Most common mistake in systems is to ignore latencyIssues (Problems to solve)nLink transmission: how do you get a packet of data from one machineto another machine “connected” to itnRoutingnNaming: mapping names to network addressesnMultiplexing (how do you share resources, protocols)nReliable delivery (cannot guarantee that every packet will be delivered)[ack, timeout, retransmit]nnDuplicate packetsSequencing (process packets in the same order as it was sent; oneapproach is to have only packet outstanding)9

Issues (contd.)nFragmentation & reassemblynFlow controlnnnCongestion controlnnnnRelated to flow control; similar in many waysThere is more than the sender & receiverProblem gets rediscovered every once in a while!PresentationnnSender generating data faster than the receiver can handleFeedback required from receiver to senderEndian-ness, floating point formatSecurity (authentication)Solution: Layered ProtocolsnCollection of protocolsnnnStacked togetherEach solves one of the problemsProtocol has three interfaces:nnnProvides service to higher levels of the protocol stackDepends on some lower transport protocolHas a peer-to-peer interface10

Simple File TransfernCopy file to remote machineSend( fname,hostname )fname, useridokblock1ackSend( packet,hostname )Recv( hostname,buffer )block2ackProtocol 1111

Internet Protocol (IP)nDatagram protocol (as opposed to stream protocol)nnnnnNo sequencingStatelessUnreliableHost-to-host (not program-to-program)IP Functions:nnAddressing and routing (not naming)n Does not know about namesn Understands addressesn Uses route information computed by some other entityFragmentation (controversial functionality)n Other option: let network layer take care of fragmentationFragmentationnIf a network has a small packet size, two approaches:nnTransparent approach at the network levelIP fragments:n Packet stays fragmented till it reaches destinationn Reassembled at destinationn Makes it not stateless!n Destination needs to wait for all the fragments to dribble innnKeeps track of a partial datagram, and a map of useful partsPacket needs to have a:host-id (32 bits), datagram id (16 bits), position (16 bits), lengthnIP approach vs. network layer fragmentation/reassemblynQuestion: which is better?12

“Time-to-live”nField on an IP packet header:nnnnn8 bit header (255 secs or ticks)Every router/gateway forwards a packet, it subtracts at least 1 tickWhen it gets to zero, packet is trashedPrevents packets from roaming around for everQuestion: what are the implications of time-to-live?Features and LimitationsnIP packet headers are variable length:nnnRoute that a packet takes can be recordedSource routing: specify the route from the sourceWhat are the IP limits?nnn32 bits of addressReliability: requires to get to destination in one shotSpeed limitations?13

Transmission Control ProtocolnnnnConnection orientedEnd-to-end reliableFlow controlledCongestion controlledopenclosewritepushreadSend packetRecv packetOverall FeaturesnReliablennnnFlow controlnnnSequence numbers (per byte basis)AcknowledgementsTimeout/retransmit“sliding window protocol”Purpose: pipeline communication through overlapMultiplexingnSeveral connections to be open (sockets: host, port number)nConnection-based: state kept at both endsnOut of band data: “urgent”14

Reliable Message DeliverynAll of these networks can garble, drop messagesnnnnPhysical media can garble packets or have interferenceCongestion: too many packets at an intermediate nodeDestination cannot receive packets as fast as the senderWhat can we do?nnDetect garbling using checksumsReceiver ack’s if received properly and timeout at sendern If ack gets dropped, sender retransmitsn Put sequence number in message to identify retransmissionsnnnnRequires sender to keep copy of all packets sentReceiver must keep track of message ids that could be a duplicate (Whencan receiver know it’s ok to forget?)Destination controls window to indicate its willingness to receive messagesSolutions:nnAlternating bit protocolWindow based protocol (TCP)Alternating Bit ProtocolDestSourcenmsg, #0nack, #0msg, #1nack, #1nmsg, #2ack, #2nnSend one message at atimeDon’t send next messageuntil ack receivedReceiver keeps track ofsequence # of lastmessage receivedSimpleSmall overheadPoor performance:Bandwidth packet size/RTT15

TCP Flow ControlUserUserSend 3 bytesBuffer: 3nAssume:nnnReceiver window size 8KTCP minimum threshold for sending 1KInitial sequence number 3000TCP Flow Control 2UserUserSend 997 bytesBuffer: 0Seq: 3000, Size: 1KUserBuffer: 0nUserAck: 4000Buffer: 1KAcknowledge message: “expecting byte # x”17

TCP Flow Control 3UserUserSend 4K bytesBuffer: 1KSeq: 4000, Size: 3KUserBuffer: 1KnBuffer: 1KUserAck: 7000Buffer: 4KSend only up to the receiver windowTCP Flow Control 4UserUserRecv 2K bytesBuffer: 1KWindow: 2KUserBuffer: 0KnBuffer: 2KUserSeq: 7000, Size: 1KBuffer: 3KReceiver sends a window update when user picks up data18

TCP Flow Control 5UserUserRecv 3K bytesBuffer: 0KWindow: 3KUser3 x Send 1K bytesBuffer: 0KBuffer: 0KUserSeq: 8000, Size: 1KSeq: 9000, Size: 1KBuffer: 3KSeq: 10000, Size: 1KTCP Flow Control 6UserUserRecv 1K bytesBuffer: 0KAck: 9000, Wdw: 1KUserBuffer: 1KUserSeq: 9000, Size: 1KBuffer: 0KBuffer: 2KAck: 1100019

Congestion ControlnWindow size controls flow and congestionnnnReceiver advertised window is maximum amount of data that canbe outstandingHave a smaller window if there is congestion in the systemCanonical congestion problem:nnFlow between A-B uses up link capacityFlow between C-D starts, resulting in congestion on the linkABCDIssuesnFlows need to find a fair use of link resourcesnnFlows need to distinguish packet losses from packet delaysnnWhen a flow starts, it needs to find what is available reasonablyfast and under different network capacitiesSpurious detection of packet loss results in more traffic, morecongestion, more delays, and so onFlows need to adapt to changing network conditionsnSometimes increase its utilization, sometimes lower its utilization20

Finding Equilibrium from StartupnTwo features:nnnSelf-clocking mechanism“Slow start” mechanism – actually ramps up rather fast!Self-clocking:nnSend a new packet only when a previous packet is acknowledgedSoon packets are sent at the rate they are receivedPbPrSenderReceiverAsAbArSlow Start MechanismnInitially set cwnd to be 1nnMaintain the invariant that cwnd window given by receiverIncrement cwnd by 1 for every acknowledgementOne RTT0R1One pkt time1R1232R23453R4675896101171213141521

Accurate Round Trip Time EstimatesnHow long should timeout be?nnnnnnnToo long? Wastes timeToo short? Retransmits even though message is not lostMaintain running estimate of “R”R (1-α)*R α * Mwhere M is new measurement, α is decay constantHigh α makes it unstableLow α makes the system have too much historyAlso measure the error or variance in measurementsSet timeout to be R 4*varianceCongestion Avoidance AlgorithmnnReact to changing network conditions by modifying cwndAt loss: (multiplicative decrease)cwnd cwnd / 2Better to have a drastic decrease when losses occurnAfter loss: (additive increase)cwnd 1/cwndResults in slow increase; probes for available bandwidthBetter to have a conservative increase policycwnd22

AnnouncementsnnnAssignment 4 has been postedWednesday reading: “Authentication in distributedsystems”Friday reading: “Andrew File System”nBackground reading: “Implementing Remote Procedure Calls” and“Sun’s Network File Systems”Congestion Control at RoutersnnRouter queues can fill upWhen they fill up?nnnnWhat to drop?When to drop?Random Early Detection (RED) algorithm: userandomizationRouter can be in one of three states:nnnFew packets in the queues: do not drop any packets (normaloperating phase)Lots of packets in the queues: drop for sure (congestion phase)Intermediate number of packets: calculate probability for droppingbased on queue length and number of packets since last drop(congestion avoidance phase)23

RED Drop ProbabilitynnVoodoo constants: minqThresh, maxqThresh, maxpStep 1:p maxp * (avgQlen – minqThresh)/(maxqThresh – minqThresh)p maxpavgQlen is calculated as a weighted average over timepnavgQlenStep 2:Drop probability p / (1 – count*p)count is number of packets since last dropTry to avoid cascading dropsConnectionsnnRequires three-way handshakesSetup:nnnnOpen request packet (SYN, initial sequence number)Acknowledgement (SYN, own sequence number, ack number)Acknowledgement of the acknowledgementSYN occupies 1 byte of sequence spaceBASYN, #10SYN, #100, ACK-#10ACK-#10024

Failure ScenariosnnnCannot reuse sequence number if there are some old live datan Keep track of previous recent connectionsWhat if machines go up and down?n Wait for a while when machine rebootsn Let old packets dieWhat if connection packets get lost?n Timeout and retransmitn But initially very conservative estimate of RTTConnection Tear DownnABKeep connection statearound for some more timenFIN, #100nACK-#100nnFIN, #1000ACK-#1000nFIN occupies 1 byte insequence spaceConnection state is lastbyte received in sequenceTypically kept for 2*RTTdurationNo clean solution as towhen state can beforgottenA distributed consensusproblem25

General’s ProblemnnTwo generals on separate mountainsCan communicate only via messengersnnNeed to coordinate an attacknnnMessengers can be capturedIf they attack at the same time, they winElse they will all dieDevise a protocol to coordinate the two gen

Network Routing and Network Protocols Arvind Krishnamurthy Spring 2004 Routing 4 3 6 2 1 9 1 1 n Routing algorithms view the network as a graph n Problem: find lowest cost path between two nodes n Factors: n Static topology n Dynamic load n Policy n Two main approaches: n Link state protocol n Each node builds a local copy of the entire network n Distance-vector protocol D A F E B C. 2 .