Transcription

Fault Tolerant Service Function Chaining
M. GHAZNAVI, E. JALALPOUR, B. WONG, R. BOUTABA, A. MASHTIZADEH
UNIVERSITY OF WATERLOO

Middleboxes and Service Function Chains
[Diagram: a chain of Firewall, IDS, and NAT in front of the Internet.]

Middlebox Failures
[Figure: percent contribution of L2 switches, L3 routers, middleboxes, and others to high severity incidents and to population. Source: "Demystifying the dark side of the middle: A field study of middlebox failures in datacenters", IMC 2013; author shown on the slide: Navendu Jain, Microsoft Research.]

Middlebox Fault Tolerance
[Diagram: Alice and Bob reach the Internet through a NAT; a second NAT is shown as its replacement.]

Consistent State Replication
[Diagram: Alice and Bob reach the Internet through a NAT holding the entries "Bob Bing" and "Alice Apple"; the second NAT holds the same two entries.]

Consistent State Replication
[Diagram: same setup; the second NAT holds only "Bob Bing", so the "Alice Apple" entry is missing.]
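
The two slides above contrast a replica that has every NAT entry with one that is missing an entry. A minimal sketch of that situation follows; the `Nat` class and the `missing_after_failover` helper are hypothetical names used only for illustration, and the table maps a client to a destination exactly as the slide labels do.

```python
class Nat:
    """Toy NAT whose only state is a client -> destination table (illustrative)."""

    def __init__(self):
        self.table = {}

    def process(self, client, dest):
        # Create the mapping on the first packet of a flow, then reuse it.
        self.table.setdefault(client, dest)
        return self.table[client]


def missing_after_failover(primary, replica):
    """Clients whose flows the replica could not continue if the primary failed now."""
    return sorted(set(primary.table) - set(replica.table))


primary, replica = Nat(), Nat()

primary.process("Bob", "Bing")
replica.table["Bob"] = "Bing"        # this entry was replicated

primary.process("Alice", "Apple")    # this entry was not replicated yet

print(missing_after_failover(primary, replica))  # ['Alice']
```

If the primary fails at this point, Bob's flow survives on the replica while Alice's does not, which is the inconsistency that state replication has to rule out.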

Previous Approaches
EXTERNALIZED STATE
- StatelessNF, NSDI 2017
- CHC, NSDI 2019
SNAPSHOT BASED
- Pico Replication, SoCC 2013
- FTMB, SIGCOMM 2015
- REINFORCE, CoNEXT 2018

Externalized State Approach
[Diagram: a NAT in front of the Internet reads and writes its state in a fault tolerant data store.]

Snapshot Based Approaches
[Diagram: Alice and Bob reach the Internet through a NAT holding "Bob Bing" and "Alice Apple" (primary state); a second NAT holds the same entries (replicated state).]
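
Snapshot Based Approaches for a Chain
[Diagram: a chain of FW, IDS, and NAT in front of the Internet, each middlebox paired with its own replica. Legend: primary state, replicated state.]
- High latency
- Low throughput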

Our Approach
[Diagram, labeled "Fault Tolerant": Firewall holds F; IDS holds I and F'; NAT holds N and I'. Legend: primary state, replicated state.]

Goals
- Consistent state replication to tolerate ! middlebox failures
- Minimizing performance overhead during normal operation
- Minimizing disruption during middlebox failures

Fault Tolerant Chaining (FTC)
- In-chain replication
  - Replicates a chain's state instead of the state of individual middleboxes
  - Each middlebox's state replicated to subsequent ! middlebox servers
- Transactional packet processing
  - Simplifies the development of multi-threaded middleboxes
  - Improves scalability and performance
- Data dependency vectors
  - Enables concurrent state replication
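
In-chain replication places each middlebox's replicas on the servers that follow it in the chain. The sketch below computes that placement under two assumptions that the slide does not state: the replica count is passed as a parameter named `r`, and the end of the chain wraps around to the beginning. The function name `replica_holders` is hypothetical.

```python
def replica_holders(chain, index, r):
    """Return the r servers that follow chain[index] and hold its replicated state.

    Assumption: the chain wraps around, so the last middlebox replicates
    to the first servers of the chain.
    """
    n = len(chain)
    if not 0 <= r < n:
        raise ValueError("r must be between 0 and len(chain) - 1")
    return [chain[(index + k) % n] for k in range(1, r + 1)]


chain = ["Firewall", "IDS", "NAT"]
for i, name in enumerate(chain):
    print(name, "->", replica_holders(chain, i, 1))
# Firewall -> ['IDS']
# IDS -> ['NAT']
# NAT -> ['Firewall']
```

With one replica per middlebox, the firewall's state is copied to the IDS server and the IDS's state to the NAT server, so no dedicated replica servers are needed.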

Normal Operation
[Diagram, shown in three animation steps: a forwarder (Forw.), middleboxes m1, m2, m3, replicas r1, r2, r3, and a buffer.]

Failure Recovery
[Diagram: forwarder (Forw.), middleboxes m1, m2, m3, replicas r1, r2, r3, and buffer, with an added m2' and r2. Legend: primary state, replicated state.]

Transactional Packet Processing
- Existing approaches
  - Single-threaded or batched packet processing
  - FTMB: multi-threaded packet processing
    - Tracks state changes at the granularity of each state variable read/write
    - Frequent periodic state snapshots
- Our approach
  - Packet transaction model for concurrent packet processing
  - Uses the isolation property to track state changes at the granularity of packet transactions
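
The packet transaction model can be sketched as follows. This is an illustrative sketch and not FTC's implementation: the class `TransactionalState`, its per-partition locks, and the handler signature are all assumptions. Each packet runs as one transaction that is isolated on the state partitions it declares, and its writes are recorded as a single log entry, so changes are tracked per transaction instead of per variable access.

```python
import threading


class TransactionalState:
    """Partitioned middlebox state with per-partition locks (illustrative)."""

    def __init__(self, num_partitions):
        self.values = [0] * num_partitions
        self.locks = [threading.Lock() for _ in range(num_partitions)]
        self.log = []                      # one entry per committed transaction
        self.log_lock = threading.Lock()

    def run(self, partitions, handler):
        """Run handler(values) -> {partition: new_value} in isolation.

        The handler must only read and write the declared partitions.
        """
        ordered = sorted(set(partitions))
        for p in ordered:                  # fixed lock order avoids deadlock
            self.locks[p].acquire()
        try:
            writes = handler(self.values)
            if not set(writes) <= set(ordered):
                raise KeyError("write to an undeclared partition")
            for p, value in writes.items():
                self.values[p] = value
            with self.log_lock:            # log the whole transaction at once
                self.log.append(dict(writes))
        finally:
            for p in reversed(ordered):
                self.locks[p].release()


def count_packet(values):
    # Example handler: a packet counter kept in partition 0.
    return {0: values[0] + 1}


def worker(state, packets):
    for _ in range(packets):
        state.run([0], count_packet)


state = TransactionalState(num_partitions=2)
threads = [threading.Thread(target=worker, args=(state, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.values[0], len(state.log))  # 4000 4000
```

Four threads process packets concurrently, yet the counter ends at exactly 4000 and the log holds one entry per packet, which is the granularity the slide contrasts with per-variable tracking.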

Data Dependency Vectors
- Tracking data changes instead of thread operations
- Enabling different number of threads at the middlebox and replicas
  - Fail over to smaller machine
  - Scale up to a larger machine

Middlebox        Product               Throughput   CPU Core
IPSec            HP VSR1001            268 Mbps     1
IPSec            HP VSR1008            926 Mbps     8
WAN Optimizer    STEELHEAD CCX770M     10 Mbps      2
WAN Optimizer    STEELHEAD CCX1555M    50 Mbps      4
WAF              Barracuda Level 1     100 Mbps     1
WAF              Barracuda Level 5     200 Mbps     2
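
Data Dependency Vectors Example
Middlebox (dependency vector starts at ⟨0,3,4⟩)
- Step 1: W(1), log ⟨0,x,x⟩; vector becomes ⟨1,3,4⟩
- Step 2: R(1), W(3), log ⟨1,x,4⟩; vector becomes ⟨2,3,5⟩
Replica (dependency vector starts at ⟨0,3,4⟩)
- Step 3: log ⟨1,x,4⟩ arrives first and does not match ⟨0,3,4⟩: hold
- Step 4: log ⟨0,x,x⟩ matches ⟨0,3,4⟩ and is applied; vector becomes ⟨1,3,4⟩
- Step 5: held log ⟨1,x,4⟩ now matches ⟨1,3,4⟩ and is applied; vector becomes ⟨2,3,5⟩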
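
The replica side of this example can be sketched in a few lines. The sketch tracks only the ordering logic, not the state updates themselves, and it rests on assumptions read off the slide's vectors: an `x` entry means "don't care", a log is applied once all of its other entries equal the replica's current vector, and applying a log increments every entry it names. The `Replica` class and its method names are hypothetical.

```python
X = None  # "don't care" entry, written as x on the slide


class Replica:
    """Applies replicated logs in dependency order (illustrative)."""

    def __init__(self, vector):
        self.vector = list(vector)
        self.held = []                     # logs that arrived too early

    def _ready(self, log):
        return all(e is X or e == cur for e, cur in zip(log, self.vector))

    def _apply(self, log):
        # Assumption: every partition named in the log advances by one.
        for i, e in enumerate(log):
            if e is not X:
                self.vector[i] += 1

    def receive(self, log):
        """Hold the log, then apply every held log whose dependencies are met."""
        self.held.append(log)
        progress = True
        while progress:
            progress = False
            for pending in list(self.held):
                if self._ready(pending):
                    self._apply(pending)
                    self.held.remove(pending)
                    progress = True
        return tuple(self.vector)


replica = Replica((0, 3, 4))
print(replica.receive((1, X, 4)))  # (0, 3, 4): held, partition 1 is still at 0
print(replica.receive((0, X, X)))  # (2, 3, 5): both logs applied in order
```

Because the check compares data versions and not thread identities, the replica may apply independent logs from any number of threads, which is what allows a different thread count at the middlebox and at the replica.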

Evaluation
METHOD
- Comparing FTC with:
  - NF, non-fault tolerant system: ideal performance
  - FTMB (SIGCOMM 2015): state logging, snapshots
  - FTMB Snapshot (SIGCOMM 2015): state logging, snapshots
- MoonGen and pktGen traffic generators
  - UDP traffic
  - Packet size: 256 B
ENVIRONMENTS
- A cluster of 12 servers
  - 40 Gbps network
- SAVI Cloud environment
  - A virtual network of OVS switches

Fault Tolerant NATs
[Plot: throughput (Mpps) against number of threads (1, 2, 4, 8) for NF, FTC, and FTMB. Annotation: "2x higher throughput".]

Fault Tolerant Chains – Throughput
[Plot: throughput (Mpps) against chain length (2 to 5) for NF, FTMB Snapshot, FTMB, and FTC. Annotations: "1.8x higher throughput", "3.5x", "39% drop due to snapshots".]

Fault Tolerant Chains – Latency
[Plot: packets against latency (µs); legend: NF.]

Conclusion
Keep a chain of middleboxes online after ! middleboxes fail
- In-chain replication
- Transactional packet processing
- Data dependency vectors