SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

CHANGHOON KIM, Microsoft
MATTHEW CAESAR, University of Illinois, Urbana-Champaign
JENNIFER REXFORD, Princeton University

IP networks today require massive effort to configure and manage. Ethernet is vastly simpler to manage, but does not scale beyond small local area networks. This article describes an alternative network architecture called SEATTLE that achieves the best of both worlds: The scalability of IP combined with the simplicity of Ethernet. SEATTLE provides plug-and-play functionality via flat addressing, while ensuring scalability and efficiency through shortest-path routing and hash-based resolution of host information. In contrast to previous work on identity-based routing, SEATTLE ensures path predictability, controllability, and stability, thus simplifying key network-management operations, such as capacity planning, traffic engineering, and troubleshooting. We performed a simulation study driven by real-world traffic traces and network topologies, and used Emulab to evaluate a prototype of our design based on the Click and XORP open-source routing platforms. Our experiments show that SEATTLE efficiently handles network failures and host mobility, while reducing control overhead and state requirements by roughly two orders of magnitude compared with Ethernet bridging.

Categories and Subject Descriptors: C.2.1 [Computer-Communication Network]: Network Architecture and Design; C.2.2 [Computer-Communication Network]: Network Protocols; C.2.5 [Computer-Communication Network]: Local and Wide-Area Networks

General Terms: Design, Experimentation, Management

Additional Key Words and Phrases: Enterprise network, data-center network, routing, scalability, Ethernet

ACM Reference Format:
Kim, C., Caesar, M., and Rexford, J. 2011. SEATTLE: A scalable Ethernet architecture for large enterprises. ACM Trans. Comput. Syst. 29, 1, Article 1 (February 2011), 35 pages.
DOI 10.1145/1925109.1925110 http://doi.acm.org/10.1145/1925109.1925110

Authors’ addresses: C. Kim, One Microsoft Way, Redmond, WA 98052; email: kim.changhoon@gmail.com; M. Caesar, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801; email: Caesar@ca.illinois.edu; J. Rexford, Department of Computer Science, Princeton University, 25 Olden Street, Princeton, NJ 08540; email: jrex@cs.princeton.edu.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax 1 (212) 869-0481, or permissions@acm.org.

© 2011 ACM 0734-2071/2011/02-ART1 $10.00 DOI 10.1145/1925109.1925110 http://doi.acm.org/10.1145/1925109.1925110

ACM Transactions on Computer Systems, Vol. 29, No. 1, Article 1, Publication date: February 2011.

1. INTRODUCTION

Ethernet stands as one of the most widely used networking technologies today. Due to its simplicity and ease of configuration, many enterprise, access-provider, and data-center networks utilize Ethernet as an elementary building block. Each host in an Ethernet is assigned a persistent and unique MAC address, and Ethernet bridges automatically learn host addresses and locations. These “plug-and-play” semantics simplify many critical aspects of network configuration. Meanwhile, flat addressing simplifies the handling of both host-location and network-topology changes, obviating the need for network administrators to reassign addresses.

However, Ethernet is facing revolutionary challenges. Today’s layer-2 networks are being built at an unprecedented size and with highly demanding requirements in terms of efficiency, scalability, and availability. Large data centers are being built, comprising hundreds of thousands of physical and virtual machines within a single facility [Arregoces and Portolani 2003; Barroso and Holzle 2009], and maintained by hundreds of network operators. To increase machine utilization, facilitate maintenance, and reduce operational cost, these data centers employ various agility-enhancing mechanisms such as live virtual-machine migration, fast machine re-imaging (i.e., non-live machine migration), and dynamic adjustment of resource shares (e.g., expanding or shrinking the size of a machine pool running a distributed application). All these agility mechanisms place additional requirements on handling very high rates of host and network churn: host arrival and departure, host address and location changes, IP subnet and Ethernet Virtual LAN (VLAN) re-configuration, etc. In particular, cloud-service data centers face especially challenging requirements, as they must offer networking service for a large number of tenants (cloud-service customers sharing the same data center) whose arrival and departure rates can be extremely high owing to the usage-based charging policy and the machine-independent, transient nature of jobs. Meanwhile, large metro Ethernet deployments easily contain over a million hosts and tens of thousands of bridges [Halabi 2003]. Ethernet is also being increasingly deployed in highly dynamic environments, such as backhaul for wireless campus networks, and as transport for developing regions [Hudson 2002]. Ethernet becomes all the more important in these environments because it allows hosts to retain their IP addresses as long as they move within a single layer-2 domain (i.e., IP subnet). This property is highly useful for ensuring service continuity across host-location changes as well as simplifying both network and host configuration related to policy enforcement (e.g., access control).

Despite these benefits, conventional Ethernet has some critical limitations. First, Ethernet bridging relies on network-wide flooding to locate end hosts. This results in large overhead to disseminate and store host state that grows with the size of the network. Second, Ethernet forces paths to comprise a spanning tree. Spanning trees perform well for small networks that often do not have many redundant paths anyway, but introduce substantial inefficiencies on larger networks that have more demanding requirements for low latency, high availability, and traffic engineering. Finally, critical bootstrapping protocols used frequently by end hosts, such as Address Resolution Protocol (ARP) and Dynamic Host Configuration Protocol (DHCP), rely on broadcasting. Not only does broadcasting waste useful network and end-host resources, doing so additionally introduces security vulnerabilities and privacy concerns.

Network administrators sidestep Ethernet’s inefficiencies today by interconnecting small Ethernet LANs using routers running the Internet Protocol (IP). IP routing ensures efficient and flexible use of networking resources via shortest-path routing. It also has control overhead and forwarding-table sizes that are proportional to the number of subnets (i.e., prefixes), rather than the number of hosts. However, introducing IP routing breaks many of the desirable properties of Ethernet. For example, network administrators must now subdivide their address space to assign IP prefixes across the topology, and update these configurations when the network-design changes. Subnetting leads to wasted address space, and laborious configuration tasks. Although DHCP automates host address configuration, maintaining consistency between DHCP servers and routers still remains challenging. Moreover, since IP addresses are not persistent identifiers, ensuring service continuity across location changes (e.g., due to virtual machine migration or physical mobility) becomes more challenging. Additionally, access-control policies must be specified based on the host’s current position, and updated when the host moves.

Alternatively, operators may use VLANs, which allow administrators to build IP subnets irrespective of hosts’ location. Hence, by provisioning each VLAN over a large fraction of (if not an entire) network, administrators can lower the overhead of address and access-control policy re-configuration due to host mobility. Unfortunately, however, having large VLANs counteracts the benefits of broadcast scoping, and worsens data-plane efficiency, as a larger spanning tree is used in each VLAN to forward traffic. In addition, properly determining the coverage of a VLAN (i.e., deciding which bridges and links participate in a VLAN) requires precise knowledge about hosts’ communication and mobility patterns and thus is extremely hard to automate. Moreover, since hosts in different VLANs still require IP to communicate with one another, this architecture still inherits many of the challenges of IP mentioned above, such as address-space fragmentation.

In this article, we address the following question: Is it possible to build a protocol that maintains the same configuration-free properties as Ethernet bridging, yet scales to large dynamic networks? To answer, we present a Scalable Ethernet Architecture for Large Enterprises (SEATTLE). Specifically, SEATTLE offers the following features:

A One-Hop, Network-Layer DHT. SEATTLE forwards packets based on end-host MAC addresses. However, SEATTLE does not require each switch to maintain state for every host, nor does it require network-wide floods to disseminate host locations. Instead, SEATTLE uses the global switch-level view provided by a link-state routing protocol to form a one-hop DHT [Gupta et al. 2004], which stores the location of each host. We also use this network-layer DHT to build a flexible directory service which enables address resolution (e.g., storing the MAC address associated with an IP address) as well as convenient service discovery (e.g., maintaining DHCP server addresses, the least loaded DNS server’s address, or a printer within the domain). In addition, to reduce lookup latency and enable fault isolation in a large network deployed over a wide area, we present a hierarchical configuration of multiple regional DHTs.

Traffic-Driven Location Resolution and Caching. To forward packets along shortest paths and to avoid excessive load on the directory service, switches can cache responses to queries. Caching routing information is a well studied topic especially for Internet routers which have to maintain a large amount of routing information [Jain and Routhier 1986; Jain 1990; Feldmeier 1988; Heimlich 1990; Partridge 1996; Partridge et al. 1998; Kim et al. 2009]. The route-caching design we propose is particularly effective for the target operational environment we envision: enterprises, metro-area, and data-center networks. In these networks, many hosts typically communicate only with a small number of other hosts (e.g., web, mail, or proxy servers) which are commonly popular across the whole network [Aiello et al. 2005].¹ Hence, SEATTLE switches can achieve a very high cache-hit ratio by maintaining the information about the small working set of destination hosts. Our route-caching design also employs a unique mechanism that addresses the key limitation of the earlier route-caching work, namely slow-path forwarding upon cache misses. Unlike this earlier design, a SEATTLE switch can simply forward a packet to another switch chosen by a simple hash function even when the packet causes a cache miss, completing the entire packet forwarding process on the fast path. Furthermore, SEATTLE also provides a way to piggyback location information on ARP replies, which eliminates the need for separate location resolution when forwarding data packets following the ARP resolution. All these mechanisms allow data packets to directly traverse the shortest path, making the network’s forwarding behavior predictable and stable as well as simplifying traffic engineering and network troubleshooting. This is one of the core benefits of SEATTLE compared to conventional DHT-based networking systems.

A Scalable, Prompt Cache-Updat

¹ Special-purpose cluster or data-center networks that are predominantly used to run data-parallel distributed computing [Dean and Ghemawat 2004; Isard et al. 2007] or high-performance computing applications might be an exception to this. In such a network, a host can communicate with a large number of other hosts in a short period of time. In fact, one of the primary goals of the job scheduler in such a system is avoiding a skewed host-popularity distribution, making SEATTLE’s host-information caching less effective. Nonetheless, our SEATTLE design, even without host-information caching, can still ensure most of the core benefits (e.g., smaller forwarding tables, fewer control-message exchanges, zero flood) over conventional Ethernet, and we specifically demonstrate some of these benefits in Section 6. Furthermore, in data-center networks where latency and workload increase due to a longer stretch is tolerable, random traffic indirection via the network-layer DHT can offer unique benefits, such as traffic-oblivious load spreading [Kodialam et al. 2004].
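To make the one-hop DHT and the hash-based handling of cache misses described above more concrete, the following is a minimal Python sketch, not the SEATTLE implementation. It rests on several assumptions that the text here does not state: a consistent-hash ring over switch identifiers (the text says only "a simple hash function"), SHA-1 as the hash, and the hypothetical names `ring_point`, `OneHopDirectory`, `resolver`, `publish`, `next_switch`, and `learn`. Every switch is assumed to know the full switch list from the link-state protocol, so any switch can compute the resolver for a host locally.

```python
import hashlib
from bisect import bisect_left


def ring_point(key: str) -> int:
    """Map a string onto a 32-bit hash ring (SHA-1 is an arbitrary, assumed choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class OneHopDirectory:
    """Hypothetical model of a switch-level, one-hop DHT for host locations."""

    def __init__(self, switch_ids):
        # The link-state protocol is assumed to give every switch this same list.
        self.ring = sorted((ring_point(s), s) for s in switch_ids)
        self.points = [p for p, _ in self.ring]
        self.store = {s: {} for s in switch_ids}  # resolver-side host locations
        self.cache = {s: {} for s in switch_ids}  # per-switch location caches

    def resolver(self, mac: str) -> str:
        """Return the switch whose ring point follows hash(mac), wrapping around."""
        i = bisect_left(self.points, ring_point(mac)) % len(self.ring)
        return self.ring[i][1]

    def publish(self, mac: str, attached_switch: str) -> None:
        """Record a host's location at its resolver; nothing is flooded."""
        self.store[self.resolver(mac)][mac] = attached_switch

    def next_switch(self, ingress: str, dst_mac: str) -> str:
        """On a cache hit, go straight to the host's switch; on a miss, go to the resolver."""
        cached = self.cache[ingress].get(dst_mac)
        return cached if cached is not None else self.resolver(dst_mac)

    def learn(self, ingress: str, dst_mac: str, location: str) -> None:
        """Cache a resolved location so that later packets take the direct path."""
        self.cache[ingress][dst_mac] = location
```

In this sketch, the first packet from an ingress switch toward an uncached destination is sent to `resolver(dst_mac)`, which holds the published location. Once the ingress switch calls `learn`, `next_switch` returns the destination's own switch. How the resolver relays the packet and notifies the ingress switch is left out, since this section does not describe it.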
