P4: Programming Protocol-Independent Packet Processors - SIGCOMM


Pat Bosshart†, Dan Daly*, Glen Gibb†, Martin Izzard†, Nick McKeown‡, Jennifer Rexford**, Cole Schlesinger**, Dan Talayco†, Amin Vahdat¶, George Varghese§, David Walker**

†Barefoot Networks, *Intel, ‡Stanford University, **Princeton University, ¶Google, §Microsoft Research

ACM SIGCOMM Computer Communication Review, Volume 44, Number 3, July 2014

ABSTRACT

P4 is a high-level language for programming protocol-independent packet processors. P4 works in conjunction with SDN control protocols like OpenFlow. In its current form, OpenFlow explicitly specifies protocol headers on which it operates. This set has grown from 12 to 41 fields in a few years, increasing the complexity of the specification while still not providing the flexibility to add new headers. In this paper we propose P4 as a strawman proposal for how OpenFlow should evolve in the future. We have three goals: (1) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed. (2) Protocol independence: Switches should not be tied to any specific network protocols. (3) Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware. As an example, we describe how to use P4 to configure a switch to add a new hierarchical label.

1. INTRODUCTION

Software-Defined Networking (SDN) gives operators programmatic control over their networks. In SDN, the control plane is physically separate from the forwarding plane, and one control plane controls multiple forwarding devices. While forwarding devices could be programmed in many ways, having a common, open, vendor-agnostic interface (like OpenFlow) enables a control plane to control forwarding devices from different hardware and software vendors.

Version   Date       Header Fields
OF 1.0    Dec 2009   12 fields (Ethernet, TCP/IPv4)
OF 1.1    Feb 2011   15 fields (MPLS, inter-table metadata)
OF 1.2    Dec 2011   36 fields (ARP, ICMP, IPv6, etc.)
OF 1.3    Jun 2012   40 fields
OF 1.4    Oct 2013   41 fields

Table 1: Fields recognized by the OpenFlow standard

The OpenFlow interface started simple, with the abstraction of a single table of rules that could match packets on a dozen header fields (e.g., MAC addresses, IP addresses, protocol, TCP/UDP port numbers, etc.). Over the past five years, the specification has grown increasingly complicated (see Table 1), with many more header fields and multiple stages of rule tables, to allow switches to expose more of their capabilities to the controller.

The proliferation of new header fields shows no signs of stopping. For example, data-center network operators increasingly want to apply new forms of packet encapsulation (e.g., NVGRE, VXLAN, and STT), for which they resort to deploying software switches that are easier to extend with new functionality. Rather than repeatedly extending the OpenFlow specification, we argue that future switches should support flexible mechanisms for parsing packets and matching header fields, allowing controller applications to leverage these capabilities through a common, open interface (i.e., a new “OpenFlow 2.0” API). Such a general, extensible approach would be simpler, more elegant, and more future-proof than today’s OpenFlow 1.x standard.

Figure 1: P4 is a language to configure switches.

Recent chip designs demonstrate that such flexibility can be achieved in custom ASICs at terabit speeds [1, 2, 3]. Programming this new generation of switch chips is far from easy. Each chip has its own low-level interface, akin to microcode programming. In this paper, we sketch the design of a higher-level language for Programming Protocol-independent Packet Processors (P4). Figure 1 shows the relationship between P4 (used to configure a switch, telling it how packets are to be processed) and existing APIs (such as OpenFlow) that are designed to populate the forwarding tables in fixed-function switches. P4 raises the level of abstraction for programming the network, and can serve as a

general interface between the controller and the switches. That is, we believe that future generations of OpenFlow should allow the controller to tell the switch how to operate, rather than be constrained by a fixed switch design. The key challenge is to find a “sweet spot” that balances the need for expressiveness with the ease of implementation across a wide range of hardware and software switches. In designing P4, we have three main goals:

• Reconfigurability. The controller should be able to redefine the packet parsing and processing in the field.
• Protocol independence. The switch should not be tied to specific packet formats. Instead, the controller should be able to specify (i) a packet parser for extracting header fields with particular names and types and (ii) a collection of typed match+action tables that process these headers.
• Target independence. Just as a C programmer does not need to know the specifics of the underlying CPU, the controller programmer should not need to know the details of the underlying switch. Instead, a compiler should take the switch’s capabilities into account when turning a target-independent description (written in P4) into a target-dependent program (used to configure the switch).

The outline of the paper is as follows. We begin by introducing an abstract switch forwarding model. Next, we explain the need for a new language to describe protocol-independent packet processing. We then present a simple motivating example where a network operator wants to support a new packet-header field and process packets in multiple stages. We use this to explore how the P4 program specifies headers, the packet parser, the multiple match+action tables, and the control flow through these tables. Finally, we discuss how a compiler can map P4 programs to target switches.

Related work. In 2011, Yadav et al. [4] proposed an abstract forwarding model for OpenFlow, but with less emphasis on a compiler. Kangaroo [1] introduced the notion of programmable parsing. Recently, Song [5] proposed protocol-oblivious forwarding, which shares our goal of protocol independence but is targeted more towards network processors. The ONF introduced table typing patterns to express the matching capabilities of switches [6]. Recent work on NOSIX [7] shares our goal of flexible specification of match+action tables, but does not consider protocol independence or propose a language for specifying the parser, tables, and control flow. Other recent work proposes a programmatic interface to the data plane for monitoring, congestion control, and queue management [8, 9]. The Click modular router [10] supports flexible packet processing in software, but does not map programs to a variety of target hardware switches.

2. ABSTRACT FORWARDING MODEL

In our abstract model (Fig. 2), switches forward packets via a programmable parser followed by multiple stages of match+action, arranged in series, parallel, or a combination of both. Derived from OpenFlow, our model makes three generalizations. First, OpenFlow assumes a fixed parser, whereas our model supports a programmable parser to allow new headers to be defined. Second, OpenFlow assumes the match+action stages are in series, whereas in our model they can be in parallel or in series. Third, our model assumes that actions are composed from protocol-independent primitives supported by the switch.

Our abstract model generalizes how packets are processed in different forwarding devices (e.g., Ethernet switches, load balancers, routers) and by different technologies (e.g., fixed-function switch ASICs, NPUs, reconfigurable switches, software switches, FPGAs). This allows us to devise a common language (P4) to represent how packets are processed in terms of our common abstract model. Hence, programmers can create target-independent programs that a compiler can map to a variety of different forwarding devices, ranging from relatively slow software switches to the fastest ASIC-based switches.

Figure 2: The abstract forwarding model.

The forwarding model is controlled by two types of operations: Configure and Populate. Configure operations program the parser, set the order of match+action stages, and specify the header fields processed by each stage. Configuration determines which protocols are supported and how the switch may process packets. Populate operations add (and remove) entries to the match+action tables that were specified during configuration. Population determines the policy applied to packets at any given time.

For the purposes of this paper, we assume that configuration and population are two distinct phases. In particular, the switch need not process packets during configuration. However, we expect implementations will allow packet processing during partial or full reconfiguration, enabling upgrades with no downtime. Our model deliberately allows for, and encourages, reconfiguration that does not interrupt forwarding.

Clearly, the configuration phase has little meaning in fixed-function ASIC switches; for this type of switch, the compiler’s job is simply to check whether the chip can support the P4 program. Instead, our goal is to capture the general trend towards fast reconfigurable packet-processing pipelines, as described in [2, 3].

Arriving packets are first handled by the parser. The packet body is assumed to be buffered separately, and unavailable for matching. The parser recognizes and extracts fields from the header, and thus defines the protocols supported by the switch. The model makes no assumptions about the meaning of protocol headers, only that the parsed representation defines a collection of fields on which matching and actions operate.

The extracted header fields are then passed to the match+action tables. The match+action tables are divided between ingress and egress. While both may modify the packet header, ingress match+action determines the egress port(s) and determines the queue into which the packet is placed. Based on ingress processing, the packet may be forwarded, replicated (for multicast, span, or to the control plane), dropped, or trigger flow control. Egress match+action performs per-instance modifications to the packet header, e.g., for multicast copies. Action tables (counters, policers, etc.) can be associated with a flow to track frame-to-frame state.

Packets can carry additional information between stages, called metadata, which is treated identically to packet header fields. Some examples of metadata include the ingress port, the transmit destination and queue, a timestamp that can be used for packet scheduling, and data passed from table to table that does not involve changing the parsed representation of the packet, such as a virtual network identifier.

Queueing disciplines are handled in the same way as in the current OpenFlow: an action maps a packet to a queue, which is configured to receive a particular service discipline. The service discipline (e.g., minimum rate, DRR) is chosen as part of the switch configuration.

Although beyond the scope of this paper, action primitives can be added to allow the programmer to implement new or existing congestion control protocols. For example, the switch might be programmed to set the ECN bit based on novel conditions, or it might implement a proprietary congestion control mechanism using match+action tables.

3. A PROGRAMMING LANGUAGE

We use the abstract forwarding model to define a language to express how a switch is to be configured and how packets are to be processed. This paper’s main goal is to propose the P4 programming language. However, we recognize that many languages are possible, and they will likely share the common characteristics we describe here. For example, the language needs a way to express how the parser is programmed so that the parser knows what packet formats to expect; hence a programmer needs a way to declare what header types are possible. As an example, the programmer could specify the format of an IPv4 header and what headers may legally follow the IP header. This motivates defining parsing in P4 by declaring legal header types. Similarly, the programmer needs to express how packet headers are to be processed. For example, TTL fields must be decremented and tested, new tunnel headers may need to be added, and checksums may need to be computed. This motivates P4’s use of an imperative control flow program to describe header field processing using the declared header types and a primitive set of actions.

We could use a language such as Click [10], which builds switches from modules composed of arbitrary C++. Click is extremely expressive, and very suitable for expressing how packets are processed in the kernel of a CPU. But it is insufficiently constrained for our needs: we need a language that mirrors the parse-match-action pipelines in dedicated hardware. In addition, Click is not designed for a controller-switch architecture and hence does not allow programmers to describe match+action tables that are dynamically populated by well-typed rules. Finally, Click makes it difficult to infer dependencies that constrain parallel execution, as we now discuss.

A packet processing language must allow the programmer to express (implicitly or explicitly) any serial dependencies between header fields. Dependencies determine which tables can be executed in parallel. For example, sequential execution is required for an IP routing table and an ARP table due to the data dependency between them. Dependencies can be identified by analyzing Table Dependency Graphs (TDGs); these graphs describe the field inputs, actions, and control flow between tables. Figure 3 shows an example table dependency graph for an L2/L3 switch. TDG nodes map directly to match+action tables, and a dependency analysis identifies where each table may reside in the pipeline. Unfortunately, TDGs are not readily accessible to most programmers; programmers tend to think of packet processing algorithms using imperative constructs rather than graphs.

Figure 3: Table dependency graph for an L2/L3 switch.

This leads us to propose a two-step compilation process. At the highest level, programmers express packet processing programs using an imperative language representing the control flow (P4); below this, a compiler translates the P4 representation to TDGs to facilitate dependency analysis,

and then maps the TDG to a specific switch target. P4 is designed to make it easy to translate a P4 program into a TDG. In summary, P4 can be considered to be a sweet spot between the generality of, say, Click (which makes it difficult to infer dependencies and map to hardware) and the inflexibility of OpenFlow 1.0 (which makes it impossible to reconfigure protocol processing).

4. P4 LANGUAGE BY EXAMPLE

We explore P4 by examining a simple example in depth. Many network deployments differentiate between an edge and a core; end-hosts are directly connected to edge devices, which are in turn interconnected by a high-bandwidth core. Entire protocols have been designed to support this architecture (such as MPLS [11] and PortLand [12]), aimed primarily at simplifying forwarding in the core.

Consider an example L2 network deployment with top-of-rack (ToR) switches at the edge connected by a two-tier core. We will assume the number of end-hosts is growing and the core L2 tables are overflowing. MPLS is an option to simplify the core, but implementing a label distribution protocol with multiple tags is a daunting task. PortLand looks interesting but requires rewriting MAC addresses (possibly breaking existing network debugging tools) and requires new agents to respond to ARP requests.

P4 lets us express a custom solution with minimal changes to the network architecture. We call our toy example mTag: it combines the hierarchical routing of PortLand with simple MPLS-like tags. The routes through the core are encoded by a 32-bit tag composed of four single-byte fields. The 32-bit tag can carry a “source route” or a destination locator (like PortLand’s Pseudo MAC). Each core switch need only examine one byte of the tag and switch on that information. In our example, the tag is added by the first ToR switch, although it could also be added by the end-host NIC.

The mTag example is intentionally very simple to focus our attention on the P4 language. The P4 program for an entire switch would be many times more complex in practice.

4.1 P4 Concepts

A P4 program contains definitions of the following key components:

• Headers: A header definition describes the sequence and structure of a series of fields. It includes specification of field widths and constraints on field values.
• Parsers: A parser definition specifies how to identify headers and valid header sequences within packets.
• Tables: Match+action tables are the mechanism for performing packet processing. The P4 program defines the fields on which a table may match and the actions it may execute.
• Actions: P4 supports construction of complex actions from simpler protocol-independent primitives. These complex actions are available within match+action tables.
• Control Programs: The control program determines the order of match+action tables that are applied to a packet. A simple imperative program describes the flow of control between match+action tables.

Next, we show how each of these components contributes to the definition of an idealized mTag processor in P4.

4.2 Header Formats

A design begins with the specification of header formats. Several domain-specific languages have been proposed for this [13, 14, 15]; P4 borrows a number of ideas from them. In general, each header is specified by declaring an ordered list of field names together with their widths. Optional field annotations allow constraints on value ranges or maximum lengths for variable-sized fields. For example, standard Ethernet and VLAN headers are specified as follows:

    header ethernet {
        fields {
            dst_addr : 48; // width in bits
            src_addr : 48;
            ethertype : 16;
        }
    }

    header vlan {
        fields {
            pcp : 3;
            cfi : 1;
            vid : 12;
            ethertype : 16;
        }
    }

The mTag header can be added without altering existing declarations. The field names indicate that the core has two layers of aggregation. Each core switch is programmed with rules to examine one of these bytes determined by its location in the hierarchy and the direction of travel (up or down).

    header mTag {
        fields {
            up1 : 8;
            up2 : 8;
            down1 : 8;
            down2 : 8;
            ethertype : 16;
        }
    }

4.3 The Packet Parser

P4 assumes the underlying switch can implement a state machine that traverses packet headers from start to finish, extracting field values as it goes. The extracted field values are sent to the match+action tables for processing.

P4 describes this state machine directly as the set of transitions from one header to the next. Each transition may be triggered by values in the current header. For example, we describe the mTag state machine as follows.

    parser start {
        ethernet;
    }

    parser ethernet {
        switch(ethertype) {
            case 0x8100: vlan;
            case 0x9100: vlan;
            case 0x800: ipv4;
            // Other cases
        }
    }

    parser vlan {
        switch(ethertype) {
            case 0xaaaa: mTag;
            case 0x800: ipv4;
            // Other cases
        }
    }

    parser mTag {
        switch(ethertype) {
            case 0x800: ipv4;
            // Other cases
        }
    }

Parsing starts in the start state and proceeds until an explicit stop state is reached or an unhandled case is encountered (which may be marked as an error). Upon reaching a state for a new header, the state machine extracts the header using its specification and proceeds to identify its next transition. The extracted headers are forwarded to match+action processing in the back half of the switch pipeline.

The parser for mTag is very simple: it has only four states. Parsers in real networks require many more states; for example, the parser defined by Gibb et al. [16, Figure 3(e)] expands to over one hundred states.

4.4 Table Specification

Next, the programmer describes how the defined header fields are to be matched in the match+action stages (e.g., should they be exact matches, ranges, or wildcards?) and what actions should be performed when a match occurs.

In our simple mTag example, the edge switch matches on the L2 destination and VLAN ID, and selects an mTag to add to the header. The programmer defines a table to match on these fields and apply an action to add the mTag header (see below). The reads attribute declares which fields to match, qualified by the match type (exact, ternary, etc.). The actions attribute lists the possible actions which may be applied to a packet by the table. Actions are explained in the following section. The max_size attribute specifies how many entries the table should support.

The table specification allows a compiler to decide how much memory it needs, and the memory type (e.g., TCAM or SRAM) to implement the table.

    table mTag_table {
        reads {
            ethernet.dst_addr : exact;
            vlan.vid : exact;
        }
        actions {
            // At runtime, entries are programmed with params
            // for the mTag action. See below.
            add_mTag;
        }
        max_size : 20000;
    }

For completeness and for later discussion, we present brief definitions of other tables that are referenced by the Control Program (§4.6).

    table source_check {
        // Verify mtag only on ports to the core
        reads {
            mtag : valid; // Was mtag parsed?
            metadata.ingress_port : exact;
        }
        actions { // Each table entry specifies *one* action

            // If inappropriate mTag, send to CPU
            fault_to_cpu;

            // If mtag found, strip and record in metadata
            strip_mtag;

            // Otherwise, allow the packet to continue
            pass;
        }
        max_size : 64; // One rule per port
    }

    table local_switching {
        // Reads destination and checks if local
        // If miss occurs, goto mtag table.
    }

    table egress_check {
        // Verify egress is resolved
        // Do not retag packets received with tag
        // Reads egress and whether packet was mTagged
    }

4.5 Action Specifications

P4 defines a collection of primitive actions from which more complicated actions are built. Each P4 program declares a set of action functions that are composed of action primitives; these action functions simplify table specification and population. P4 assumes parallel execution of primitives within an action function. (Switches incapable of parallel execution may emulate the semantics.)

The add_mTag action referred to above is implemented as follows:

    action add_mTag(up1, up2, down1, down2, egr_spec) {
        add_header(mTag);
        // Copy VLAN ethertype to mTag
        copy_field(mTag.ethertype, vlan.ethertype);
        // Set VLAN’s ethertype to signal mTag
        set_field(vlan.ethertype, 0xaaaa);
        set_field(mTag.up1, up1);
        set_field(mTag.up2, up2);
        set_field(mTag.down1, down1);
        set_field(mTag.down2, down2);

        // Set the destination egress port as well
        set_field(metadata.egress_spec, egr_spec);
    }

If an action needs parameters (e.g., the up1 value for the mTag), they are supplied from the match table at runtime.

In this example, the switch inserts the mTag after the VLAN tag, copies the VLAN tag’s Ethertype into the mTag to indicate what follows, and sets the VLAN tag’s Ethertype to 0xaaaa to signal mTag. Not shown are the inverse action specification that strips an mTag from a packet and the table to apply this action in edge switches.

P4’s primitive actions include:

• set_field: Set a specific field in a header to a value. Masked sets are supported.
• copy_field: Copy one field to another.
• add_header: Set a specific header instance (and all its fields) as valid.
• remove_header: Delete (“pop”) a header (and all its fields) from a packet.
• increment: Increment or decrement the value in a field.
• checksum: Calculate a checksum over some set of header fields (e.g., an IPv4 checksum).

We expect most switch implementations will restrict action processing to permit only header modifications that are consistent with the specified packet format.

4.6 The Control Program

Once tables and actions are defined, the only remaining task is to specify the flow of control from one table to the next. Control flow is specified as a program via a collection of functions, conditionals, and table references.

Figure 4: Flow chart for the mTag example.

Figure 4 shows a graphical representation of the desired control flow for the mTag implementation on edge switches. After parsing, the source_check table verifies consistency between the received packet and the ingress port. For example, mTags should only be seen on ports connected to core switches. The source_check table also strips mTags from the packet, recording whether the packet had an mTag in metadata. Tables later in the pipeline may match on this metadata to avoid retagging the packet.

The local_switching table is then executed. If this table “misses,” it indicates that the packet is not destined for a locally connected host. In that case, the mTag table (defined above) is applied to the packet. Both local and core forwarding control can be processed by the egress_check table, which handles the case of an unknown destination by sending a notification up the SDN control stack.

The imperative representation of this packet processing pipeline is as follows:

    control main() {
        // Verify mTag state and port are consistent
        table(source_check);

        // If no error from source_check, continue
        if (!defined(metadata.ingress_error)) {
            // Attempt to switch to end hosts
            table(local_switching);

            if (!defined(metadata.egress_spec)) {
                // Not a known local host; try mtagging
                table(mTag_table);
            }

            // Check for unknown egress state or
            // bad retagging with mTag.
            table(egress_check);
        }
    }

5. COMPILING A P4 PROGRAM

For a network to implement our P4 program, we need a compiler to map the target-independent description onto the target switch’s specific hardware or software platform. Doing so involves allocating the target’s resources and generating appropriate configuration for the device.

5.1 Compiling Packet Parsers

For devices with programmable parsers, the compiler translates the parser description into a parsing state machine, while for fixed parsers, the compiler merely verifies that the parser description is consistent with the target’s parser. Details of generating a state machine and state table entries can be found in [16].

Table 2 shows state table entries for the vlan and mTag sections of the parser (§4.3). Each entry specifies the current state, the field value to match, and the next state. Other columns are omitted for brevity.

Current State   Lookup Value   Next State
vlan            0xaaaa         mTag
vlan            0x800          ipv4
vlan            *              stop
mTag            0x800          ipv4
mTag            *              stop

Table 2: Parser state table entries for the mTag example.
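As a rough illustration of how the entries in Table 2 could drive parsing, the following Python sketch stores the five entries as a lookup table and follows transitions one header at a time. It is not part of P4 or of the compiler described in [16]. The names STATE_TABLE, next_state, and walk are hypothetical, and the rule that an exact entry takes precedence over the wildcard is an assumption; Table 2 does not say how entries are prioritized.

```python
# Hypothetical sketch: Table 2 as a lookup structure (all names are illustrative).
WILDCARD = "*"

# (current state, lookup value) -> next state, copied from Table 2.
STATE_TABLE = {
    ("vlan", 0xAAAA): "mTag",
    ("vlan", 0x0800): "ipv4",
    ("vlan", WILDCARD): "stop",
    ("mTag", 0x0800): "ipv4",
    ("mTag", WILDCARD): "stop",
}


def next_state(state, ethertype):
    """Return the next parser state.

    Assumption: an exact entry wins over the wildcard entry for the same state.
    """
    exact = STATE_TABLE.get((state, ethertype))
    if exact is not None:
        return exact
    return STATE_TABLE.get((state, WILDCARD), "stop")


def walk(state, ethertypes):
    """Follow transitions for a sequence of ethertype values, one per header."""
    visited = [state]
    for ethertype in ethertypes:
        state = next_state(state, ethertype)
        visited.append(state)
        if (state, WILDCARD) not in STATE_TABLE:
            break  # Table 2 lists no entries for this state (e.g. ipv4, stop)
    return visited
```

For example, walk("vlan", [0xAAAA, 0x0800]) visits vlan, mTag, and ipv4 in turn, while an ethertype with no exact entry falls through to the wildcard row and ends in stop.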

5.2 Compiling Control Programs

The imperative control-flow representation in §4.6 is a convenient way to specify the logical forwarding behavior of a switch, but does not explicitly call out dependencies between tables or opportunities for concurrency. We therefore employ a compiler to analyze the control program to identify dependencies and look for opportunities to process header fields in parallel. Finally, the compiler generates the target configuration for the switch. There are many potential targets: for example, a software switch [17], a multicore software switch [18], an NPU [19], a fixed-function switch [20], or a reconfigurable match table (RMT) pipeline [2].

As discussed in §3, the compiler follows a two-stage compilation process. It first converts the P4 control program into an intermediate table dependency graph representation, which it analyzes to determine dependencies between tables. A target-specific back-end then maps this graph onto the switch’s specific resources.

We briefly examine how the mTag example would be implemented in different kinds of switches:

Software switches: A software switch provides complete flexibility: the table count, table configuration, and parsing are under software control. The compiler directly maps the mTag table graph to switch tables. The compiler uses table type information to constrain the widths, heights, and matching criteria (e.g., exact, prefix, or wildcard) of each table. The compiler might also optimize ternary or prefix matching with software data structures.

Hardware switches with RAM and TCAM: A compiler can configure hashing to perform efficient exact matching using RAM for the mTag table in edge switches. In contrast, the core mTag forwarding table that matches on a subset of tag bits would be mapped to TCAM.

Switches supporting parallel tables: The compiler can detect data dependencies and arrange tables in parallel or in series. In the mTag example, the tables mTag_table and local_switching can execute in parallel up to the execution of the action of setting an mTag.

Switches that apply actions at the end of the pipeline: For switches with action processing only at the end of a pipeline, the compiler can tell intermediate stages to generate metadata that is used to perform the final write [...]

[...] directly control a whole network of switches. OpenFlow supports this goal by providing a single, vendor-agnostic API. However, today’s OpenFlow targets fixed-function switches that recognize a predetermined set of header fields and that process packets using a small set of predefined actions. The control plane cannot express how packets should be processed to best meet the needs of control applications.

We propose a step towards more flexible switches whose functionality is specified (and may be changed) in the field. The programmer decides how the forwarding plane processes packets without worrying about implementation details. A compiler transforms an imperative program into a table dependency graph that can be mapped to many specific target switches, including optimized hardware implementations.

We emphasize that this is only a first step, designed as a straw-man proposal for OpenFlow 2.0 to contribute to the debate. In this proposal, several aspects of a switch remain undefined (e.g., congestion-control primitives, queuing disciplines, traffic monitoring). However, we believe the approach of having a configuration language, together with compilers that generate low-level configurations for specific targets, will lead to future switches that provide greater flexibility, and unlock the potential of software-defined networks.
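The dependency analysis described in §5.2 can be illustrated with a small Python sketch. It is a deliberate simplification, not the paper's compiler. It tracks only match dependencies (a later table matching on a field that an earlier table's action writes) and ignores action dependencies, the conditions in the control program, and the source_check table. The names Table, match_stages, and PIPELINE are hypothetical. The field sets for mTag_table follow §4.4 and §4.5; those for local_switching and egress_check are assumptions, since the paper gives only comments for those tables.

```python
# Hypothetical sketch of match-dependency analysis; several field sets are assumed.
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    reads: set = field(default_factory=set)   # fields the table matches on
    writes: set = field(default_factory=set)  # fields its actions may modify


def match_stages(tables):
    """Assign each table the earliest stage at which its match can run.

    `tables` is listed in control-program order. A table is placed after any
    earlier table whose actions write a field that it matches on.
    """
    stages = {}
    for i, table in enumerate(tables):
        stage = 0
        for earlier in tables[:i]:
            if earlier.writes & table.reads:
                stage = max(stage, stages[earlier.name] + 1)
        stages[table.name] = stage
    return stages


PIPELINE = [
    # Assumed: matches on the L2 destination and sets the egress port on a hit.
    Table("local_switching", {"ethernet.dst_addr"}, {"metadata.egress_spec"}),
    # From the paper: reads in mTag_table, writes in the add_mTag action.
    Table("mTag_table", {"ethernet.dst_addr", "vlan.vid"},
          {"mTag.up1", "mTag.up2", "mTag.down1", "mTag.down2",
           "mTag.ethertype", "vlan.ethertype", "metadata.egress_spec"}),
    # Assumed and simplified: matches only on the resolved egress port.
    Table("egress_check", {"metadata.egress_spec"}, set()),
]

STAGES = match_stages(PIPELINE)
```

Under these assumptions, local_switching and mTag_table land in the same stage because neither matches on a field the other writes, which mirrors the observation that their matches can proceed in parallel; egress_check lands one stage later because it matches on metadata.egress_spec.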
