Embedded System Design For Automotive Applications

COVER FEATURE

Alberto Sangiovanni-Vincentelli, University of California, Berkeley
Marco Di Natale, Scuola Superiore S. Anna, Pisa

Computer, October 2007. Published by the IEEE Computer Society. 0018-9162/07/$25.00 © 2007 IEEE

To optimize the system design and allow for plug-and-play of subsystems, automotive electronic system architecture evaluation and development must be supported with a robust design flow based on virtual platforms.

Today, though still relatively stable, the roles of carmakers and their suppliers are undergoing a period of stress caused by the increased importance and added value of electronics. The automotive supply chain includes

• car manufacturers (OEMs) such as GM, Ford, DaimlerChrysler, and Toyota, who provide the final product to the consumer market;
• Tier 1 suppliers such as Bosch, Contiteves, Siemens, Nippondenso, Delphi, and Magneti Marelli, who provide subsystems such as power train management, suspension control, and brake-by-wire devices to OEMs;
• Tier 2 suppliers, who serve both OEMs and Tier 1 suppliers: chip manufacturers such as Freescale, Infineon, ST, and Renesas; IP providers such as ARM; and real-time operating system suppliers such as WindRiver and ETAS; and
• manufacturing suppliers such as Flextronics and TSMC.

Because of liability issues, automakers generally limit outside manufacturing to non-safety-critical verticals. The standard approach for OEMs is to develop systems by assembling components that have been completely or partly designed and developed by Tier 1 suppliers. However, these suppliers increasingly are shifting toward outsourcing their manufacturing.

The supply process traditionally has been targeted at simple, black-box integrated subsystems in which requirements capture and OEM-issued specifications consisted of the message interface’s periods and general performance requirements, but without a detailed definition of timing and synchronization properties and of the communication protocols’ requirements.

As a result, the integration of subsystems is done routinely, albeit in a heuristic and ad hoc way. The resulting lack of an overall understanding of the subsystems’ interplay, and the difficulties encountered in integrating very complex parts, make system integration a very challenging job. The “Car Electronics Architecture” sidebar provides more information on the complexity of modern architectures.

CHALLENGES
Novel methods and tools for system-level analysis and modeling are needed not only for predictability and composability when partitioning end-to-end functions at design time (and later, at system integration time), but also for providing guidance and support to the designer in the very early stage where the electronics and software architectures of product lines are evaluated and selected. The critical architecture-evaluation and -selection design-process phase profoundly affects a product line’s cost, performance, and quality.

Architecture selection typically is performed years in advance of subsystem development and integration. In this process, models of the functions and possible solutions for the physical architecture must be defined and matched to evaluate quality and select the best possible hardware platform with respect to performance, reliability, and cost metrics and constraints.

Given the high cost of research, training, and possibly license acquisition for system-level design, using a coherent set of models, methods, and tools during a product’s or platform’s entire lifetime is desirable. This extends from the architecture-analysis stage to system partitioning and design, and includes model-based development, with its automatic middleware and application code generation steps, and the final integration, testing, and validation stages.

Optimizing automotive electronics system design requires standards in the software and hardware domains that allow for plug-and-play of subsystems. The ability to integrate subsystems will then become a commodity item, available to all OEMs. An OEM’s competitive advantage will increasingly rely on novel and compelling functionalities. The essential technical problem to solve for this vision is the establishment of standards for interoperability among IPs (both software and hardware) and tools. AUTOSAR [1], a worldwide consortium of most of the players in the automotive domain electronics supply chain, has this goal clearly in mind.

However, technical and business challenges must first be overcome. In particular, from a technical viewpoint, while sharing algorithms and functional designs seems feasible at this time, the sharing of safety-critical and hard real-time software is difficult, even assuming substantial improvements in design methods and technology. Several issues must be resolved for function partitioning and subsystem integration in the presence of real-time and reliability requirements. These include the following:

• Time predictability. This issue relates to the capability of predicting the system-level timing behavior (latencies and jitter) resulting from the synchronization between tasks and messages, as well as from the interplay that different tasks can have at the real-time operating system (RTOS) level and the synchronization and queuing policies of the middleware. The timing of end-to-end computations depends, in general, on the deployment of the tasks and messages on the target architecture and on the resource management policies.
• Dependability. Deploying functions onto the system’s electronic control units (ECUs) and determining communication and synchronization policies must be done with a view to meeting dependability targets. A system-level design tool should integrate support for design patterns suited to the development of highly reliable systems with fault containment at both the functional and timing levels. Such tools should also support the automatic construction of fault trees to compute the probability of a hazard occurrence or simply the causal dependencies that link it to subsystem-level or even atomic component faults based on the deployment choices.
• Composability and extensibility versus efficiency. The timing of software tasks depends on the presence or absence of other tasks, and a similar reasoning applies to messages. A scheduling policy that could prevent timing variability in the presence of dynamically changing task characteristics can be conceived, but it will carry at least some overhead. Further, no commercially available RTOS supports this kind of policy.

The previous situation shows the standard tradeoff between efficiency and reliability, but with more important business implications than usual. If software from different sources must be integrated on a common hardware platform, in the absence of composition rules and formal verification of the composed systems’ properties, who will be responsible for the final product’s correct functioning?

Whoever takes responsibility for subsystem specification and later integration will need a strong methodology and iron fist to make suppliers and partners comply with it. This may not be enough, in the sense that software characteristics are hard to pin down. Even with the best intentions, in the presence of foreign components, developers might not be able to guarantee functional and timing behavior and reliability targets.

The constant growth of embedded systems design complexity makes manual analysis and design impractical and error prone. The ideal approach would automatically map a set of tasks onto the platform, guaranteeing the correct functionality and timing with optimal resource utilization. This approach should take the design description at the pure functional level (including performance and other constraints, as well as the platform architecture) and produce correct settings for the middleware, RTOS, and optimized application-level code.

Sidebar: Car Electronics Architecture
A typical modern vehicle contains between a dozen and nearly 100 electronic control units (ECUs) [1]. Current electronics systems are typically partitioned by domains. There are two main classes of electronic systems: hard-real-time control of mechanical parts and information-entertainment. The first category includes chassis control; automotive body, including components such as interior air conditioning, dashboard, power windows, and control subsystems; powertrain, including the engine, transmission, and emission and control systems; and active safety control. The second category includes information management, navigation, computing, external communication, and entertainment.

Each domain has its own requirements for computation speeds, time scales, reliability, flexibility, and extensibility. Today, powertrain applications pose the most demanding challenge in terms of real-time constraints and computational power, with activation period requirements going down to a few milliseconds at high engine speeds.

New active safety applications, currently planned to execute at slower rates (typically in the range of 20 to 100 ms) at each stage, will pose new challenges because of their high distribution, complexity, and interoperability. The typical power train ECU today relies on a 32-bit microcontroller running at hundreds of MHz, while the rest of the real-time subsystems use a 16-bit microcontroller running at less than 1 MHz, with memory requirements reaching up to 2 Mbytes for a few complex subsystems. The next generation, however, is rapidly moving toward widespread use of 32-bit ECUs, with some running at more than 100 MHz. Multicore ECUs will likely provide the next-generation solution for applications requiring high reliability.

For communications, a typical vehicle today contains two or three controller area network (CAN) buses, with rates from 25 to 500 Kbps, two or three lower-speed local interconnect network (LIN) buses, and, optionally, some dedicated high-speed links for infotainment. Experimental vehicles now being developed have up to 10 CAN buses, with additional buses almost invariably providing 500-Kbps links. A further increase in the number of buses is unlikely because of the additional gateways and consequent increased latencies and jitter. This is why FlexRay, aside from being a possible solution for future highly reliable communication needs, is already required for high-speed, highly deterministic communication.

Reference
1. J.A. Cook et al., “Control, Computing and Communications: Technologies for the Twenty-First Century Model T,” Proc. IEEE, special issue on automotive power electronics and motor drives, vol. 95, no. 2, 2007, pp. 334-355.

MODEL-BASED DESIGN
Software content in vehicles has grown steadily over the years. Conceivably, by 2010 more than 100 million lines of code will be present in even low-end vehicles. Manufacturers increasingly adopt model-based design methodologies for improving the quality and reusability of these software artifacts. A model-based environment allows development of control and dataflow applications in a graphical language familiar to control engineers and domain experts. Defining components at higher abstraction levels and with well-defined interfaces permits separation of concerns and improves modularity and reusability. Further, the use of virtual prototyping tools during development allows verification by simulation of the system behavior.

However, when considered in the context of a design flow that starts from the early stages of architecture exploration and analysis and supports complex interacting functions with real-time requirements, deployed on a distributed architecture, most modern tools have several shortcomings:

• Lack of separation between the functional and architecture model. Such a separation is fundamental for exploring different architecture options with respect to functionality and for reusing an architecture platform with different functions.
• Lack of support for defining the task and resource model. Most model-based design flows support the transition from the functional model directly to the code implementation. The designer has limited control when generating the task set and can barely address the task and resource model. Placement of tasks in a distributed environment is typically performed at the code level. The specification of task and message design, and of resource allocation policies, is necessary to evaluate the system’s timing and dependability properties. Modeling languages often do not consider the definition of end-to-end deadlines and jitter constraints, which results in insufficient support for the specification of timing constraints and attributes.
• Lack of modeling support for the analysis and back-annotation of scheduling-related delays. Most tools support the functional model’s simulation and verification, which developers typically base on an assumption of zero communication and computation delays. Deployment on a given architecture allows analysis of the delays caused by resource sharing. In a sound design flow, tools should support this analysis, and the communication and scheduling delays should be back-annotated into the model to verify the function’s performance on a given architecture solution.
• Lack of sufficient semantics preservation. When generating code from a starting model description, developers do not always preserve the original semantics. Designers and developers must understand under what conditions the code-generation
stage can preserve the model semantics. They must also realize the implications of an incorrect implementation.

[Figure 1. Worst-case response times of the highest-priority, a medium-priority (fifth task), and the lowest-priority task in the sample set of eight tasks of Table 1, when their computation times are increased from 0 to the maximum value that ensures completion within the designated period. Three panels (highest priority, medium priority (task 5), lowest priority) plot response time against execution time; the lower-priority plots mark regions of linear dependency and points of discontinuity.]

TIMING PREDICTABILITY AND ISOLATION
Traditionally, the automotive domain has been receptive to methods and techniques for timing predictability and time determinism. The standard controller area network (CAN) bus [2] is based on a deterministic resolution of the contention between messages and on the assignment of priorities to them. The OSEK standard for RTOSs (www.osek-vdx.org) not only supports predictable priority-based scheduling [3], but also bounded worst-case blocking time through an implementation of the immediate priority ceiling protocol [4]. OSEK also defines nonpreemptive groups [5] for a possible further improvement of some response times and to allow for stack space reuse. In the absence of faults, and assuming that a task’s worst-case execution time can be safely estimated, these standards allow predicting the worst-case timing behavior of computations and communications [6,7].

Priority-based scheduling of tasks and messages fits well within the traditional design cycle, in which timing properties are largely verified a posteriori and applications require conformance with respect to worst-case latency constraints rather than tight time determinism. Further, developers design control algorithms to be tolerant of both small changes in the timing behavior and the nondeterminism in time. This can arise because of preemption and scheduling delays [8] or possibly because of overwritten data or skipped task and message instances caused by temporary timing faults.

Finally, although formally incorrect, there is a common perception that small changes in the timing parameters, such as decreased periods or wrong computation-time estimates, typically result only in a graceful degradation of the tasks’ and messages’ response times. Further, developers believe that such degradation will in any case preserve the high-priority computations.

This is only partly true, however. Figure 1 shows the worst-case response times of the highest-priority, a medium-priority, and the lowest-priority task in a sample set of eight tasks [3]. As Table 1 shows, these tasks have nominal computation time and period values that ensure completion within the designated period. However, when their computation times are increased, their response time depends linearly on the computation time only in limited portions of the graphs. For all tasks except the highest-priority one, there exist points of discontinuity, where the increased number of preemptions adds the entire execution of one or more task instances to the response time.

[Table 1. Sample task set. Columns: Task, Ci, Ti, ri.]

Development of larger and more complex applications (deployed with significant parallelism on each ECU, consisting of a densely connected graph of distributed computations and new safety-critical functions that require tight deadlines and the guaranteed absence of timing faults) makes previous assumptions no longer trustworthy. A new rigorous science must be established. Several issues must be considered regarding current standards and the use of priority-based task and message scheduling:

• Priority-based scheduling can lead to discontinuous behavior in time and timing anomalies. The dependency of a lower-priority task or message’s response time on the computation time of a higher-priority task is nonlinear and not even continuous. A small additional high-priority load can lead to a sudden increase in the response time on some computation paths [9]. Further, especially in distributed systems, timing anomalies are possible, and shorter computation times may result in larger latencies [10].
• Variability of the response times between worst- and best-case scenarios, together with the possible preemptions, can lead to violation of time-deterministic model semantics in the implementation of software models by priority scheduled tasks and messages [11].
• Extensibility and, to some degree, tolerance with respect to unexpectedly large resource requirements from tasks and messages allowed by priority-based scheduling come at the price of additional jitter, latency, and lack of timing isolation.
• Future applications, including safety-critical and active-safety ones, need shorter latencies and time determinism (reduced jitter) to increase performance. The current model for propagating information, based on communication by periodic sampling among nonsynchronized nodes [12], has very high latency in the worst case and significant jitter between the best- and worst-case delays. Even if communication-by-sampling can be formally studied and platform implementations defined to guarantee at least some fundamental communication flow properties, such as data preservation [12], time determinism is typically disrupted and the application must tolerate the large latencies caused by random sampling delays. Figure 2 shows how communicating information by periodic sampling and shared variables can result in large latencies and an equally large jitter between the best- and worst-case end-to-end latency. In a time-triggered system, task and message scheduling can be arranged to reduce latencies and jitter.
• Deployment of reliable systems requires timing isolation in software-component execution and protection from timing faults. Timing protection is even more important in light of AUTOSAR, which integrates components from Tier 1 suppliers into the same ECU, requiring containment and isolation of faulty functional and temporal behaviors.
• The development of future applications will also require the enforcement of composability and compositionality, not only in the functional domain but also for parafunctional system properties, including the components’ timing behavior and reliability.

[Figure 2. Periodic sampling model versus a time-triggered system. Communicating information by periodic sampling and shared variables can result in large latencies and an equally large jitter between the best- and worst-case end-to-end latency. In a time-triggered system, task and message scheduling can be arranged to reduce latencies and jitter. Upper panel (system with periodic sampling): variable scheduling delays and variable sampling times along the chain Task 1, message, Task 2 yield anything from a small to a very large latency. Lower panel (time-triggered system): latency can be controlled at scheduling time.]

Priority-based resource scheduling has the major downside of allowing faulty high-priority computation or communication flows to easily take control of the ECU or bus, subtracting time from lower-priority tasks or messages. For example, an excessive request for computation time from any high-priority task affects the response time of lower-priority tasks on the same ECU.

In this case, additional control layers (consisting of runtime guards that monitor the timing assertions) avoid the propagation of timing faults. In the future, application tasks from multiple Tier 1 suppliers will integrate into the same ECU, leveraging the standardization of interfaces allowed by AUTOSAR, and it will be necessary to protect the tasks of each IP from other IPs’ timing errors.
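The way an excessive high-priority execution time propagates to lower-priority tasks can be reproduced with the classic fixed-point response-time recurrence for preemptive fixed-priority scheduling, R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j. The sketch below is only an illustration under stated assumptions, not the analysis behind Figure 1: the three-task set and its integer time units are hypothetical, deadlines are assumed equal to periods, and blocking and release jitter are ignored.

```python
def response_time(tasks, i):
    """Worst-case response time of tasks[i] under preemptive fixed priorities.

    tasks is a list of (C, T) pairs in integer time units, sorted from the
    highest to the lowest priority. Returns None when the response time
    exceeds the period T_i (deadline assumed equal to the period).
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Each higher-priority task j is released ceil(r / T_j) times in [0, r).
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt > t_i:
            return None  # deadline missed
        if nxt == r:
            return r  # fixed point reached
        r = nxt


def lowest_priority_response(c_high):
    """Hypothetical task set; only the highest-priority execution time varies."""
    tasks = [(c_high, 50), (20, 100), (30, 200)]
    return response_time(tasks, len(tasks) - 1)


if __name__ == "__main__":
    for c_high in range(20, 31):
        print(c_high, lowest_priority_response(c_high))
```

With this hypothetical set, raising the high-priority execution time from 24 to 25 units moves the lowest-priority response time from 98 to 100, while the next single unit (25 to 26) makes it jump from 100 to 148, because one more instance of each higher-priority task now fits inside the busy period. A runtime guard that caps the high-priority execution budget would keep the system on the linear side of such a step.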
Timing isolation is therefore required to provide additional separation of concerns and protection.

Time-based schedulers, including those supported by the FlexRay and OSEKtime standards, force context switches on the ECUs and the assignment of the communication bus at predefined points, regardless of the outstanding requests for computation and communication bandwidth. They are thus better suited to provide temporal protection, except that the enforcement of a strict time window for execution and communication requires that the designer have a much better ability to predict the worst-case task execution times [13] to allow sizing the execution window appropriately. Further, guardians must be used to ensure that an out-of-time transmission will not disrupt the bus’s communication flow.

[Figure 3. FlexRay’s dual-channel bus. Dual-channel configurations allow replicating messages on both channels, which facilitates safety-critical communications that leverage physical redundancy. The diagram shows one FlexRay cycle on each channel, divided into static segment, dynamic segment, symbol, and NIT, with slots assigned to nodes N1 to N4 and some slots unused.]

COMMUNICATION AND DISTRIBUTED SYSTEMS
Motivations for the upcoming FlexRay communication standard for highly deterministic and high-speed communication include development of new by-wire functions with stringent requirements for determinism and short latencies, as well as innovative active safety functions. These are characterized by large volumes of data traffic, generated by 360-degree sensors positioned around the vehicles.

A consortium that includes BMW, DaimlerChrysler, General Motors, Freescale, NXP, Bosch, and Volkswagen/Audi as core members is developing the FlexRay standard (www.flexray.com). The consortium seeks to support cost-effective deployment of distributed by-wire [...].

The currently available CAN standard is limited to a speed of 500 Kbps and imposes a protocol overhead of more than 40 percent, given that the maximum data size of each frame is 64 bits and the protocol overhead consists of at least 47 bits for the standard format. In CAN, a contention phase assigns the shared bus immediately before each message’s transmission. At each contention, the message with the lowest identifier gets the right to transmit.

FlexRay defines the communication speed at 10 Mbps. The bus time is assigned according to a time-triggered pattern, with time divided into communication cycles. Each cycle contains up to four segments: static, dynamic, symbol, and network idle time (NIT). Clock synchronization for communication has been embedded in the standard using part of the NIT segment and therefore incurs no additional cost.

Of the communication segments, the static part allows transmission of time-critical messages according to a periodic cycle in which the system always reserves a time slot of fixed length at a given position for the same node. The dynamic segment allows for flexible communication. Identifier priority arbitrates message transmission in the dynamic part, with the lowest identifier messages transmitted first, similar to CAN.

FlexRay includes a dual-channel bus specification for increased reliability. Including bus guardians at the node and star level in the upcoming specification will in turn offer increased reliability and timing protection. In a dual-channel configuration, messages can be replicated on both channels, as Figure 3 shows for the messages from node N1. This facilitates safety-critical communications that leverage physical redundancy. The slots can also be assigned independently, in which case the system doubles the communication bandwidth.

FlexRay’s time-triggered model not only allows for much better time determinism, but developers also consider it a better paradigm for composability and extensibility. Each node only needs to know the time slots for its outgoing and incoming communications. The specifications of these time slots reside in local scheduling tables. No global description exists and each node exe[...]. As long as the local tables are kept consistent, no timing conflicts or interferences arise. Slots left free in the virtual [...] can be used for future extensions. Reserving time slots guarantees time protection and isolation from timing faults, while guardians prevent a node from transmitting outside its allocated time window.

Clock synchronization and time determinism on the communication channel allow implementation of end-to-end computations in which the data generation, data consumption, and communication processes align temporally, avoiding sampling delays. Also, system [...] control mod[...] semantics, like those that popular commercial tools such as Simulink from Mathworks (www.mathworks.com) produce. To achieve these goals, the time-triggered communication model must be propagated to the computation layers, using a time-triggered scheduler and careful coordination of the communication and computation schedules, so that the schedule becomes global. However, although the OSEKtime standard is a suitable candidate for a time-triggered RTOS, current standards barely address synchronization of the communication and RTOS layers.

Finally, with respect to reliability, although FlexRay has a powerful error-detection mechanism, the foreseen error-management scheme instructs the receiver to discard a corrupted frame. Because the standard does not provide support for an acknowledgment mechanism (which does exist in CAN), if an application needs a reliable communication mechanism, an acknowledgment must be implemented at the application level.

However, the communication cycle’s fixed structure would probably require preallocating a communication slot specifically for acknowledging each transmission. Since the system uses fixed-size static slots, this can imply a significant loss of bandwidth and, even in the best case, the transmitter must wait for its next communication cycle before attempting a retransmission.

In CAN, however, faults usually have limited consequences. All receivers discard error frames, and retransmission is attempted immediately, without the application’s intervention. Similarly, CAN offers some limited protection against byzantine faults, although most serial data designers and users probably are not aware of this. Again, such protection must be planned for and explicitly implemented at FlexRay’s application level. Taking all these factors into account, the potential 20 times speedup for FlexRay with respect to 500 Kbps CAN communication will probably be much less than expected [14].

COMPOSABILITY AND AUTOSAR
The increasing complexity of software implementations parallels increasing supply-chain complexity. Software developers design their components based on requirement definitions from the OEMs or Tier 1 suppliers, who are later responsible for their integration. The AUTOSAR development partnership [1], which includes several OEMs, Tier 1 suppliers, and tool and software vendors, has been created to develop an open industry standard for automotive software architectures.

The AUTOSAR project has focused on the concepts of location independence, interface standardization, and code portability. Although these goals are extremely important, their achievement will not necessarily be sufficient for improving the software systems’ quality. As with most other embedded systems, car electronics are characterized by functional and nonfunctional properties, assumptions, and constraints. In complex systems, component-based design can provide encapsulation and separation of concerns, thereby improving reuse if information hiding is implemented so that the component model allows the following properties [6]:

• composability, which guarantees preservation of a component property across integration; and
• compositionality, which allows deduction of the composed object’s global properties from its component properties; this property enables correctness-by-construction.

The current specification has at least two major shortcomings that prevent achieving the desired goals. The AUTOSAR metamodel, as of now, lacks a clear and unambiguous communication and synchronization semantics and a timing model. Similar to UML (not surprisingly, considering that UML 2.1 inspired the specification, whose description is itself provided in the form of UML diagrams), the AUTOSAR metamodel is sufficiently mature in its static or structural part, but offers an incomplete behavioral description. Developers plan to remedy this with significant updates in the upcoming AUTOSAR revision, however.

Further, none of the standard’s several layers addresses issues related to timing and performance, so the standard underestimates the complexity of current and future applications. These applications’ component interactions generate a variety of timing dependencies due to scheduling, communication, synchronization, arbitration [...]
