
WHITE PAPER

Going “beyond a faster horse” to transform mobile devices

Brian Carlson
OMAP 5 Product Line Manager
Member of Group Technical Staff (MGTS)
Wireless business unit
Texas Instruments
May 2011

Introduction

We are in the early stages of a mobile device revolution that is dramatically changing our lives. Mobile devices have become a digital extension of ourselves that we increasingly depend upon throughout the day. They are redefining how we communicate and socialize with others, learn about and navigate the world around us, capture life moments, entertain, transact business and much more. They are primary devices we depend on for our computing needs, and the first devices we often interact with when we wake up in the morning. We are witnessing the start of the next generation of computing, a field that was dominated by personal computers over the past two decades.

The game is changing, and the technology that enables it is changing quickly to meet consumers’ insatiable appetite to do more with the devices they carry with them. The “always on” mobile computing experience is in demand, along with the desire for more performance, better user experiences and more applications.

There are continual technological improvements that make these mobile devices more capable. However, the TI OMAP 5 platform, one of the first applications processors based on ARM Cortex-A15 MPCore processors, not only brings a new level of performance but, more importantly, extends capabilities to enable new use cases that will truly transform mobile devices. This paper focuses on some of the key new capabilities of the Cortex-A15 processor that will help drive the transformation of mobile computing.

Mobile device evolution

Mobile devices have evolved through a wide variety of technological advancements, driven by strong consumer demand. Figure 1 below shows examples of mobile device evolution. These examples illustrate the innovations that have transformed them, including digital technology, color displays, touchscreens, cameras, keyboards, innovative form factors, high-performance CPUs, multimedia accelerators, pico-projectors and stereoscopic 3D (S3D) capture and display capabilities.

Figure 1 – Mobile device evolution through the years

The next disruptive technology

Major disruptions from new technologies have changed the course and applications of mobile devices dramatically over the years. The next major disruption is a processor technology that elevates performance levels and capabilities to an extent that enables new operational environments, use cases and user experiences, all within mobile power budgets. The capabilities and extensions of this new processor technology set the stage for future software innovations in mobile devices, transforming them from content-consumption devices to content-creation devices that can serve as our primary computing devices. This disruptive processor technology is the ARM Cortex-A15 MPCore processor.

The Cortex-A15 processor takes mobile computing to the “next level,” as it offers a substantial performance increase due to several key design enhancements compared to the previous-generation Cortex-A9 processor. The Cortex-A15 processor also provides several key new features that provide more advanced system-level support, including extended physical addressing, hardware virtualization, improved debug/trace, soft-fault recovery and an AMBA 4 bus that enables system coherency.

Table 1 provides a high-level overview of some of the key enhancements, features and benefits of the Cortex-A15 processor relative to the Cortex-A9 processor that is now coming to the market in high-end mobile devices.

Enhancements
- 128-bit (vs. 64) load/store path
- 3-instruction (vs. 2) instruction decode
- 8-micro-op (vs. 4) issue
- 64-byte (vs. 32) cache line
- Simultaneous load/store
- Improved branch prediction: higher capacity; support for indirect branches
- More out-of-order instructions
- Optimized level 1 caches
- Tighter integration with NEON/VFP
- Improved memory performance: tightly-coupled L2 cache reduces latency; enhanced auto-prefetch; more request buffering

New features
- Virtualization support: virtual interrupt controller; second-stage MMU for hypervisor control of guest OS memory; CP15 trapping
- Extended physical addressing (up to 40 bits)
- Debug/trace support: integrated trace; virtualization support
- Reliability and soft-fault recovery support
- AMBA 4 bus supports: system coherency; MMU coherency

Key benefits
- In the same process node: 1.5x single-thread performance; 1.6x floating point and media performance
- Improved multiprocessing bandwidth
- Improved streaming performance
- Advanced system support: hardware virtualization; larger memory; system coherency

Cortex-A15 offers substantial enhancements and new features to dramatically increase performance and system-level support.

Table 1 – Cortex-A15 processor enhancements/features/benefits

These enhancements focus on improving the processing throughput and efficiency of the core by supporting wider paths, more parallelism, tighter integration and various optimizations.
The details of all these processing enhancements are outside the scope of this paper and can be found in ARM papers and documentation. This paper will later focus on two new features that will significantly benefit mobile devices and extend the software they can support: hardware virtualization and the larger physical address extension.

Before addressing these new features, it is important to note the significant boost in performance and improved energy efficiency that the Cortex-A15 processor delivers.

Performance and energy efficiency

The Cortex-A15 processor includes an extensive list of enhancements that result in a single-thread performance improvement of 1.5x and a floating point and media performance improvement of 1.6x relative to the Cortex-A9 processor in the same process technology. The Texas Instruments Incorporated (TI) Cortex-A15 implementation is in a low-power, 28nm process that provides additional frequency and power improvements over the Cortex-A9 implemented in 45nm. In general, you should expect a 2-3x peak processing improvement when going from one generation of mobile device to the next when using the Cortex-A15 processor.

It is important to note that a Cortex-A15 processor clock frequency cannot be directly compared with that of Cortex-A8 or Cortex-A9 processors because of architectural and instructions per cycle (IPC) differences. For example, with its 1.5x single-thread performance improvement, a 2GHz Cortex-A15 processor could provide performance equivalent to that of a 3GHz Cortex-A9 processor. Memory architecture and sizing can also have a big impact on the actual performance that is achieved in end products. Performance and power comparisons should come from application benchmarks rather than from frequency and mW/MHz numbers used directly.

The TI OMAP architecture team has done extensive analysis to compare various multi-core configurations of Cortex-A9 and Cortex-A15 processors to see how they differ in performance. This is a very complex process since there are many variables and system interactions involved. For example, you have to consider the shared L2 cache size sensitivity with different types and numbers of cores, processor efficiencies, cache miss/hit rates and more. You also have to consider the level of available software parallelism and how this can be mapped to different numbers of cores. In the end, TI has found that a dual-core Cortex-A15 configuration outperforms a quad-core Cortex-A9 configuration. When you also consider the system enhancements and what you can do with a device based on Cortex-A15 that you can’t do with Cortex-A9, the Cortex-A15 is very attractive. TI recently announced the OMAP 5 applications processors, which are based on the Cortex-A15 MPCore technology, to power best-in-class mobile devices in 2012.

Performance cannot be the sole metric when evaluating a processor for a mobile device such as a smartphone, which must run on a battery in the typical range of 1000-1500 mAh. The mobile device world is very different from the PC world, which was driven by performance without the extreme constraint of the milliwatt power ranges required for all-day usage and multi-day standby. In mobile devices, you have to provide the maximum performance possible while respecting the physical/thermal and battery capacity limitations of mobile devices.
Typically there is a maximum system power limitation of 2.5-3W for mobile devices, since they are small and contained (no fans) and the temperature of the device cannot rise to a point of discomfort for consumers. This power budget includes not only the processor but also other cores in the applications processor, like graphics and video, as well as other system components like the display/backlight, modem and RF, which can be significant contributors.

The processor in a mobile device works in a very dynamic way, with extended periods of time in standby mode (still on and able to come up immediately), but also with use cases like web browsing that consist of processing-intensive bursts, as well as processing-intensive, sustained use cases like gaming. With such a complex operational profile, you have to look at use cases and all-day profiles of the device to properly evaluate the energy efficiency.

The Cortex-A15 processor exhibits a unique ability not only to provide a 2-3x boost in performance over previous-generation processors, but also to harness this processing efficiency to lower energy consumption and extend battery life.

TI has determined that you can provide the same user experience with nearly 60% less average power by taking advantage of the Cortex-A15 processing efficiency relative to the Cortex-A9. This allows the TI OMAP 5 platform to offer a range of significant power and performance improvements.
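The arithmetic behind the clock-frequency comparison and the average-power claim in this section can be sketched in a few lines of Python. This is only a rough illustration: the function names are hypothetical, the 500 mW baseline is an invented input rather than a TI measurement, and the linear scaling with clock frequency is an assumption that ignores the memory effects the paper warns about.

```python
# Back-of-the-envelope helpers; constants come from the text unless marked
# as assumptions. Function names are hypothetical, chosen for illustration.

IPC_GAIN_A15_OVER_A9 = 1.5   # single-thread gain quoted for the same process node
AVERAGE_POWER_SAVING = 0.60  # "nearly 60% less average power" for the same experience


def equivalent_a9_clock_ghz(a15_clock_ghz, gain=IPC_GAIN_A15_OVER_A9):
    """Cortex-A9 clock that would match a Cortex-A15 running at a15_clock_ghz.

    Assumption: performance scales linearly with clock frequency and memory
    effects are ignored, which the paper notes can matter a lot in practice.
    """
    return a15_clock_ghz * gain


def average_power_mw(baseline_mw, saving=AVERAGE_POWER_SAVING):
    """Average power after applying the quoted saving to a baseline figure."""
    return baseline_mw * (1.0 - saving)


if __name__ == "__main__":
    # The paper's own example: a 2 GHz Cortex-A15 versus a Cortex-A9.
    print(equivalent_a9_clock_ghz(2.0))
    # 500 mW is a hypothetical Cortex-A9 baseline, not a measured value.
    print(average_power_mw(500.0))
```

As the paper itself stresses, numbers like these are no substitute for application benchmarks; the sketch only makes the quoted ratios concrete.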

Hardware virtualization

A significant new feature provided by the Cortex-A15 processor is hardware virtualization support, which opens up a significant opportunity for power- and performance-efficient support of multiple guest operating systems (OSes). The ability of a mobile device to host multiple guest operating systems or services is a game-changer because it can enable many new operational scenarios and a flexibility that benefits the entire ecosystem.

Before getting into the details of virtualization, let’s step back to introduce the concept itself. Virtualization can be implemented in software, hardware or a hybrid model to manage the operational behaviors of multiple software domains. Virtualization increases platform robustness and improves resource sharing between these software domains. One example could be a device that is running a high-level OS like Android or Linux while also running a real-time OS on the same processor or cluster. Virtualization enables these to work together on the same platform.

There are two main approaches to implementing virtualization: para-virtualization, in which software is used to simulate the underlying hardware, and hardware virtualization, which uses built-in hardware in the processor. Para-virtualization requires guest OS kernel modification and also has more software layers. Hardware virtualization has the advantage of being able to host guest OS kernels without modification. This is important, as it minimizes the development work involved for faster time-to-market and can allow consumers to add new OSes and services to their devices, giving a lot of flexibility.

Three key requirements of virtualization were defined in a 1974 ACM paper called “Formal Requirements for Virtualizable Third Generation Architectures” by Popek and Goldberg [1]. They include:

- Equivalence/Fidelity: a program runs essentially identically to how it would run directly on an equivalent machine
- Resource Control/Safety: the hypervisor has complete control of the virtualized resources
- Efficiency/Performance: a dominant fraction of machine instructions must be executed without intervention

An example of virtualization is shown in Figure 2 below.

Without virtualization (stack, top to bottom): Application 1, Application 2 / Operating System / Hardware
- Single operating system
- Multiple apps targeted for the OS
- OS runs directly on hardware

With virtualization (stack, top to bottom): Virtual machine 1 (Application 1, Application 2, Guest OS 1) and Virtual machine 2 (Application 1, Application 2, Guest OS 2) / Virtual machine monitor (hypervisor) / Hardware. Applications run in USR mode, guest OSes in SVC mode and the hypervisor in HYP mode.
- Supports multiple guest OSes or profiles
- Each OS runs its own apps
- Hypervisor layer between OSes and hardware

Figure 2 – Comparison of “without virtualization” and “with virtualization”

Without virtualization, a single operating environment, running applications designed for that operating system, runs on a hardware platform. With virtualization, you can have multiple guest OSes (supervisor mode), each able to run applications (user mode) designed for it, with the addition of the Virtual Machine Monitor, or hypervisor, which runs in a third mode, hypervisor mode. The benefits of supporting multiple OSes will be discussed later, but they are significant and directly impact mobile device uses and capabilities.

Virtualization provides benefits to the entire ecosystem, including the developer, original equipment manufacturer (OEM), operator, business and consumer. This is a very important point, as it highlights the real value that spans all parties. Table 2 summarizes the benefits provided to each party by virtualization.

Party: Benefits
- Developer/OEM: leverage legacy software investment; faster development in a virtual environment; rapid deployment of new device variants
- Operator: eases device management regardless of OS; freedom for more differentiation; maintain legacy services with new OS(es)
- Business: improved security/isolation (corporate data); reduced cost of device management
- Consumer: freedom of phone selection; choice of OS or multiple OSes; converged device for personal and work use

Table 2 – Virtualization benefits to the entire ecosystem

Hypervisor support enables multiple software environments on a platform, which provides the real benefits shown above. These software environments can be diverse: multiple operating systems, but also low-level real-time operating systems (RTOSes) for baseband processing or other system chores, and lightweight environments for specialized processing (such as shared device drivers and security code) [2].

Below are a few examples of new mobile device use cases that can be enabled. There are many more that can transform the use of mobile devices.

- Personal and work profiles on your device: Allows users to have one device for both purposes, while keeping the personal and confidential data in each profile separate from the other. This is important for businesses that have enterprise security concerns. It can also give workers a choice of phone, not just the one(s) mandated by the employer.
- Legacy software/services support: OEMs or operators can leverage a previous legacy investment and continue to offer services based on one environment. They can also offer devices with a new operating system and efficiently support both. This can be made transparent to the user and gives the best of both worlds. Operators can leverage this to have separate branded services that are outside the main or open source operating system environment.
- Phone customization: A consumer can select the operating systems that are desired on the phone rather than having to settle for what typically comes with one operating system. This gives consumers a choice. It also allows the user to run applications that are only available for a certain operating system, and it is possible to make this all seamless, for example by launching an application that runs in another OS from the same menu display.

As mentioned, the Cortex-A15 processor includes hardware virtualization support, which provides fast, power-efficient support for these use case scenarios. Without this hardware virtualization, you can support them only with para-virtualization, which has several disadvantages, including high operational overhead to process the high number of traps and exceptions from guest OS kernels running in user mode. This is compounded when additional guest OSes are added. Para-virtualization also complicates software development because it requires changes to the guest OS, and in critical areas. This can be problematic for OSes that are not open source, where you don’t have access to the source code to re-host them. The presentation material that goes with this paper shows the operation of a para-virtualization that has these disadvantages compared to hardware virtualization.

The Cortex-A15 provides world-class hardware virtualization that reduces the operational overhead for higher performance, enables guest OSes to run at native CPU privilege, lowers development cost and improves security and isolation. A key benefit is the ability to run native ARM OSes without the need for kernel source code, which provides a lot of flexibility for developers and users.

Figure 3 shows an example of a Cortex-A15 platform supporting multiple software domains (OS, hardware abstraction and other services are shown). It is important to note that these run in the non-secure state, separate from the Trusted Execution Environment. Because of the hardware virtualization capability, the software hypervisor only has to provide minimal support. The hardware helps in several ways: it reduces entries into the hypervisor by separating the virtual and physical effects more cleanly, and it gives more precise control over what enters the hypervisor.

Example hosted software domains (hypervisor enabled): modem-network-firewall; Windows CE/Phone; next-generation Windows; Linux SMP (Ubuntu); Linux SMP (Android); SoC hardware abstraction (TI-Linux)
Trusted Execution Environment: secure code execution; assets storage; hardware-assisted environment hardening
Underlying layers: software hypervisor (hardware assisted); Cortex-A15 MPCore platform with full hardware virtualization extensions

Figure 3 – Cortex-A15 world-class hardware virtualization

The Cortex-A15 processor does an extraordinary job of separating virtual and physical. It is standard for microprocessors to separate virtual and physical interrupts. However, ARM has gone well beyond this by separating the page table management done by the guest OS from the virtualization page tables handled by the hypervisor, a benefit that is not provided by other processors. The Cortex-A15 solution for interrupts means that the guest OS doesn’t need to enter the hypervisor when servicing virtual interrupts, even when several are queued for the guest OS. It is important to note that with the Cortex-A15, it is possible to run a guest OS without any code changes and with good performance.

The Cortex-A15 hardware virtualization support is vast, so for the purpose of this overview, this paper will focus on the key hardware MMU support and interrupt virtualization.

The Cortex-A15 includes a 2-stage MMU, which is only present on the non-secure side. The first stage is 100% compatible with OSes and is “owned” by the guest OS. This stage performs the mapping from the virtual address map of each application on each guest OS to an intermediate physical address (IPA) map. The second stage is owned by the hypervisor and performs the mapping from the IPA to the real system physical address map. Each software layer (OS and hypervisor) can manipulate its tables independently. This 2-stage MMU approach is fully compatible with guest OSes (they don’t even know about the second stage), yet enables support for the hypervisor in an efficient manner.
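The two-stage mapping described above can be modeled with a toy Python sketch. This is not how the hardware page-table walk is implemented: the flat dictionaries, the 4 KB page size and every name below (`translate`, `two_stage_translate`, `TranslationFault`) are assumptions made purely to show that the guest OS and the hypervisor each own one independent table.

```python
# Toy model of two-stage address translation. Real page tables are
# multi-level structures walked by hardware; flat dicts keep the idea visible.

PAGE_SIZE = 4096  # assumed 4 KB pages for this sketch


class TranslationFault(Exception):
    """Raised when a page has no mapping in the table being consulted."""


def translate(table, address):
    """Map an address through one table of {input page: output page}."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise TranslationFault(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def two_stage_translate(stage1, stage2, virtual_address):
    """Return (IPA, PA) for a guest virtual address.

    stage1 is owned by the guest OS (VA -> IPA); stage2 is owned by the
    hypervisor (IPA -> PA). Neither layer needs to see the other's table.
    """
    ipa = translate(stage1, virtual_address)
    pa = translate(stage2, ipa)
    return ipa, pa


if __name__ == "__main__":
    guest_table = {0x10: 0x80}          # hypothetical guest mapping
    hypervisor_table = {0x80: 0x20000}  # hypothetical hypervisor mapping
    ipa, pa = two_stage_translate(guest_table, hypervisor_table, 0x10123)
    print(hex(ipa), hex(pa))
```

In this model the hypervisor can relocate a guest by changing only `hypervisor_table`, which mirrors the point that the guest OS does not even know the second stage exists.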

Figure 4 illustrates the implementation of the 2-stage MMU in the Cortex-A15 processor.

Legend: software block; hardware block; IPA = Intermediate Physical Address; PA = Physical Address
- OS MMU page tables (Stage 1), under OS MMU management: the guest OS interacts with MMU Stage 1 exactly the same way as if the OS were on bare metal versus being re-hosted to HYP. MMU Stage 1 hardware outputs the IPA.
- HYP mode MMU tables (Stage 2), controlled from HYP CPU mode: HYP “remaps” the guest OS within the “real” physical address space. MMU Stage 2 hardware outputs the PA.
- The Physical Address (PA) is output from the MPU subsystem for SoC interconnect transactions.

Fig. 4 – Cortex-A15 2-stage MMU for hardware virtualization

Virtual interrupts are also supported, as interrupts need to be routed to the current guest OS, to another guest OS that is suspended, or directly to the hypervisor. To maintain stability, guests may not directly manipulate interrupts, so virtual interrupts are maintained for each guest. Without this support in hardware, a software implementation would need to do a lot of processing, with the associated overhead. In the Cortex-A15, the guest OS sees the virtual GIC (VGIC) as if it were the real GIC (generic interrupt controller).

Figure 5 illustrates how virtual interrupts are supported by the Cortex-A15 for efficient processing as part of its hardware virtualization support.
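The routing behavior described for virtual interrupts can be illustrated with a small Python model. It is a conceptual sketch only: the class and method names are hypothetical, the interrupt numbers are arbitrary, and the model does not reflect the real GIC register interface. It shows a hypervisor-managed physical controller forwarding interrupts into per-guest virtual queues, which each guest then services without calling back into the hypervisor.

```python
# Conceptual model of virtual interrupt routing; not the real GIC interface.
from collections import deque


class VirtualGIC:
    """Per-guest view of pending interrupts (hypothetical model)."""

    def __init__(self):
        self.pending = deque()

    def inject(self, irq):
        """Queue a virtual interrupt; works even if the guest is suspended."""
        self.pending.append(irq)

    def acknowledge(self):
        """Called by the guest: take the next pending interrupt, if any."""
        return self.pending.popleft() if self.pending else None


class Hypervisor:
    """Owns the real interrupt controller and decides who sees each interrupt."""

    def __init__(self, guest_names):
        self.vgics = {name: VirtualGIC() for name in guest_names}
        self.routes = {}                # irq number -> owning guest name
        self.handled_by_hypervisor = []

    def route(self, irq, guest_name):
        self.routes[irq] = guest_name

    def physical_interrupt(self, irq):
        """Forward to the owning guest's virtual GIC, or handle it directly."""
        owner = self.routes.get(irq)
        if owner is None:
            self.handled_by_hypervisor.append(irq)
        else:
            self.vgics[owner].inject(irq)


if __name__ == "__main__":
    hv = Hypervisor(["android", "rtos"])
    hv.route(42, "android")
    hv.physical_interrupt(42)  # queued for the Android guest
    hv.physical_interrupt(7)   # unrouted, so the hypervisor keeps it
    print(hv.vgics["android"].acknowledge(), hv.handled_by_hypervisor)
```

The design point mirrored here is that `acknowledge` touches only the guest's own queue, so servicing several queued interrupts requires no further hypervisor involvement.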

Interrupt flow (bottom to top): hardware (Cortex-A15 #1, Cortex-A15 #2) / real GIC, hypervisor managed / hypervisor / one virtual GIC (context managed) per guest OS / guest OSes

Fig. 5 – Cortex-A15 virtual interrupt support

Larger physical address extension

Another significant new feature introduced by the Cortex-A15 processor is the Larger Physical Address Extension, which extends the address space from 32 bits up to 40 bits. This is very important for supporting the future needs of mobile devices on their current memory trajectory, as well as for enabling new applications and usage of mobile devices.

Today’s high-end smartphones support 512MB of DRAM, and tablets are extending this to 1GB. Based on TI predictions, and consistent with industry data, tablets will exceed 2GB of DRAM in 2012 and smartphones in 2013. The DRAM size increase is being driven by larger OS memory needs and support for richer content and data sets.

There is a need to extend beyond the current total of 4GB of addressable space for system I/O and memory, of which today only 2GB is supportable for DRAM. This makes the Cortex-A15 critical for high-end mobile devices in the 2012-2013 timeframe.

The Cortex-A15 extension of the physical address space makes it possible to support a larger DRAM size, which will transform mobile devices in several ways. With larger memory, these devices can support:

- increasing OS memory needs,
- multiple OS memory needs (in conjunction with hardware virtualization),
- larger, richer applications and mixtures of applications,
- more simultaneous processes, and
- larger data sets and richer content.

In the end, this will improve the capabilities of mobile devices. More memory will enable more advanced software, expand the usage of these devices beyond content consumption to content creation and editing, and help drive them as the next generation of computing devices.
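The jump from 32 to 40 address bits is easy to quantify. The short Python sketch below computes the addressable range for each width and checks whether a given DRAM size fits; the helper names are hypothetical, and the check deliberately ignores the portion of the address map reserved for system I/O, which is why a 32-bit system supports less DRAM in practice than the raw figure suggests.

```python
# Address-space arithmetic for 32-bit versus 40-bit physical addressing.

GIB = 1 << 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Total addressable space in GiB for a given physical address width."""
    return (1 << address_bits) // GIB


def dram_fits(dram_gib, address_bits):
    """True if dram_gib of DRAM fits in the raw address space.

    Simplification: space reserved for system I/O is not subtracted.
    """
    return dram_gib <= addressable_gib(address_bits)


if __name__ == "__main__":
    print(addressable_gib(32))  # 4 GiB in total
    print(addressable_gib(40))  # 1024 GiB, i.e. 1 TiB
    print(dram_fits(8, 32), dram_fits(8, 40))
```

The extra 8 address bits multiply the addressable range by 256, which is what leaves room for multi-gigabyte DRAM alongside the I/O map.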

The TI OMAP 5 platform supports up to 8GB of DRAM with its Cortex-A15 integration to enable new use cases that were not previously possible with the Cortex-A9 processor.

Transforming mobile devices

The Cortex-A15 offers many performance enhancements and new features that will dramatically transform mobile devices. Mobile devices will become more capable mobile computers with higher performance, larger memory and the ability to support new opportunities. The Cortex-A15 will do all this while keeping power within the strict budget required of battery-powered mobile devices.

There will be many new use cases that mobile devices can support as part of this transformation. As mentioned, they will become more content-creation devices. They will be able to support mainstream computing beyond their current focus as content-consumption devices. With efficient hardware virtualization, we will see many new products and use cases built on multiple software environments, including such things as multiple personalities/profiles on phones and the ability to run applications from any vendor on any device. Operators and OEMs will be able to preserve legacy software and services in addition to supporting the latest operating systems, allowing consumers to have a seamless, expanded user experience. It is also likely that we will see adaptive mobile device operation based on your location. For example, a device could run Google Android when mobile and automatically switch to Google Chrome or the next generation of Microsoft Windows when docked.

The future looks very bright with the new ARM Cortex-A15 processor coming to a mobile device near you, powered by TI’s OMAP 5 applications processor. For more information about the OMAP 5 platform, visit www.ti.com/omap5arm15-wp

References

1. Requirements for Virtualization: Popek and Goldberg - 1974 www.wikipedia.org/wiki/Popek and Goldberg virtualization requirements
2. Mobile Virtualization - Coming to a Smartphone Near You – Steve Subar – Open Kernel alization-coming-to-a-smartphone-near-you/

Special thanks to Steven Goss and Steve Krueger from Texas Instruments for providing virtualization insights and supporting graphics.

About the Author

Brian Carlson
OMAP Product Line Manager
Member of Group Technical Staff (MGTS)
Wireless Business Unit
Texas Instruments Incorporated

As a product line manager for the Texas Instruments Incorporated (TI) Wireless Business Unit, Brian Carlson is responsible for defining future OMAP platforms, overseeing related worldwide concept-to-production activities and driving communications strategies. He also represents TI on the MIPI Alliance Board of Directors and serves as the vice-chairman. Brian is a member of TI’s Group Technical Staff, composed of TI’s top 20 percent of technical achievers company-wide. With over 25 years’ experience in technology marketing, business development and engineering, Carlson has a rich background in mobile communications, DSP, and multimedia product development.

Important Notice: The products and services of Texas Instruments Incorporated and its subsidiaries described herein are sold subject to TI’s standard terms and conditions of sale. Customers are advised to obtain the most current and complete information about TI products and services before placing orders. TI assumes no liability for applications assistance, customer’s applications or product designs, software performance, or infringement of patents. The publication of information regarding any other company’s products or services does not constitute TI’s approval, warranty or endorsement thereof.

The platform bar is a trademark of Texas Instruments. All other trademarks are the property of their respective owners.

2011 Texas Instruments Incorporated
A042210 SWPT048

