Popcorn Linux: Cross Kernel Process And Thread Migration .

Transcription

Popcorn Linux: Cross Kernel Process and Thread Migration in aLinux-Based MultikernelDavid G. KatzThesis submitted to the Faculty of theVirginia Polytechnic Institute and State Universityin partial fulfillment of the requirements for the degree ofMaster of ScienceinComputer EngineeringBinoy Ravindran, ChairRobert P. BroadwaterChao WangSeptember 22, 2014Blacksburg, VirginiaKeywords: Linux, Multikernel, Process, Thread, Address Space Migration, Task MigrationCopyright 2014, David G. Katz

Popcorn Linux: Cross Kernel Process and Thread Migration in aLinux-Based MultikernelDavid G. Katz(ABSTRACT)Proliferation of new computing hardware platforms that support increasing numbers of cores,as well as increasing instruction set architecture (ISA) heterogeneity, is creating opportunityfor systems software developers to question existing software architecture.One promising emerging systems architecture is the multikernel, pioneered in Barrelfish OS.The multikernel directly addresses the challenges of high core counts and increased heterogeneity by partitioning the system into multiple independently running kernel instanceswhich cooperate to form a single operating system. Popcorn Linux is an adaptation of themultikernel concept to a Linux environment, melding the multikernel concept with the powerand ubiquity of the Linux platform. The goal of the Popcorn Linux project is to provide aLinux-based single-system image environment for heterogeneous hardware. In constructingthis environment, Linux must be extended to distribute the plethora of operating systemservices that it provides across kernel instances.This thesis presents the newly developed Popcorn Linux mechanism for migrating tasks andtheir address spaces between kernel instances at arbitrary points in their execution. Bothprocess and thread migration is supported, and distributed address spaces are maintainedand guaranteed to remain consistent between distributed thread group members runningon different kernel instances. Tasks can migrate through an unlimited number of kernelinstances, as well as back to previously visited kernel instances. Additionally, the full tasklife-cycle is supported, allowing migrated tasks to exit and create new children on whicheverkernel instance happens to be hosting them.The mechanisms developed were vetted through unit testing, review, and a number ofcompute-bound benchmarks in a homogeneous x86 64bit environment. Correctness wasdemonstrated, and performance metrics were acquired. Popcorn Linux performance wasshown to be reasonable when compared to SMP Linux. The mechanisms developed aretherefore deemed feasible. Scalability was determined to be a function of workload characteristics, where in some cases Popcorn Linux out-scales SMP Linux and in other cases SMPLinux out-scales Popcorn Linux. Optimizations are recommended to reduce the maturitygap between Popcorn Linux and SMP Linux, improving Popcorn Linux performance.This work was supported in part by the US Office of Naval Research under Contract N0001412-1-0880.

DedicationThis work is dedicated to my wife and my daughter.iii

Contents1 Introduction11.1Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41.2Scope. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41.3Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52 Related Work2.16Multikernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62.1.1Barrelfish Operating System . . . . . . . . . . . . . . . . . . . . . . .62.1.2Factored Operating System (FOS). . . . . . . . . . . . . . . . . . .7Cluster Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . .72.2.1Disco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .72.2.2Hive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8Linux Approaches to Multikernel Design . . . . . . . . . . . . . . . . . . . .92.3.1NoHype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .92.3.2coLinux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10Software Partitioning of Hardware . . . . . . . . . . . . . . . . . . . . . . . .112.4.1Twin Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .112.4.2SHIMOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .122.5Compute-Node Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .122.6Operating Systems for Emerging Exascale Multicore . . . . . . . . . . . . . .142.6.1142.22.32.4Tessellation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .iv

2.72.82.6.2Tornado . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .142.6.3Corey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .152.6.4Sprite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16Scalable Virtual Memory Subsystems . . . . . . . . . . . . . . . . . . . . . .172.7.1Bonsai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .172.7.2RadixVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17Limitations of Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . .182.8.1Single system image . . . . . . . . . . . . . . . . . . . . . . . . . . .182.8.2Address space consistency . . . . . . . . . . . . . . . . . . . . . . . .183 Linux Concepts193.1Linux Execution Contexts . . . . . . . . . . . . . . . . . . . . . . . . . . . .193.2Task Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .203.3Task Exiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .213.4Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .223.5Linux Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . .223.6Linux User Mode Helper API . . . . . . . . . . . . . . . . . . . . . . . . . .244 Popcorn Concepts254.1Popcorn System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . .254.2Hardware Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .254.3Global Accessible Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . .264.4Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .264.5Device Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .264.6Single System Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .274.7Applications Spanning Kernel Instances . . . . . . . . . . . . . . . . . . . . .274.8Basic Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .275 Task Migration5.129Architectural Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .v29

5.2Shadow Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .305.3Distributed Task and Thread Group Identification . . . . . . . . . . . . . . .315.4Forking Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .315.5Task Exit Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .335.6Task Migration Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . .345.6.1Migrating A Task To A New Kernel Instance . . . . . . . . . . . . . .355.6.2Migrating A Task To A Previous Kernel Instance . . . . . . . . . . .36Address Space Migration and Consistency Maintenance . . . . . . . . . . . .365.7.1Address Space Consistency . . . . . . . . . . . . . . . . . . . . . . . .375.7.2Up-Front Address Space Migration . . . . . . . . . . . . . . . . . . .405.7.3On-Demand Address Space Migration . . . . . . . . . . . . . . . . . .415.7.4Concurrent Mapping Retrieval . . . . . . . . . . . . . . . . . . . . . .445.7.5Virtual Memory Area Modification . . . . . . . . . . . . . . . . . . .455.7.6Mapping Prefetch . . . . . . . . . . . . . . . . . . . . . . . . . . . . .515.76 Results6.16.26.36.46.553Microbenchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .546.1.1Mechanism Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . .546.1.2Task Migration Measurement . . . . . . . . . . . . . . . . . . . . . .55IS-POMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .566.2.1IS-POMP Workload Profile . . . . . . . . . . . . . . . . . . . . . . .566.2.2IS-POMP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .66FT-POMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .686.3.1FT-POMP Workload Profile . . . . . . . . . . . . . . . . . . . . . . .686.3.2FT-POMP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . .77CG-POMP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .806.4.1CG-POMP Workload Profile . . . . . . . . . . . . . . . . . . . . . . .806.4.2CG-POMP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . .89BFS-POMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .92vi

6.66.76.5.1BFS-POMP Workload Profile . . . . . . . . . . . . . . . . . . . . . .926.5.2BFS-POMP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 101LUD-POMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1036.6.1LUD-POMP Workload Profile . . . . . . . . . . . . . . . . . . . . . . 1046.6.2LUD-POMP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 112Benchmark Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1156.7.1SMP Linux versus Popcorn Linux . . . . . . . . . . . . . . . . . . . . 1156.7.2Mapping Retrieval Overhead and Symmetry . . . . . . . . . . . . . . 1156.7.3Prefetch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1166.7.4Transport Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1177 Conclusions1188 Future Work1208.1Identify and Resolve Performance Bottlenecks . . . . . . . . . . . . . . . . . 1208.2Alternative Mapping Retrieval Protocols . . . . . . . . . . . . . . . . . . . . 1208.2.1Single Query Single Response Mapping Query . . . . . . . . . . . . . 1218.2.2Distributed Thread Grouping Based Mapping Query . . . . . . . . . 1228.3Shared File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1238.4User-Space Locking Abstractions . . . . . . . . . . . . . . . . . . . . . . . . 1238.5Scheduling Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Bibliography126vii

List of Figures1.1Perf Measurement for IS-POMP Workload on SMP Linux. . . . . . . . . .24.1Popcorn System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .255.1Migration - originating kernel instance . . . . . . . . . . . . . . . . . . . . .305.2Migration - receiving kernel instance . . . . . . . . . . . . . . . . . . . . . .315.3Task duplication hook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .325.4Task exit procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .335.5Response to thread group exiting notification. . . . . . . . . . . . . . . . .345.6Two consistent distributed VMAs in a distributed address space. In lightcolor the full VMA address range, in solid color the VMA mappings knownby each kernel instance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .38An inconsistent distributed VMA in a distributed address space. In light colorthe full VMA address range, in solid color the VMA mappings known by eachkernel instance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .395.8On-demand address space migration procedure . . . . . . . . . . . . . . . . .425.9Munmap procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .475.10 Mprotect procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .485.11 Mremap procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .495.12 Mmap procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .506.1Migration time measurements comparing SMP to Popcorn Linux . . . . . . .556.2IS Workload Event Totals With 1 Prefetch Slot . . . . . . . . . . . . . . . .576.3IS Workload profile with 16 kernel instances with 1 prefetch slot . . . . . . .585.7viii

6.4IS Workload Event Totals With 4 Prefetch Slots . . . . . . . . . . . . . . . .596.5IS Workload profile with 16 kernel instances with 4 prefetch slots . . . . . .606.6IS Workload Event Totals With 8 Prefetch Slots . . . . . . . . . . . . . . . .616.7IS Workload profile with 16 kernel instances with 8 prefetch slots . . . . . .626.8Perf Measurement for IS-POMP Workload on SMP Linux. . . . . . . . . .646.9Perf Measurement for IS-POMP Workload on SMP Linux - User vs Kernel .646.10 Perf Measurement for IS-POMP Workload on Popcorn . . . . . . . . . . . .656.11 Perf Measurement for IS-POMP Workload on Popcorn - User vs Kernel . . .656.12 is-pomp results varying involved kernel instances with 1 prefetch slot . . . .666.13 is-pomp results varying involved kernel instances with 4 prefetch slots . . . .676.14 is-pomp results varying involved kernel instances with 8 prefetch slot . . . .676.15 is-pomp results varying kernel instances involved . . . . . . . . . . . . . . . .686.16 FT Workload Event Totals With 1 Prefetch Slot . . . . . . . . . . . . . . . .696.17 FT Workload profile with 16 kernel instances with 1 prefetch slot . . . . . .706.18 FT Workload Event Totals With 4 Prefetch Slots . . . . . . . . . . . . . . .726.19 FT Workload profile with 16 kernel instances with 4 prefetch slots . . . . . .726.20 FT Workload Event Totals With 8 Prefetch Slots . . . . . . . . . . . . . . .736.21 FT Workload profile with 16 kernel instances with 8 prefetch slots . . . . . .746.22 Perf Measurement for FT-POMP Workload on SMP Linux . . . . . . . . . .756.23 Perf Measurement for FT-POMP Workload on SMP Linux - User vs Kernel766.24 Perf Measurement for FT-POMP Workload on Popcorn . . . . . . . . . . . .766.25 Perf Measurement for FT-POMP Workload on Popcorn - User vs Kernel . .776.26 ft-pomp results varying involved kernel instances with 1 prefetch slot . . . .786.27 ft-pomp results varying involved kernel instances with 4 prefetch slots . . . .786.28 ft-pomp results varying involved kernel instances with 8 prefetch slot . . . .796.29 ft-pomp results varying kernel instances involved . . . . . . . . . . . . . . . .796.30 CG Workload Event Totals With 1 Prefetch Slot . . . . . . . . . . . . . . . .816.31 CG Workload profile with 16 kernel instances with 1 prefetch slot . . . . . .82ix

6.32 CG Workload Event Totals With 4 Prefetch Slots . . . . . . . . . . . . . . .836.33 CG Workload profile with 16 kernel instances with 4 prefetch slots . . . . . .846.34 CG Workload Event Totals With 8 Prefetch Slots . . . . . . . . . . . . . . .856.35 CG Workload profile with 16 kernel instances with 8 prefetch slots . . . . . .866.36 Perf Measurement for CG-POMP Workload on SMP Linux . . . . . . . . . .876.37 Perf Measurement for CG-POMP Workload on SMP Linux - User vs Kernel886.38 Perf Measurement for CG-POMP Workload on Popcorn. . . . . . . . . . .886.39 Perf Measurement for CG-POMP Workload on Popcorn - User vs Kernel . .896.40 cg-pomp results varying involved kernel instances with 1 prefetch slot . . . .906.41 cg-pomp results varying involved kernel instances with 4 prefetch slots . . . .906.42 cg-pomp results varying involved kernel instances with 8 prefetch slot . . . .916.43 cg-pomp results varying kernel instances involved . . . . . . . . . . . . . . .916.44 BFS Workload Event Totals With 1 Prefetch Slot . . . . . . . . . . . . . . .936.45 BFS Workload profile with 16 kernel instances with 1 prefetch slot . . . . . .946.46 BFS Workload Event Totals With 4 Prefetch Slots . . . . . . . . . . . . . . .956.47 BFS Workload profile with 16 kernel instances with 4 prefetch slots . . . . .956.48 BFS Workload Event Totals With 8 Prefetch Slots . . . . . . . . . . . . . . .976.49 BFS Workload profile with 16 kernel instances with 8 prefetch slots . . . . .976.50 Perf Measurement for BFS-POMP Workload on SMP Linux . . . . . . . . .996.51 Perf Measurement for BFS-POMP Workload on SMP Linux - User vs Kernel996.52 Perf Measurement for BFS-POMP Workload on Popcorn . . . . . . . . . . . 1006.53 Perf Measurement for BFS-POMP Workload on Popcorn - User vs Kernel . . 1006.54 bfs-pomp results varying involved kernel instances with 1 prefetch slot . . . . 1016.55 bfs-pomp results varying involved kernel instances with 4 prefetch slots . . . 1026.56 bfs-pomp results varying involved kernel instances with 8 prefetch slot . . . . 1026.57 bfs-pomp results varying kernel instances involved . . . . . . . . . . . . . . . 1036.58 LUD Workload Event Totals With 1 Prefetch Slot . . . . . . . . . . . . . . . 1046.59 LUD Workload profile with 16 kernel instances with 1 prefetch slot . . . . . 105x

6.60 LUD Workload Event Totals With 4 Prefetch Slots . . . . . . . . . . . . . . 1066.61 LUD Workload profile with 16 kernel instances with 4 prefetch slots . . . . . 1076.62 LUD Workload Event Totals With 8 Prefetch Slots . . . . . . . . . . . . . . 1086.63 LUD Workload profile with 16 kernel instances with 8 prefetch slots . . . . . 1096.64 Perf Measurement for LUD-POMP Workload on SMP Linux . . . . . . . . . 1106.65 Perf Measurement for LUD-POMP Workload on SMP Linux - User vs Kernel 1116.66 Perf Measurement for LUD-POMP Workload on Popcorn . . . . . . . . . . . 1116.67 Perf Measurement for LUD-POMP Workload on Popcorn - User vs Kernel . 1126.68 lud-pomp results varying involved kernel instances with 1 prefetch slot . . . 1136.69 lud-pomp results varying involved kernel instances with 4 prefetch slots . . . 1136.70 lud-pomp results varying involved kernel instances with 8 prefetch slot . . . 1146.71 lud-pomp results varying kernel instances involved . . . . . . . . . . . . . . . 114xi

List of Tables6.1Microbenchmark results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .546.2IS-POMP fault processing times by kernel instance with 1 prefetch slot . . .586.3IS-POMP fault processing times by kernel instance with 4 prefetch slots . . .616.4IS-POMP fault processing times by kernel instance with 8 prefetch slots . . .626.5IS-POMP 2 Mapping Retrieval Message Transport Times . . . . . . . . . . .636.6FT-POMP fault processing times by kernel instance with 1 prefetch slot. .716.7FT-POMP fault processing times by kernel instance with 4 prefetch slots . .736.8FT-POMP fault processing times by kernel instance with 8 prefetch slots . .746.9FT-POMP 2 Mapping Retrieval Message Transport Times . . . . . . . . . .756.10 CG-POMP fault processing times by kernel instance with 1 prefetch slot . .826.11 CG-POMP fault processing times by kernel instance with 4 prefetch slots . .856.12 CG-POMP fault processing times by kernel instance with 8 prefetch slots . .866.13 CG-POMP 2 Mapping Retrieval Message Transport Times . . . . . . . . . .876.14 BFS-POMP fault processing times by kernel instance with 1 prefetch slot . .946.15 BFS-POMP fault processing times by kernel instance with 4 prefetch slots .966.16 BFS-POMP fault processing times by kernel instance with 8 prefetch slots .986.17 BFS-POMP 2 Mapping Retrieval Message Transport Times . . . . . . . . .986.18 LUD-POMP fault processing times by kernel instance with 1 prefetch slot . . 1056.19 LUD-POMP fault processing times by kernel instance with 4 prefetch slots . 1086.20 LUD-POMP fault processing times by kernel instance with 8 prefetch slots . 1096.21 LUD-POMP 2 Mapping Retrieval Message Transport Times . . . . . . . . . 110xii

Chapter 1IntroductionAs core count increases in hardware platforms, researchers are questioning the scalability ofthe traditional and ubiquitous shared memory symmetric multiprocessing (SMP) operatingsystem design. Some researchers have found that SMP systems can continue to scale[6].Others argue that extending scalability optimizations made for a given architecture to different architectures often involves prohibitive amounts of work[3]. Yet others point out thatan iterative process must be applied to eliminate bottlenecks in SMP systems, where eachnew advance reveals a new bottleneck[8]. Lock contention in SMP Linux was measuredin this effort under various workloads, and shown to add significant amounts of overhead,specifically in the memory management sub system. Contention over memory managementresources is expected in SMP systems due to the presence of multiple concurrently runningCPUs experiencing faults on the same memory map, and as a result needing to modify pagetables and other highly shared data structures. Figure 1.1 shows SMP overhead as seen onthe main thread of a cpu-bound workload. As the number of kernel instances increases, lockcontention seen in handle pte fault and pagevec lru move fn increasingly dominates overhead. These functions are both part of the memory management sub system that handlefault events. In addition to these types of scalability issues, SMP systems suffer from highfailure rates due to poor isolation between cores[8]. In response to these observations, newsystems software strategies that stray from SMP tradition are emerging to efficiently harnessthe computing power that hardware platforms provide.1

David G. KatzChapter 1. Introduction2Figure 1.1: Perf Measurement for IS-POMP Workload on SMP LinuxNot only is core count increasing, ISA heterogeneity is also being introduced at an increasing rate in shared memory systems. The traditional shared memory SMP paradigm breaksdown once ISA heterogeneity is introduced. Cores with different instruction sets cannotshare code, and therefore sharing a monolithic kernel image is not possible. A solution tothis problem that is seen in many current technologies is to run separate operating systemson cores of dissimilar types, as found in the Qualcomm MSM family of SoCs[13]. Multicorechipsets hosting cores of similar ISA but of varied types and capabilities are also emerging.These chipsets are seen in the mobile devices space (smart phones and tablets), where powerconcerns are significant. ARM big.LITTLE is one such emerging heterogeneous architecturecharacterized by some number of powerful ARM processors coexisting with a number ofrelatively less powerful ARM processors on the same die, sharing a common and coherentcache. The high power chips are used for CPU intensive tasks, while background processingis handled on the more power-frugal CPUs. This architecture has been used in a number of Samsung, Renesas Mobile, MediaTek, and Qualcomm ARM chips for use in mobiledevices[14]. Heterogeneity is also found in current workstations. Intel integrates a Xeonprocessor with a Xeon Phi coprocessor cluster over a PCIe interconnect. The instructionsets of the Xeon and Xeon Phi devices are overlapping but not equal.Many of the emerging systems concepts focus on identifying new methods of pairing computation with computing resources. One such concept is the multikernel. The concept ofa multikernel, introduced by Bauman et al., of the Systems Group at ETH Zurich and Microsoft Research, is embodied in the Barrelfish project and is a design strategy that departsfrom the traditional SMP notion of how to structure an operating system in a multi/many-

David G. KatzChapter 1. Introduction3core potentially heterogeneous architecture. This solution directly addresses scalability issuesarising from increased core count, as well as the coupling issue that is inherent to heterogeneous architectures. The multikernel is a replicated-kernel operating system where eachhardware resource, e.g. core, group of cores sharing a common ISA, non-uniform memoryaccess (NUMA) node, or cluster thereof, runs an independent kernel instance out of its ownprivate portion of the memory space. Each kernel instance acts as a peer, cooperating withthe others to form a cohesive, single system image environment. In contrast to a traditionalSMP operating system, kernel data structures are not shared between kernel instances. Instead, all kernel layer communication is done explicitly through message passing betweenkernel instances. This facilitates cooperation, while maintaining separation to allow kernelinstances to run independently. This independence removes lock contention between kernelinstances, as kernel data structures are replicated rather than being shared and thereforerequire no synchronization. This independence also facilitates heterogeneity within the multikernel by allowing kernel instances running on dissimilar ISAs to cooperate by using anagreed upon messaging interface.The Popcorn Linux project is a research effort aimed at building scalable systems softwarefor future generations of heterogeneous and homogeneous multi/many-core architectures.Popcorn Linux is a self-hosting multikernel operating system where each kernel instance is amodified Linux kernel. Many Popcorn Linux kernel instances co-exist on the same hardwareand can be allocated to hardware resources one per core, one per ISA, one per NUMA node,or clustered in some combination thereof. Hardware resources, such as disks and networkinterfaces, can be assigned (and dynamically re-assigned) kernel instance affinity to giveexclusive access to that resource to a given kernel instance.All kernel instances in the multikernel must cooperate in order to stitch together a singlelogically consistent user space to host user layer software. In order to host legacy Linuxapplications, the system must handle this cooperation at the kernel layer, allowing the userlayer software to implement business logic. This is a sizable task in a multikernel environment, as all resources and functionality that the user software relies upon must be carefullymanaged in a distributed fashion.One requirement for creating this unified user space is effective task migration across kernelinstances. This capability allows for workload distribution and balancing within the multikernel. This thesis is focused on task migration in a homogeneous x86 64bit multikernelenvironment, with attention given to both process and thread migration. The ability to migrate a task at any arbitrary point in its execution between kernel instances was developed,tested, and characterized.Address space consistency across distributed thread group members is a significant problem in an environment where members of the thread group may be executing on differentkernel instances. This thesis breaks that problem into components and presents viable solutions to each aspect of address space consistency, each of which has been vetted throughimplementation and testing.

David G. KatzChapter 1. Introduction4While this thesis concentrates exclusively on task migration and address space consistency ina homogeneous x86 64bit environment, it can be used as a stepping stone to task migrationin other homogeneous environments and eventually heterogeneous environments. The challenges identified in the area of task migration and address space consistency will continueto be problems in other architectures. As Popcorn Linux is developed into a heterogeneoussystems environment, the solutions presented in this thesis can be extended to include theadditional logic that is necessary to support cross architecture migrations.1.1Research ContributionsThis thesis contributes the following to the development of the Popcorn Linux platform:1. A mechanism is designed and implemented to migrate tasks between kernel instancesin a homogeneous x86 64bit multikernel environment.2. A mechanism is designed and implemented to migrate address space components asneeded to support migrated tasks.3. A mechanism is designed and implemented to ensure that distributed address spacesremain consistent across kernel instances.4. The implementation of the components listed above are tested against five benchmarkworkloads, and the results that were collected are analyzed. Recommendations forrefining the design are made based on insights gained from the analysis. The resultingsystem is shown to perform reasonably when compared to SMP Linux and is thereforea feasible approach. Specific improvements can be made to Popcorn Linux to realizefurther gains relative to SMP Linux.1.2ScopeThis thesis does not touch upon resource migration, such as the migration of open filedescriptors, locks, drivers, and name spaces including process identifier (PID) name space.Other students are working on those aspects of the single system image development. Amechanism for migrating file descriptors is also currently under development by SSRG, butis not part of this thesis. This thesis focuses exclusively on task and address space migrationand consistency.

David G. Katz1.3Chapter 1. Introduction5Thesis OrganizationChapter 2 provides information about related work. Special attention is given to task migration, address space migration, and address space indexing mechanisms when applicable.Chapter 3 presents Linux concepts that are necessary to an understanding of Popcorn Linux.Chapter 4 presents Popcorn Linux concepts including the overall system architecture.Chapter 5 is an in-depth description of task migration in Popcorn Linux. This chapter alsodescribes the methods used to achieve the distributed address spaces necessary to supportdistributed thread groups.Chapter 6 presents the results achieved. This section profiles benchmark workloads to determine the types of operations the kernel is performing. Specific benchmarks that vary anumber of kernel configuration items are then presented. Results are analyzed to identifytrends.Chapter 7 provides general concluding remarks.Chapter 8 provides some suggestions for areas to further research.

Chapter 2Related WorkThis chapter provides a survey of existing projects that are related to multi-core systemssoftware. To provide a backdrop for this thesis, this section focuses on task migration andaddress space indexing mechanisms.2.12.1.1MultikernelsBarrelfish Operating SystemSystems Group at ETH Zurich, Microsoft Research, and ENS Cachan Bretagne introducethe no

teristics, where in some cases Popcorn Linux out-scales SMP Linux and in other cases SMP Linux out-scales Popcorn Linux. Optimizations are recommended to reduce the maturity gap between Popcorn Linux and SMP Linux, improving Popcorn Linux performance. This work was supported in part by the U