On the Scalability of Real-Time Scheduling Algorithms on Multicore Platforms


On the Scalability of Real-Time Scheduling Algorithms on Multicore Platforms: A Case Study
Sathish Gopalakrishnan, The University of British Columbia
(based on work by others at the University of North Carolina)

Focus of this Talk
Multicore platforms are predicted to get much larger in the future.
» 10s or 100s of cores per chip, multiple hardware threads per core.
Research Question: How will different real-time scheduling algorithms scale?
» Scalability is defined w.r.t. schedulability (more on this later).

Outline
Background.
» Real-time workload assumed.
» Scheduling algorithms evaluated.
» Some properties of these algorithms.
Research questions addressed.
Experimental results.
Observations/speculation.
Future work.

Real-Time Workload Assumed in this Talk
Set τ of periodic tasks scheduled on M cores:
» Task T = (T.e, T.p) releases a job with exec. cost T.e every T.p time units.
– T's utilization (or weight) is U(T) = T.e/T.p.
– Total utilization is U(τ) = Σ_T T.e/T.p.
» Each job of T has a deadline at the next job release of T.
[Figure: tasks T = (2,5) and U = (9,15) scheduled on one core over the interval [0, 30]; the schedule shown is an earliest-deadline-first (EDF) schedule. Much of our work pertains to EDF scheduling.]
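To make the task model concrete, here is a minimal sketch in Python (the names Task, util, and total_util are mine, not from the talk):

```python
from fractions import Fraction
from typing import NamedTuple

class Task(NamedTuple):
    e: int  # worst-case execution cost T.e
    p: int  # period T.p (jobs arrive every p time units)

def util(t: Task) -> Fraction:
    # U(T) = T.e / T.p: the fraction of one core the task needs.
    return Fraction(t.e, t.p)

def total_util(tau: list[Task]) -> Fraction:
    # U(tau) = sum of U(T) over all tasks T in tau.
    return sum((util(t) for t in tau), Fraction(0))

# The tasks from the slide's figure: T = (2,5) and U = (9,15).
tau = [Task(2, 5), Task(9, 15)]
print(total_util(tau))  # 2/5 + 9/15 = 1, i.e., exactly one core's worth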

Scheduling vs. Schedulability
W.r.t. scheduling, we actually care about two kinds of algorithms:
» Scheduling algorithm (of course).
– Example: Earliest-deadline-first (EDF): Jobs with earlier deadlines have higher priority.
» Schedulability test.
[Figure: a task system τ is fed to a test for EDF; "yes" means no timing requirement will be violated if τ is scheduled with EDF, "no" means a timing requirement will (or may) be violated.]
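On one core, the classic EDF schedulability test is a single utilization check; continuing the sketch above:

```python
def edf_schedulable_uniproc(tau: list[Task]) -> bool:
    # Liu & Layland (1973): an implicit-deadline periodic task set is
    # EDF-schedulable on ONE core iff its total utilization is at most 1.
    return total_util(tau) <= 1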

Multiprocessor Real-Time Scheduling
Two Approaches:
Partitioning. Steps:
1. Assign tasks to processors (bin packing).
2. Schedule tasks on each processor using a uniprocessor algorithm.
Global Scheduling. Important differences:
» One task queue.
» Tasks may migrate among the processors.
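The talk does not name the packing heuristic used for step 1, but first-fit decreasing is a common choice; a hypothetical sketch, reusing Task and util from above:

```python
def partition_first_fit(tau: list[Task], m: int) -> list[list[Task]] | None:
    """Step 1 of the partitioning approach as bin-packing: first-fit
    decreasing by utilization. Each core is a bin of capacity 1 (the
    uniprocessor EDF bound). Returns None if the heuristic fails."""
    cores: list[list[Task]] = [[] for _ in range(m)]
    load = [Fraction(0)] * m
    for t in sorted(tau, key=util, reverse=True):
        for i in range(m):
            if load[i] + util(t) <= 1:
                cores[i].append(t)
                load[i] += util(t)
                break
        else:
            return None  # no core can take t without exceeding capacity
    return cores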

Scheduling Algorithms Considered
Partitioned EDF: PEDF.
Preemptive & non-preemptive global EDF: GEDF & NP-GEDF.
Clustered EDF: CEDF.
» Partition onto clusters of cores, globally schedule within each cluster.
[Figure: clusters of four CPUs (C C C C), each cluster sharing an L1 cache; the L1s feed a shared L2, which also serves the other 8 cores.]

Scheduling Algorithms (Continued)
PD2, a global Pfair algorithm.
» Schedule jobs one quantum at a time at a "uniform" rate.
– May preempt and migrate jobs frequently.
Staggered PD2: S-PD2.
» Same as PD2 but quanta are "staggered" to avoid excessive bus contention.

Some Properties of These Algorithms
Under partitioning & most global algorithms, overall utilization must be capped to avoid deadline misses.
» Due to connections to bin-packing.
Exception: Global "Pfair" algorithms do not require caps.
» Such algorithms schedule jobs one quantum at a time.
– May therefore preempt and migrate jobs frequently.
– Perhaps less of a concern on a multicore platform.
Under most global algorithms, if utilization is not capped, deadline tardiness is bounded.
» Sufficient for soft real-time systems.

PD2 Example
3 tasks with parameters (2,3) on two processors.
[Figure: tasks T = (2,3), U = (2,3), and V = (2,3) scheduled one quantum at a time over [0, 30]; each quantum is marked "On Processor 1" or "On Processor 2".]
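To make "one quantum at a time at a uniform rate" concrete, here are the standard Pfair subtask windows (this is generic Pfair; PD2's specific tie-breaking rules are not shown):

```python
from math import ceil, floor

def pfair_window(i: int, e: int, p: int) -> tuple[int, int]:
    """Window of the i-th quantum (subtask) of a task with weight e/p,
    1-indexed: the quantum must be scheduled in [release, deadline).
    These are the standard Pfair formulas; PD2 adds tie-breaking rules
    between equal deadlines, omitted here."""
    release = floor((i - 1) * p / e)
    deadline = ceil(i * p / e)
    return release, deadline

# For a task (2,3): the two quanta per period get windows [0,2) and [1,3).
print([pfair_window(i, 2, 3) for i in (1, 2)])  # [(0, 2), (1, 3)]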

Schedulability
HRT: No deadline is missed.
SRT: Deadline tardiness is bounded.
For some scheduling algorithms, utilization loss is inherent when checking schedulability.
» That is, schedulability cannot be guaranteed for all task systems with total utilization at most M.

Example: PEDF
Partitioning three tasks with parameters (2,3) on two processors. In terms of bin-packing, each task is an item of size 2/3 and each processor a bin of capacity 1.
[Figure: Task 1 is assigned to Processor 1 and Task 3 to Processor 2; Task 2 fits on neither processor, so PEDF fails even though total utilization (2) equals the number of processors.]

Schedulability Summary

Alg       HRT                                                 SRT
PEDF      util. loss                                          util. loss (same as HRT)
GEDF      util. loss                                          no loss
NP-GEDF   util. loss                                          no loss
CEDF      util. loss                                          util. loss (not as bad as PEDF)
PD2       no loss                                             no loss
S-PD2     slight loss (must shrink periods by one quantum)    no loss

GEDF SRT Example
Earlier example with GEDF. Tardiness is at most one quantum.
[Figure: tasks T = (2,3), U = (2,3), and V = (2,3) scheduled with GEDF on two processors over [0, 30].]

Outline
Background.
» Real-time workload assumed.
» Scheduling algorithms evaluated.
» Some properties of these algorithms.
Research questions addressed.
Experimental results.
Observations/speculation.
Future work.

Research Questions
In theory, PD2 is always preferable.
» It is optimal (no utilization loss).
What about in practice?
» That is, what happens if system overheads are taken into account?
Do migrations really matter on a multicore platform with a shared cache?
As multicore platforms get larger, will global algorithms scale?
Focus of this Talk: An experimental comparison of these scheduling algorithms on the basis of schedulability.

Test System
HW platform: Sun Niagara (UltraSPARC T1).
[Figure: 8 cores (Core 1 through Core 8), 4 HW threads per core, 16K (8K) L1 instr. (data) cache per core, shared 3MB L2.]
» 1.2 GHz "RISC-like" cores. Relatively simple, e.g., no instr. reordering or branch prediction.
» Caches somewhat small compared to Intel.
– OS has 32 "logical CPUs" to manage.
– Far larger than any system considered before in RT literature.
– Note: CEDF "cluster" = 4 HW threads on a core.

Test System (Cont'd)
Operating System: LITMUSRT: LInux Testbed for MUltiprocessor Scheduling in Real-Time systems.
» Developed at UNC.
» Extends Linux by allowing different schedulers to be linked as "plug-in" components.
» Several (real-time) synchronization protocols are also supported.
» Code is available at http://www.cs.unc.edu/~anderson/litmus-rt/.

Methodology
Ran several hundred (synthetic) task sets on the test system.
Collected 70 GB of raw overhead samples.
Distilled expressions for average (for SRT) and worst-case (for HRT) overheads.
Conducted schedulability experiments involving 8.5 million randomly-generated task sets with overheads considered.
» Note: This step is offline; it does not involve the Niagara.

Kinds of Overheads
Tick scheduling overhead.
» Incurred when the kernel is invoked at the beginning of each quantum (timer "tick"). A quantum is 1ms.
Release overhead.
» Incurred when the kernel is invoked to handle a job release.
Scheduling overhead.
» Incurred when the scheduler (in the kernel) is invoked.
Context-switching overhead.
» Non-cache-related costs associated with a context switch.
Preemption/migration overhead.
» Costs incurred upon a preemption/migration due to a loss of cache affinity.
These overheads can be accounted for in schedulability tests by inflating job execution costs. (Doing this correctly is a little tricky; a rough sketch follows.)
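A deliberately crude illustration of that inflation idea; the actual accounting in the underlying analysis is more careful about which events a job can be charged for and how often, and all parameter names here are mine:

```python
def inflate_exec_cost(e_us: float, release_us: float, sched_us: float,
                      ctx_us: float, pm_us: float, tick_us: float,
                      quantum_us: float = 1000.0) -> float:
    """Charge each job its release, scheduling, context-switch (twice:
    switching in and out), and preemption/migration costs as extra
    execution time, and treat the per-quantum tick cost as shrinking
    every quantum."""
    per_job = release_us + sched_us + 2 * ctx_us + pm_us
    usable = (quantum_us - tick_us) / quantum_us  # fraction of quantum left
    return (e_us + per_job) / usable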

Kernel Overheads
Most overheads were small (2-15µs) except worst-case overheads impacted by global queues.
» Most notable: Worst-case scheduling overheads for PD2, S-PD2, and GEDF/NP-GEDF:

Alg            Scheduling Overhead (in µs)
PD2            32.7
S-PD2          43.1
GEDF/NP-GEDF   55.2 + .26N   (N = no. of tasks)

Preemption/Migration Overheads
Obtained by measuring synthetic tasks, each with a 64K working set & 75/25 read/write ratio.
» Interesting trends: PD2 is terrible, staggering really helps, preempt. cost ≈ mig. cost per algorithm, but algorithms that migrate have higher costs.

Worst-Case Overheads (in µs):
                   PD2     S-PD2   GEDF    CEDF    PEDF
Preempt            [values garbled in the transcription]
Intra-Cluster Mig  654.2   103.4   326.8   167.3   --
Inter-Cluster Mig  681.1   104.1   321.1   --      --

Schedulability Results
Generated random tasks using 6 distributions and checked schedulability using "state-of-the-art" tests (with overheads considered).
» 8.5 million task sets in total.
Distributions:
» Utilizations uniform over
– [0.001, 0.1] (light),
– [0.1, 0.4] (medium), and
– [0.5, 0.9] (heavy).
» Bimodal with utilizations distributed over either [0.001, 0.5) or [0.5, 0.9] with probabilities of
– 8/9 and 1/9 (light),
– 6/9 and 3/9 (medium), and
– 4/9 and 5/9 (heavy).
(Will only show graphs for these, the bimodal distributions, in the main part of the talk.)
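The slide gives the per-task utilization distributions but not the task-set assembly procedure; the sketch below assumes the common recipe of drawing tasks until a utilization cap is reached:

```python
import random

def draw_util(dist: str) -> float:
    # Per-task utilizations for the six distributions named on the slide.
    if dist == "uni-light":
        return random.uniform(0.001, 0.1)
    if dist == "uni-medium":
        return random.uniform(0.1, 0.4)
    if dist == "uni-heavy":
        return random.uniform(0.5, 0.9)
    # Bimodal: light tasks in [0.001, 0.5), heavy tasks in [0.5, 0.9].
    p_heavy = {"bi-light": 1 / 9, "bi-medium": 3 / 9, "bi-heavy": 5 / 9}[dist]
    lo, hi = (0.5, 0.9) if random.random() < p_heavy else (0.001, 0.5)
    return random.uniform(lo, hi)

def gen_task_set(dist: str, util_cap: float) -> list[float]:
    # Draw per-task utilizations until the cap (e.g., M = 32) is hit.
    tau: list[float] = []
    total = 0.0
    while True:
        u = draw_util(dist)
        if total + u > util_cap:
            return tau
        tau.append(u)
        total += u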

HRT Summary
PEDF usually wins.
» Exception: Lots of heavy tasks (makes bin-packing hard).
S-PD2 usually does well.
» Staggering has an impact.
PD2 and GEDF are quite poor.
» PD2 is negatively impacted by high preemption and migration costs due to aligned quanta.
» GEDF suffers from high scheduling costs (due to the global queue).

HRT, Bimodal Light
PEDF performs pretty well if most task utilizations are low.
S-PD2 performs pretty well too.
[graph]

HRT, Bimodal Medium
In this and the next slide, as the fraction of heavy tasks grows, the gap between S-PD2 and PEDF narrows.
[graph]

HRT, Bimodal Heavy
[graph]

SRT Summary
PEDF is not as effective as before, but still OK in light-mostly cases.
CEDF performs the best in most cases.
S-PD2 still performs generally well.
GEDF is still negatively impacted by higher scheduling costs.
» Note: SRT schedulability for GEDF entails no utilization loss.
» NP-GEDF and GEDF are about the same.
Note: The scale is different from before.

SRT, Bimodal Light
PEDF and CEDF perform well if tasks are mostly light.
Note: S-PD2 never performs really badly in any experiment.
[graph]

SRT, Bimodal Medium
This and the next slide show that as the frequency of heavy tasks increases, PEDF degrades. CEDF isn't much affected by this increase.
[graph]

SRT, Bimodal Heavy
[graph]

Outline
Background.
» Real-time workload assumed.
» Scheduling algorithms evaluated.
» Some properties of these algorithms.
Research questions addressed.
Experimental results.
Observations/speculation.
Future work.

Observations/Speculation
Global algorithms are really sensitive to how shared queues are implemented.
» Saw 100X performance improvement by switching from linked lists to binomial heaps.
» Still working on this…
» Speculation: Can reduce GEDF costs to close to PEDF costs for systems with 32 cores.
Per algorithm, preempt. cost ≈ mig. cost.
» Due to having a shared cache.
» One catch: Migrations increase both costs.
Quantum staggering is very effective.
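To illustrate the shared-queue point, a user-space sketch using Python's heapq (a binary heap rather than the binomial heaps mentioned above, and without the locking a kernel implementation needs; names are mine):

```python
import heapq
import itertools

class GlobalReadyQueue:
    """A single shared ready queue for global EDF. The point is the same
    as in the talk: O(log n) insert/extract-min keyed on deadlines
    instead of O(n) linked-list scans."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, object]] = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def add(self, deadline: float, job: object) -> None:
        heapq.heappush(self._heap, (deadline, next(self._seq), job))

    def take_earliest(self) -> object | None:
        # Earliest-deadline pending job, or None if nothing is pending.
        return heapq.heappop(self._heap)[2] if self._heap else None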

Observations/Speculation (Cont'd)
No one "best" algorithm.
Intel has claimed they will produce an 80-core general-purpose chip. If they do…
» the cores will have to be simple ⇒ high execution costs ⇒ high utilizations ⇒ PEDF will suffer;
» "pure" global algorithms will not scale;
» some instantiation of CEDF (or maybe clustered S-PD2) will hit the "sweet spot".

Future Work
Thoroughly study "how to implement shared queues".
Repeat this study on Intel and embedded machines.
Examine mixed HRT/SRT workloads.
Factor in synchronization and dynamic behavior.
» In past work, PEDF was seen to be more negatively impacted by these things.

Thanks! Questions?

SRT Tardiness, Uniform Medium
[graph]

Measuring Overheads
Done using a UNC-produced tracer called Feather-Trace.
» http://www.cs.unc.edu/~bbb/feathertrace/
Highest 1% of values were tossed.
» Eliminates "outliers" due to non-deterministic behavior in Linux, warm-up effects, etc.
Used worst-case (average-case) values for HRT (SRT) schedulability.
Used linear regression analysis to produce linear (in the task count) overhead expressions.
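A sketch of this distillation pipeline, assuming an ordinary least-squares fit over the trimmed samples (the exact regression setup is not stated in the slides):

```python
import numpy as np

def distill_overhead(task_counts: np.ndarray, samples_us: np.ndarray):
    """Drop the top 1% of samples as outliers, then least-squares fit a
    line a + b*N in the task count N, as in expressions like '55.2 + .26N'."""
    cutoff = np.percentile(samples_us, 99)
    keep = samples_us <= cutoff
    b, a = np.polyfit(task_counts[keep], samples_us[keep], 1)
    return a, b  # intercept (µs) and slope (µs per task)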

Obtaining Kernel Overheads
Ran 90 (synthetic) task sets per scheduling algorithm for 30 sec.
In total, over 600 million individual overheads were recorded (45 GB of data).

Kernel Overheads (in µs) (N = no. of tasks)

Worst-Case:
             PD2           S-PD2         GEDF          CEDF          PEDF
Tick         11.2 + .3N    4.8 + .3N     3 + .003N     3.2           2.7 + .002N
Schedule     32.7          43.1          55.2 + .26N   14.8 + .01N   8.6 + .01N
Context SW   3.1 + .01N    3.2 + .003N   29.2          6.1           14.9 + .04N
Release      --            --            45 + .3N      30.3          4.7 + .009N

Average:
             PD2           S-PD2         GEDF          CEDF          PEDF
Tick         4.3 + .03N    2.1 + .02N    2.1 + .002N   2.8           2.1 + .002N
Schedule     4.7           4.2           11.8 + .06N   6.1 + .01N    2.7 + .008N
Context SW   2.6 + .001N   2.5 + .001N   7.6           3.2           4.7 + .005N
Release      --            --            5.8 + .1N     16.5          4 + .005N

Obtaining Preemption/Migration Overheads
Ran 90 (synthetic) task sets per scheduling algorithm for 60 sec.
Each task has a 64K working set (WS) that it accesses repeatedly with a 75/25 read/write ratio.
Recorded time to access the WS after a preemption/migration minus the "cache-warm access" time.
In total, over 105 million individual preemption/migration overheads were recorded (15 GB of data).

Preemption/Migration Overheads (in µs)

Worst-Case:
                   PD2     S-PD2   GEDF    CEDF    PEDF
Preempt            [values garbled in the transcription]
Intra-Cluster Mig  654.2   103.4   326.8   167.3   --
Inter-Cluster Mig  681.1   104.1   321.1   --      --

Average:
                   PD2     S-PD2   GEDF    CEDF    PEDF
Preempt            [values garbled in the transcription]
Intra-Cluster Mig  141.8   87.8    73.5    64.8    --
Inter-Cluster Mig  187.6   90.2    72.6    --      --

(An additional "Overall" row appears in the source but is too garbled to reconstruct.)

HRT, Uniform Light
This is the easiest case for partitioning, so PEDF wins.
S-PD2 does pretty well too.
[graph]

HRT, Uniform Medium
Similar to before.
Utilizations aren't high enough to start causing problems for partitioning.
[graph]

HRT, Uniform Heavy
Utilizations are high enough to cause problems for partitioning.
S-PD2 wins now.
[graph]

SRT, Uniform Light
PEDF wins; S-PD2 performs pretty well.
[graph]

SRT, Uniform Medium
CEDF really benefits from using a "no utilization loss" schedulability test within each cluster.
[graph]

SRT, Uniform Heavy
GEDF and NP-GEDF actually win in this case.
CEDF and S-PD2 perform pretty well.
PEDF loses.
[graph]

On the Implementation of Global Real-Time Schedulers
Simon Fraser University, April 15, 2010
Sathish Gopalakrishnan, The University of British Columbia
Work supported by IBM, SUN, and Intel Corps., NSF grants CNS 0834270, CNS 0834132, and CNS 0615197, and ARO grant W911NF-06-1-0425.

UNC's Implementation Studies (I)
Calandrino et al. (2006): Are commonly-studied RT schedulers implementable? In Linux on common hardware platforms?

Calandrino et al. (2006), LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE Real-Time Systems Symposium, pages 111-123.
Brandenburg et al. (2008), On the scalability of real-time scheduling algorithms on multicore platforms: A case study. In: Proceedings of the 29th IEEE Real-Time Systems Symposium, pages 157-169.

UNC's Implementation Studies (I)
Calandrino et al. (2006): Are commonly-studied RT schedulers implementable? In Linux on common hardware platforms?
[Figure: Intel 4x 2.7 GHz Xeon SMP (few, fast processors; private caches): Proc. 1 through Proc. 4, each with a private L2 cache, connected to main memory.]

UNC's Implementation Studies (I)
Calandrino et al. (2006): Are commonly-studied RT schedulers implementable? In Linux on common hardware platforms?
[Figure: the Xeon SMP platform annotated with the schedulers tested: partitioned EDF (P-EDF), 2 x global EDF (G-EDF, G-NP-EDF), and 2 x PFAIR (PD2, S-PD2).]

UNC's Implementation Studies (I)
Calandrino et al. (2006): Are commonly-studied RT schedulers implementable? In Linux on common hardware platforms?
Finding: "for each tested scheme, scenarios exist in which it is a viable choice."
[Figure: the Xeon SMP platform with P-EDF, G-NP-EDF, G-EDF, PD2, S-PD2.]

UNC's Implementation Studies (II)
Brandenburg et al. (2008): What if there are many slow processors?
[Figure: the Xeon SMP platform with P-EDF, G-NP-EDF, G-EDF, PD2, S-PD2.]

UNC's Implementation Studies (II)
Brandenburg et al. (2008): What if there are many slow processors? Explored scalability of RT schedulers on a Sun Niagara.
[Figure: the Niagara platform: 8 cores (Core 1 through Core 8), 4 HW threads per core, per-core L1 caches, a shared L2 cache, and main memory; annotated with P-EDF, G-NP-EDF, G-EDF, PD2, and S-PD2.]

UNC's Implementation Studies (II)
Brandenburg et al. (2008): What if there are many slow processors? Explored scalability of RT schedulers on a Sun Niagara.
Finding: G-EDF: high overheads, low schedulability.
[Figure: the Niagara platform with P-EDF, G-EDF, PD2, S-PD2.]

Today's Discussion
How to implement global schedulers?
[Figure: the Niagara platform with G-NP-EDF, G-EDF, PD2, S-PD2.]

Today's Discussion
How to implement global schedulers? Explore how implementation tradeoffs affect schedulability.
» Instead of considering one implementation of several different scheduling algorithms…
[Figure: the Niagara platform.]

Today's Discussion
How to implement global schedulers? Explore how implementation tradeoffs affect schedulability.
Case study: nine G-EDF variants on a Sun Niagara.
[Figure: the Niagara platform, annotated with nine G-EDF instances.]

Design Choices
When to schedule.
Quantum alignment.
How to handle interrupts.
How to queue pending jobs.
How to manage future releases.
How to avoid unnecessary preemptions.

Scheduler Invocation
Event-Driven:
» on job release
» on job completion
» preemptions happen immediately
Quantum-Driven:
» on every timer tick
» easier to implement
» on release, a job is just enqueued; the scheduler is invoked at the next tick
[Figure: a schedule on one processor over [0, 15] with releases and completions marked; under the quantum-driven design, a release between ticks leads to a partially-used quantum.]
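A toy user-space rendition of the quantum-driven design (illustrative names only; a real kernel scheduler is driven by timer interrupts with per-CPU state and locking). It reuses the GlobalReadyQueue sketched earlier:

```python
import time

def quantum_driven_loop(queue, dispatch, quantum_s: float = 0.001) -> None:
    """The scheduler runs only at ticks; a job released between ticks just
    sits in the queue until the next tick (hence the partially-used quanta
    in the figure above)."""
    next_tick = time.monotonic()
    while True:
        next_tick += quantum_s
        job = queue.take_earliest()
        if job is not None:
            dispatch(job)  # run this job for the coming quantum
        time.sleep(max(0.0, next_tick - time.monotonic()))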

Quantum Alignment
Aligned:
» Tick synchronized across processors.
» Contention at quantum boundary!
Staggered:
» Ticks spread out across the quantum.
» Reduced bus and lock contention.
» Additional latency.
[Figure: two processors (P1, P2) over [0, 15]; with aligned quanta, a release just after a tick leads to a partially-used quantum; with staggered quanta, staggering delays when a newly released job begins execution.]
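The staggering itself amounts to a per-processor tick offset; a minimal sketch:

```python
def stagger_offsets(m: int, quantum_us: float = 1000.0) -> list[float]:
    # Processor i takes its tick i * (quantum / m) after processor 0,
    # spreading ticks evenly across the quantum (aligned = all offsets 0).
    return [i * quantum_us / m for i in range(m)]

print(stagger_offsets(32)[:4])  # [0.0, 31.25, 62.5, 93.75] for a 1 ms quantum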

Interrupt Handling
Global interrupt handling:
» Job releases are triggered by interrupts.
» Interrupts may fire on any processor.
» Jobs may execute on any processor.
» Thus, in the worst case, a job may be delayed by each interrupt.
[Figure: the Niagara platform. The transcription ends here.]
