Transcription
Spin Locks and ContentionCompanion slides forThe Art of Multiprocessor Programmingby Maurice Herlihy & Nir Shavit
Focus so far: Correctness andProgress Models– Accurate (we never lied to you)– But idealized (so we forgot to mention a few things) Protocols– Elegant– Important– But naïveArt of Multiprocessor Programming2
New Focus: Performance Models– More complicated (not the same as complex!)– Still focus on principles (not soon obsolete) Protocols– Elegant (in their fashion)– Important (why else would we pay attention)– And realistic (your mileage may vary)Art of Multiprocessor Programming3
Kinds of Architectures SISD (Uniprocessor)– Single instruction stream– Single data stream SIMD (Vector)– Single instruction– Multiple data MIMD (Multiprocessors)– Multiple instruction– Multiple data.Art of Multiprocessor Programming4
Kinds of Architectures SISD (Uniprocessor)– Single instruction stream– Single data stream SIMD (Vector)Our space– Single instruction– Multiple data MIMD (Multiprocessors)– Multiple instruction– Multiple data.(1)Art of Multiprocessor Programming5
MIMD ArchitecturesmemoryShared BusDistributed Memory Contention Communication Contention Communication LatencyArt of Multiprocessor Programming6
Today: Revisit Mutual Exclusion Performance, not just correctness Proper use of multiprocessorarchitectures A collection of locking algorithms Art of Multiprocessor Programming7 (1)
What Should you do if you can’tget a lock? Keep trying– “spin” or “busy-wait”– Good if delays are short Give up the processor– Good if delays are long– Always good on uniprocessorArt of Multiprocessor Programming8 (1)
What Should you do if you can’tget a lock? Keep trying– “spin” or “busy-wait”– Good if delays are short Give up the processor– Good if delays are long– Always good on uniprocessorour focusArt of Multiprocessor Programming9
Basic Spin-Lock.CSspinlockcriticalsectionArt of Multiprocessor ProgrammingResets lockupon exit10
Basic Spin-Lock lock introducessequential bottleneck.CSspinlockcriticalsectionArt of Multiprocessor ProgrammingResets lockupon exit11
Basic Spin-Lock lock suffers from contention.CSspinlockcriticalsectionArt of Multiprocessor ProgrammingResets lockupon exit12
Basic Spin-Lock lock suffers from contention.CSspinlockcriticalsectionResets lockupon exitNotice: these are distinctphenomenaArt of Multiprocessor Programming13
Basic Spin-Lock lock suffers from contention.CSspinlockcriticalsectionResets lockupon exitSeq Bottleneck no parallelismArt of Multiprocessor Programming14
Basic Spin-Lock lock suffers from contention.CSspinlockcriticalsectionResets lockupon exitContention ?Art of Multiprocessor Programming15
Review: Test-and-Set Boolean value Test-and-set (TAS)– Swap true with current value– Return value tells if prior value was true orfalse Can reset just by writing false TAS aka “getAndSet”Art of Multiprocessor Programming16
Review: Test-and-Setpublic class AtomicBoolean {boolean value;public synchronized booleangetAndSet(boolean newValue) {boolean prior value;value newValue;return prior;}}Art of Multiprocessor Programming17 (5)
Review: Test-and-Setpublic class AtomicBoolean {boolean value;}public synchronized booleangetAndSet(boolean newValue) {boolean prior value;value newValue;return prior;}Packagejava.util.concurrent.atomicArt of Multiprocessor Programming18
Review: Test-and-Setpublic class AtomicBoolean {boolean value;public synchronized booleangetAndSet(boolean newValue) {boolean prior value;value newValue;return prior;}}Swap old and newvaluesArt of Multiprocessor Programming19
Review: Test-and-SetAtomicBoolean lock new AtomicBoolean(false) boolean prior lock.getAndSet(true)Art of Multiprocessor Programming20
Review: Test-and-SetAtomicBoolean lock new AtomicBoolean(false) boolean prior lock.getAndSet(true)Swapping in true is called“test-and-set” or TASArt of Multiprocessor Programming21 (5)
Test-and-Set Locks Locking– Lock is free: value is false– Lock is taken: value is true Acquire lock by calling TAS– If result is false, you win– If result is true, you lose Release lock by writing falseArt of Multiprocessor Programming22
Test-and-set Lockclass TASlock {AtomicBoolean state new AtomicBoolean(false);void lock() {while (state.getAndSet(true)) {}}void unlock() {state.set(false);}}Art of Multiprocessor Programming23
Test-and-set Lockclass TASlock {AtomicBoolean state new AtomicBoolean(false);void lock() {while (state.getAndSet(true)) {}}void unlock() {state.set(false);Lock state is}}AtomicBooleanArt of Multiprocessor Programming24
Test-and-set Lockclass TASlock {AtomicBoolean state new AtomicBoolean(false);void lock() {while (state.getAndSet(true)) {}}void unlock() {state.set(false);Keep trying until}}lock acquiredArt of Multiprocessor Programming25
Test-and-set Lockclass TASlock {ReleaselockAtomicBooleanstate by resettingnew AtomicBoolean(false);state to falsevoid lock() {while (state.getAndSet(true)) {}}void unlock() {state.set(false);}}Art of Multiprocessor Programming26
Space Complexity TAS spin-lock has small “footprint”N thread spin-lock uses O(1) spaceAs opposed to O(n) Peterson/BakeryHow did we overcome the W(n) lowerbound? We used a RMW operation Art of Multiprocessor Programming27
Performance Experiment– n threads– Increment shared counter 1 million times How long should it take? How long does it take?Art of Multiprocessor Programming28
timeGraphno speedupbecause ofsequentialbottleneckidealthreadsArt of Multiprocessor Programming29
Mystery #1timeTAS lockIdealthreadsArt of Multiprocessor ProgrammingWhat isgoingon?30
Test-and-Test-and-Set Locks Lurking stage– Wait until lock “looks” free– Spin while read returns true (lock taken) Pouncing state– As soon as lock “looks” available– Read returns false (lock free)– Call TAS to acquire lock– If TAS loses, back to lurkingArt of Multiprocessor Programming31
Test-and-test-and-set Lockclass TTASlock {AtomicBoolean state new AtomicBoolean(false);void lock() {while (true) {while (state.get()) {}if (!state.getAndSet(true))return;}}Art of Multiprocessor Programming32
Test-and-test-and-set Lockclass TTASlock {AtomicBoolean state new AtomicBoolean(false);void lock() {while (true) {while (state.get()) {}if (!state.getAndSet(true))return;}}Wait until lock looks freeArt of Multiprocessor Programming33
Test-and-test-and-set Lockclass TTASlock {AtomicBoolean state new AtomicBoolean(false);Then try toacquire itvoid lock() {while (true) {while (state.get()) {}if (!state.getAndSet(true))return;}}Art of Multiprocessor Programming34
Mystery #2TAS locktimeTTAS lockIdealthreadsArt of Multiprocessor Programming35
Mystery Both– TAS and TTAS– Do the same thing (in our model) Except that– TTAS performs much better than TAS– Neither approaches idealArt of Multiprocessor Programming36
Opinion Our memory abstraction is broken TAS & TTAS methods– Are provably the same (in our model)– Except they aren’t (in field tests) Need a more detailed model Art of Multiprocessor Programming37
Bus-Based ArchitecturescachecachecacheBusmemoryArt of Multiprocessor Programming38
Bus-Based ArchitecturesRandom access memory(10s of cycles)cachecachecacheBusmemoryArt of Multiprocessor Programming39
Bus-Based ArchitecturesShared Bus Broadcast medium One broadcaster at a time Processors and memory all“snoop”cachecachecacheBusmemoryArt of Multiprocessor Programming40
Per-Processor CachesBus-Based Small Architectures Fast: 1 or 2 cycles Address & state informationcachecachecacheBusmemoryArt of Multiprocessor Programming41
Granularity Caches operate at a larger granularitythan a word Cache line: fixed-size block containingthe address (today 64 or 128 bytes)Art of Multiprocessor Programming42
Locality If you use an address now, you willprobably use it again soon– Fetch from cache, not memory If you use an address now, you willprobably use a nearby address soon– In the same cache lineArt of Multiprocessor Programming43
L1 and L2 CachesL2L1Art of Multiprocessor Programming44
L1 and L2 CachesL2L1Art of Multiprocessor ProgrammingSmall & fast1 or 2 cycles45
L1 and L2 CachesLarger and slower10s of cycles 128 byte lineL2L1Art of Multiprocessor Programming46
Jargon Watch Cache hit– “I found what I wanted in my cache”– Good Thing Art of Multiprocessor Programming47
Jargon Watch Cache hit– “I found what I wanted in my cache”– Good Thing Cache miss– “I had to shlep all the way to memory forthat data”– Bad Thing Art of Multiprocessor Programming48
Cave Canem This model is still a simplification– But not in any essential way– Illustrates basic principles Will discuss complexities laterArt of Multiprocessor Programming49
When a Cache BecomesFull Need to make room for new entry By evicting an existing entry Need a replacement policy– Usually some kind of least recently usedheuristicArt of Multiprocessor Programming50
Fully Associative Cache Any line can be anywhere in the cache– Advantage: can replace any line– Disadvantage: hard to find linesArt of Multiprocessor Programming51
Direct Mapped Cache Every address has exactly 1 slot– Advantage: easy to find a line– Disadvantage: must replace fixed lineArt of Multiprocessor Programming52
K-way Set Associative Cache Each slot holds k lines– Advantage: pretty easy to find a line– Advantage: some choice in replacing lineArt of Multiprocessor Programming53
Multicore Set Associativity k is 8 or even 16 and growing – Why? Because cores share sets– Threads cut effective size if accessingdifferent dataArt of Multiprocessor Programming54
Cache Coherence A and B both cache address x A writes to x– Updates cache How does B find out? Many cache coherence protocols inliteratureArt of Multiprocessor Programming55
MESI Modified– Have modified cached data, must writeback to memoryArt of Multiprocessor Programming56
MESI Modified– Have modified cached data, must writeback to memory Exclusive– Not modified, I have only copyArt of Multiprocessor Programming57
MESI Modified– Have modified cached data, must writeback to memory Exclusive– Not modified, I have only copy Shared– Not modified, may be cached elsewhereArt of Multiprocessor Programming58
MESI Modified– Have modified cached data, must write back tomemory Exclusive– Not modified, I have only copy Shared– Not modified, may be cached elsewhere Invalid– Cache contents not meaningfulArt of Multiprocessor Programming59
Processor Issues Load Requestload xcachecachecacheBusmemoryArt of Multiprocessor Programmingdata60
Memory RespondsEcachecachecacheBusBusGot it!memoryArt of Multiprocessor Programmingdata61
Processor Issues Load RequestLoad xEdatacachecacheBusmemoryArt of Multiprocessor Programmingdata62
Other Processor RespondsGot itSEdataScachecacheBusmemoryArt of Multiprocessor ProgrammingBusdata63
Modify Cached DataSdataSdatadatacacheBusmemoryArt of Multiprocessor Programmingdata64
Write-Through CacheWrite x!SdataSdatacacheBusmemoryArt of Multiprocessor Programmingdata65
Write-Through Caches Immediately broadcast changes Good– Memory, caches always agree– More read hits, maybe Bad– Bus traffic on all writes– Most writes to unshared data– For example, loop indexes Art of Multiprocessor Programming66
Write-Through Caches Immediately broadcast changes Good“show stoppers”– Memory, caches always agree– More read hits, maybe Bad– Bus traffic on all writes– Most writes to unshared data– For example, loop indexes Art of Multiprocessor Programming67
Write-Back Caches Accumulate changes in cache Write back when line evicted– Need the cache for something else– Another processor wants itArt of Multiprocessor Programming68
ryArt of Multiprocessor Programmingdata69
InvalidatecachedatacacheBusThis cache acquires write permissionmemoryArt of Multiprocessor Programmingdata70
InvalidateOther caches lose read permissioncachedatacacheBusThis cache acquires write permissionmemoryArt of Multiprocessor Programmingdata71
InvalidateMemory provides data only if not presentin any cache, so no need to change itdatacachecache now (expensive)BusmemoryArt of Multiprocessor Programmingdata72
Mutual Exclusion What do we want to optimize?– Bus bandwidth used by spinning threads– Release/Acquire latency– Acquire latency for idle lockArt of Multiprocessor Programming73
Simple TASLock TAS invalidates cache lines Spinners– Miss in cache– Go to bus Thread wants to release lock– delayed behind spinnersArt of Multiprocessor Programming74
Test-and-test-and-set Wait until lock “looks” free– Spin on local cache– No bus use while lock busy Problem: when lock is released– Invalidation storm Art of Multiprocessor Programming75
Local Spinning while Lock isBusybusybusybusyBusmemoryArt of Multiprocessor Programmingbusy76
On ReleaseinvalidinvalidfreeBusmemoryArt of Multiprocessor Programmingfree77
On ReleaseEveryone rt of Multiprocessor Programmingfree78 (1)
On ReleaseEveryone tries TASTAS( )invalidTAS( )invalidfreeBusmemoryArt of Multiprocessor Programmingfree79 (1)
Problems Everyone misses– Reads satisfied sequentially Everyone does TAS– Invalidates others’ caches Eventually quiesces after lock acquired– How long does this take?Art of Multiprocessor Programming80
Measuring Quiescence Time Acquire lock Pause without using bus Use bus heavilyP1P2PnIf pause quiescence time,critical section duration independent of number of threadsIf pause quiescence time,critical section duration slower with more threadsArt of Multiprocessor Programming81
Quiescence TimetimeIncreseslinearly withthe number ofprocessors forbus architecturethreadsArt of Multiprocessor Programming82
Mystery ExplainedTAS locktimeTTAS lockIdealBetter than TASthreads but still not asgood as idealArt of Multiprocessor Programming83
Solution: Introduce Delay If the lock looks free But I fail to get it There must be contention Better to back off than to collide againtimer2dr1dArt of Multiprocessor Programmingdspin lock84
Dynamic Example:Exponential Backofftime4d2ddspin lockIf I fail to get lock– Wait random duration before retry– Each subsequent failure doublesexpected waitArt of Multiprocessor Programming85
Exponential Backoff Lockpublic class Backoff implements lock {public void lock() {int delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;}}}Art of Multiprocessor Programming86
Exponential Backoff Lockpublic class Backoff implements lock {public void lock() {int delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;Fix minimum delay}}}Art of Multiprocessor Programming87
Exponential Backoff Lockpublic class Backoff implements lock {public void lock() {int delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;Wait until lock looks free}}}Art of Multiprocessor Programming88
Exponential Backoff Lockpublic class Backoff implements lock {public void lock() {int delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;If we win, return}}}Art of Multiprocessor Programming89
Exponential Backoff Lockpublic class Backoff implements lock {Backlock()off for{ random durationpublic voidint delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;}}}Art of Multiprocessor Programming90
Exponential Backoff Lockpublic class Backoff implements lock {Doubledelay,within reasonpublicvoid maxlock(){int delay MIN DELAY;while (true) {while (state.get()) {}if (!lock.getAndSet(true))return;sleep(random() % delay);if (delay MAX DELAY)delay 2 * delay;}}}Art of Multiprocessor Programming91
Spin-Waiting OverheadtimeTTAS LockBackoff lockthreadsArt of Multiprocessor Programming92
Backoff: Other Issues Good– Easy to implement– Beats TTAS lock Bad– Must choose parameters carefully– Not portable across platformsArt of Multiprocessor Programming93
Idea Avoid useless invalidations– By keeping a queue of threads Each thread– Notifies next in line– Without bothering the othersArt of Multiprocessor Programming95
Anderson Queue LockidlenextflagsTFFFFArt of Multiprocessor ProgrammingFFF96
Anderson Queue LockacquiringnextgetAndIncrementflagsTFFFFArt of Multiprocessor ProgrammingFFF97
Anderson Queue LockacquiringnextgetAndIncrementflagsTFFFFArt of Multiprocessor ProgrammingFFF98
Anderson Queue LockacquirednextMine!flagsTFFFFArt of Multiprocessor ProgrammingFFF99
Anderson Queue LockacquiringacquirednextflagsTFFFFArt of Multiprocessor ProgrammingFFF100
Anderson Queue Art of Multiprocessor ProgrammingFFF101
Anderson Queue Art of Multiprocessor ProgrammingFFF102
Anderson Queue LockacquiringacquirednextflagsTFFFFArt of Multiprocessor ProgrammingFFF103
Anderson Queue LockacquiredreleasednextflagsTTFFFArt of Multiprocessor ProgrammingFFF104
Anderson Queue LockacquiredreleasednextYow!flagsTTFFFArt of Multiprocessor ProgrammingFFF105
Anderson Queue Lockclass ALock implements Lock {boolean[] flags {true,false, ,false};AtomicInteger next new AtomicInteger(0);ThreadLocal Integer mySlot;Art of Multiprocessor Programming106
Anderson Queue Lockclass ALock implements Lock {boolean[] flags {true,false, ,false};AtomicInteger next new AtomicInteger(0);ThreadLocal Integer mySlot;One flag per threadArt of Multiprocessor Programming107
Anderson Queue Lockclass ALock implements Lock {boolean[] flags {true,false, ,false};AtomicInteger next new AtomicInteger(0);ThreadLocal Integer mySlot;Next flag to useArt of Multiprocessor Programming108
Anderson Queue Lockclass ALock implements Lock {boolean[] flags {true,false, ,false};AtomicInteger next new AtomicInteger(0);ThreadLocal Integer mySlot;Thread-local variableArt of Multiprocessor Programming109
Anderson Queue Lockpublic lock() {mySlot next.getAndIncrement();while (!flags[mySlot % n]) {};flags[mySlot % n] false;}public unlock() {flags[(mySlot 1) % n] true;}Art of Multiprocessor Programming110
Anderson Queue Lockpublic lock() {mySlot next.getAndIncrement();while (!flags[mySlot % n]) {};flags[mySlot % n] false;}public unlock() {flags[(mySlot 1) % n] true;Take next}Art of Multiprocessor Programmingslot111
Anderson Queue Lockpublic lock() {mySlot next.getAndIncrement();while (!flags[mySlot % n]) {};flags[mySlot % n] false;}public unlock() {flags[(mySlot 1) % n] true;Spin until told}Art of Multiprocessor Programmingto go112
Anderson Queue Lockpublic lock() {myslot next.getAndIncrement();while (!flags[myslot % n]) {};flags[myslot % n] false;}public unlock() {flags[(myslot 1) % n] true;}Prepare slot forArt of Multiprocessor Programmingre-use113
Anderson Queue Lockpublic lock() {Tell next thread tomySlot next.getAndIncrement();while (!flags[mySlot % n]) {};flags[mySlot % n] false;}gopublic unlock() {flags[(mySlot 1) % n] true;}Art of Multiprocessor Programming114
Local FFUnfortunately many bits share cache lineArt of Multiprocessor Programming115
False SharingacquiredreleasednextSpinonmybitSpinning threadResult:contentionflagsTFFFFgets cacheinvalidation onFF of storeFaccountby threads it isnot waiting forLine 1Art of Multiprocessor Programming Line 2116
The Solution: Line 1Art of Multiprocessor Programming Line 2//117
PerformanceTTAS Shorter handover thanbackoffqueue Curve is practically flat Scalable performanceArt of Multiprocessor Programming118
Anderson Queue LockGood– First truly scalable lock– Simple, easy to implement– Back to FCFS order (like Bakery)Art of Multiprocessor Programming119
Anderson Queue LockBad– Space hog – One bit per thread one cache lineper thread What if unknown number of threads? What if small number of actualcontenders?Art of Multiprocessor Programming120
CLH Lock FCFS order Small, constant-size overhead perthreadArt of Multiprocessor Programming121
InitiallyidletailfalseArt of Multiprocessor Programming122
InitiallyidletailfalseArt of Multiprocessor ProgrammingQueue tail123
InitiallyidleLock is freetailfalseArt of Multiprocessor Programming124
InitiallyidletailfalseArt of Multiprocessor Programming125
Purple Wants the LockacquiringtailfalseArt of Multiprocessor Programming126
Purple Wants the LockacquiringtailfalsetrueArt of Multiprocessor Programming127
Purple Wants the LockacquiringSwaptailfalsetrueArt of Multiprocessor Programming128
Purple Has the LockacquiredtailfalsetrueArt of Multiprocessor Programming129
Red Wants the LockacquiredacquiringtailfalsetrueArt of Multiprocessor Programmingtrue130
Red Wants the LockacquiredacquiringSwaptailfalsetrueArt of Multiprocessor Programmingtrue131
Red Wants the LockacquiredacquiringtailfalsetrueArt of Multiprocessor Programmingtrue132
Red Wants the LockacquiredacquiringtailfalsetrueArt of Multiprocessor Programmingtrue133
Red Wants the LockacquiredacquiringImplicitLinked listtailfalsetrueArt of Multiprocessor Programmingtrue134
Red Wants the LockacquiredacquiringtailfalsetrueArt of Multiprocessor Programmingtrue135
Red Wants the LockacquiredacquiringActually, itspins oncached copytruetailfalsetrueArt of Multiprocessor Programmingtrue136
Purple rt of Multiprocessor Programmingtrue137
Purple ReleasesreleasedacquiredtailtrueArt of Multiprocessor Programming138
Space Usage Let– L number of locks– N number of threads ALock– O(LN) CLH lock– O(L N)Art of Multiprocessor Programming139
CLH Queue Lockclass QNode {AtomicBoolean locked new AtomicBoolean(true);}Art of Multiprocessor Programming140
CLH Queue Lockclass QNode {AtomicBoolean locked new AtomicBoolean(true);}Not released yetArt of Multiprocessor Programming141
CLH Queue Lockclass CLHLock implements Lock {AtomicReference QNode tail;ThreadLocal QNode myNode new QNode();public void lock() {QNode pred tail.getAndSet(myNode);while (pred.locked) {}}}Art of Multiprocessor Programming142
CLH Queue Lockclass CLHLock implements Lock {AtomicReference QNode tail;ThreadLocal QNode myNode new QNode();public void lock() {QNode pred tail.getAndSet(myNode);while (pred.locked) {}}}Queue tailArt of Multiprocessor Programming143
CLH Queue Lockclass CLHLock implements Lock {AtomicReference QNode tail;ThreadLocal QNode myNode new QNode();public void lock() {QNode pred tail.getAndSet(myNode);while (pred.locked) {}}}Thread-local QNodeArt of Multiprocessor Programming144
CLH Queue Lockclass CLHLock implements Lock {AtomicReference QNode tail;ThreadLocal QNode myNode new QNode();Swap in my nodepublic void lock() {QNode pred tail.getAndSet(myNode);while (pred.locked) {}}}Art of Multiprocessor Programming145
CLH Queue Lockclass CLHLock implements Lock {AtomicReference QNode tail;ThreadLocal QNode myNode new QNode();Spin until predecessorreleases lockpublic void lock() {QNode pred tail.getAndSet(myNode);while (pred.locked) {}}}Art of Multiprocessor Programming146
CLH Queue LockClass CLHLock implements Lock { public void unlock() {myNode.locked.set(false);myNode pred;}}Art of Multiprocessor Programming147
CLH Queue LockClass CLHLock implements Lock { public void unlock() {myNode.locked.set(false);myNode pred;}}Notify successorArt of Multiprocessor Programming148
CLH Queue LockClass CLHLock implements Lock { public void unlock() {myNode.locked.set(false);myNode pred;}}Recyclepredecessor’s nodeArt of Multiprocessor Programming149
CLH Queue LockClass CLHLock implements Lock { public void unlock() {myNode.locked.set(false);myNode pred;}}(we don’t actually reuse myNode.Code in book shows how it’s done.)Art of Multiprocessor Programming150
CLH Lock Good– Lock release affects predecessor only– Small, constant-sized space Bad– Doesn’t work for uncached NUMAarchitecturesArt of Multiprocessor Programming151
NUMA and cc-NUMAArchitectures Acronym:– Non-Uniform Memory Architecture– ccNUMA cache coherent NUMA Illusion:– Flat shared memory Truth:– No caches (sometimes)– Some memory regions faster than othersArt of Multiprocessor Programming152
NUMA MachinesSpinning on localmemory is fastArt of Multiprocessor Programming153
NUMA MachinesSpinning on remotememory is slowArt of Multiprocessor Programming154
CLH Lock Each thread spins on predecessor’smemory Could be far away Art of Multiprocessor Programming155
MCS Lock FCFS order Spin on local memory only Small, Constant-size overheadArt of Multiprocessor Programming156
InitiallyidletailfalseArt of Multiprocessor Programming157
Acquiringacquiring(allocate QNode)truetailfalseArt of Multiprocessor Programming158
AcquiringacquiredswaptruetailfalseArt of Multiprocessor Programming159
AcquiringacquiredtruetailfalseArt of Multiprocessor Programming160
AcquiredacquiredtruetailfalseArt of Multiprocessor Programming161
AcquiringacquiredacquiringfalsetailswapArt of Multiprocessor Programmingtrue162
AcquiringacquiredacquiringfalsetailtrueArt of Multiprocessor Programming163
AcquiringacquiredacquiringfalsetailtrueArt of Multiprocessor Programming164
AcquiringacquiredacquiringfalsetailtrueArt of Multiprocessor Programming165
AcquiringacquiredacquiringtruetailtruefalseArt of Multiprocessor Programming166
AcquiringacquiredacquiringYes!truetailfalsetrueArt of Multiprocessor Programming167
MCS Queue Lockclass QNode {volatile boolean locked false;volatile qnodenext null;}Art of Multiprocessor Programming168
MCS Queue Lockclass MCSLock implements Lock {AtomicReference tail;public void lock() {QNode qnode new QNode();QNode pred tail.getAndSet(qnode);if (pred ! null) {qnode.locked true;pred.next qnode;while (qnode.locked) {}}}}Art of Multiprocessor Programming169
MCS Queue Lockclass MCSLock implements Lock {Make aAtomicReference tail;QNodepublic void lock() {QNode qnode new QNode();QNode pred tail.getAndSet(qnode);if (pred ! null) {qnode.locked true;pred.next qnode;while (qnode.locked) {}}}}Art of Multiprocessor Programming170
MCS Queue Lockclass MCSLock implements Lock {AtomicReference tail;public void lock() {QNode qnode new QNode();QNode pred tail.getAndSet(qnode);if (pred ! null) {qnode.locked true;add my Node topred.next qnode;the tail ofwhile (qnode.locked) {}queue}}}Art of Multiprocessor Programming171
MCS Queue Lockclass MCSLock implements Lock {Fix if queue wasAtomicReference tail;non-emptypublic void lock() {QNode qnode new QNode();QNode pred tail.getAndSet(qnode);if (pred ! null) {qnode.locked true;pred.next qnode;while (qnode.locked) {}}}}Art of Multiprocessor Programming172
MCS Queue Lockclass MCSLock implements Lock {AtomicReference tail;Wait untilpublic void lock() {unlockedQNode qnode new QNode();QNode pred tail.getAndSet(qnode);if (pred ! null) {qnode.locked true;pred.next qnode;while (qnode.locked) {}}}}Art of Multiprocessor Programming173
Purple ReleasereleasingswapfalsefalseArt of Multiprocessor Programming174
I don’t Releasesee a successor. But byPurplelooking at the queue, I seeanother thread is activereleasingswapfalsefalseArt of Multiprocessor Programming175
I don’t Releasesee a successor. But byPurplelooking at the queue, I seeanother thread is activereleasingswapfalsefalseI have to release thatthread so must wait for itArt of Multiprocessor Programming to identify its node176
Purple Releasereleasingprepare to spintruefalseArt of Multiprocessor Programming177
Purple ReleasereleasingspinningtruefalseArt of Multiprocessor Programming178
Purple ReleasereleasingspinningfalsetruefalseArt of Multiprocessor Programming179
Purple ReleasereleasingAcquired lockfalsetruefalseArt of Multiprocessor Programming180
MCS Queue Unlockclass MCSLock implements Lock {AtomicReference tail;public void unlock() {if (qnode.next null) {if (tail.CAS(qnode, null)return;while (qnode.next null) {}}qnode.next.locked false;}}Art of Multiprocessor Programming181
MCS Queue Lockclass MCSLock implements Lock {AtomicReference tail;public void unlock() {if (qnode.next null) {if (tail.CAS(qnode, null)return;while (qnode.next null) {}}qnode.next.locked false; Missingsuccessor}}Art of Multiprocessor Programming?182
MCS Queue Lockclass MCSLock implements Lock {If really no successor,AtomicReference tail;returnpublic void unlock() {if (qnode.next null) {if (tail.CAS(qnode, null)return;while (qnode.next null) {}}qnode.next.locked false;}}Art of Multiprocessor Programming183
MCS Queue Lockclass MCSLock implements Lock {Otherwise wait forAtomicReference tail;successortocatchuppublic void unlock() {if (qnode.next null) {if (tail.CAS(qnode, null)return;while (qnode.next null) {}}qnode.next.locked false;}}Art of Multiprocessor Programming184
MCS Queue Lockclass MCSLock implements Lock {AtomicReference queue;lock{to successorpublic voidPassunlock()if (qn
(we never lied to you) -But idealized (so we forgot to mention a few things) Protocols -Elegant -Important -But naïve. Art of Multiprocessor Programming 3 New Focus: Performance . probably use it again soon -Fetch from cache, not memory If you use an address now, you will