Learning Gem5 Part III

Transcription

Learning gem5 – Part III
Modeling Cache Coherence with Ruby and SLICC
Jason .engineering.ucdavis.edu/lowepower/
Jason Lowe-Power jason@lowepower.com

gem5 history
M5 + GEMS
M5: “Classic” caches, CPU model, master/slave port interface
GEMS: Ruby, network

Outline
Ruby overview
SLICC controller details
Configuring Ruby
A few other small things

[Figure: CPUs, DMA, and other devices connect to Ruby through “classic” ports; Ruby connects to the DRAM controllers through “classic” ports.]

Ruby
[Figure: the controllers inside Ruby: L1 cache controllers, an L2 cache controller, directory controllers, and a generic “YYY” controller.]

Ruby components
Controller models (e.g., caches)
Controller topology (how the caches are connected)
Network model (e.g., on-chip routers)
Interface (“classic” ports in/out)
Main goal: Flexibility, not usability

Controller Models
Implemented in SLICC
Code for controllers is “generated” via the SLICC compiler
SLICC: Specification Language for Implementing Cache Coherence

SLICC original purpose
From: A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood

SLICC original purpose*
*Actual output

Examples
This is a very quick overview
See http://learning.gem5.org/book/part3 for more details
Based on the coherence protocols in the Synthesis Lecture
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood

MSI-cache.sm
machine(MachineType:L1Cache, "MSI cache")
    : Sequencer *sequencer;     // Incoming requests from the CPU come from this
      CacheMemory *cacheMemory; // This stores the data and cache states
      bool send_evictions;      // Needed to support O3 CPU and mwait
      . . .
{
    . . .
}

MSI-cache.sm is compiled into:
L1Cache_Controller.py
    SimObject “declaration file”
    Inherits from AbstractController
    bool send_evictions -> send_evictions = Param.Bool("")
L1Cache_Controller.cc/hh
    Implementation of the SimObject
    Just a SimObject
L1Cache_Entry.cc/hh
L1Cache_State.cc/hh
L1Cache_Transitions.cc/hh
L1Cache_Wakeup.cc/hh
Others
Important! Never modify these files!
Switch!

Cache state machine outline
Parameters:
    Cache memory: Where the data is stored
    Message buffers: Sending/receiving messages from the network
State declarations: The stable and transient states
Event declarations: State machine events that will be “triggered”
Other structures and functions: Entries, TBEs, get/setState, etc.
In ports: Trigger events based on incoming messages
Actions: Execute single operations on cache structures
Transitions: Move from state to state and execute actions

Cache memory
See src/mem/ruby/structures/CacheMemory
Stores the cache data (Entry) and the state (State)
cacheProbe() returns the replacement address if the cache is full
Important! Must call setMRU on each access!
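Why setMRU matters can be seen in a toy Python model of LRU replacement. This is only a sketch under stated assumptions: ToyCacheSet and its methods are hypothetical names that mimic the roles of cacheProbe() and setMRU, and none of it is gem5's CacheMemory code.

```python
class ToyCacheSet:
    """Toy fully associative cache with LRU replacement (not gem5 code)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.last_use = {}  # address -> logical time of the last set_mru()
        self.time = 0

    def is_full(self):
        return len(self.last_use) >= self.num_blocks

    def allocate(self, addr):
        assert not self.is_full(), "evict the victim from cache_probe() first"
        self.last_use[addr] = self.time

    def set_mru(self, addr):
        """Mark addr as most recently used (role of setMRU)."""
        self.time += 1
        self.last_use[addr] = self.time

    def cache_probe(self):
        """Return the least recently used address (role of cacheProbe())."""
        return min(self.last_use, key=self.last_use.get)


def victim_after_hit(touch_on_hit):
    """Fill a 2-block cache, hit on 0x40, then ask for the victim."""
    cache = ToyCacheSet(num_blocks=2)
    for addr in (0x40, 0x80):
        cache.allocate(addr)
        cache.set_mru(addr)
    if touch_on_hit:
        cache.set_mru(0x40)  # the access updates the replacement state
    return cache.cache_probe()


victim_with_mru = victim_after_hit(True)      # 0x80 is evicted
victim_without_mru = victim_after_hit(False)  # 0x40 is evicted despite the hit
```

If the hit does not update the replacement state, the block that was just used becomes the victim, which is the mistake the slide warns about.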

Message buffers
Declaring is confusing!
MessageBuffer * requestToDir, network="To", virtual_network="0", vnet_type="request";
MessageBuffer * forwardFromDir, network="From", virtual_network="1", vnet_type="forward";
peek(): Get the head message
pop(): Remove head message (don’t forget this!)
isReady(): Is there a message?
recycle(): Move the head to the tail (better perf., but unrealistic)
stallAndWait(): Move (stalled) message to different buffer
Switch!
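The queue semantics of these operations can be sketched with a toy Python class. ToyMessageBuffer is a hypothetical stand-in, not gem5's MessageBuffer: it ignores timing (the real isReady() takes a clock edge) and keeps stalled messages in a plain list.

```python
from collections import deque


class ToyMessageBuffer:
    """Toy FIFO that mimics the message buffer operations (not gem5 code)."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, msg):
        self._queue.append(msg)

    def is_ready(self):
        """Is there a message at the head?"""
        return len(self._queue) > 0

    def peek(self):
        """Return the head message without removing it."""
        return self._queue[0]

    def pop(self):
        """Remove and return the head message."""
        return self._queue.popleft()

    def recycle(self):
        """Move the head message to the tail."""
        self._queue.rotate(-1)

    def stall_and_wait(self, stalled):
        """Move the head message into a separate list of stalled messages."""
        stalled.append(self._queue.popleft())


buf = ToyMessageBuffer()
for msg in ("GetS", "GetM", "PutM"):
    buf.enqueue(msg)

head_before = buf.peek()   # "GetS"; peek() leaves the message in place
buf.recycle()              # order is now GetM, PutM, GetS
stalled = []
buf.stall_and_wait(stalled)  # GetM moves to the stalled list
head_after = buf.pop()       # "PutM"
```

Forgetting pop() in this model would leave the same head message in place forever, which matches the slide's warning.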

State declarations
state_declaration(State, desc="Cache states") {
    I,      AccessPermission:Invalid, desc="Not present/Invalid";
    // States moving out of I
    IS_D,   AccessPermission:Invalid, desc="Invalid, moving to S, waiting for data";
    IM_AD,  AccessPermission:Invalid, desc="Invalid, moving to M, waiting for acks and data";
    IM_A,   AccessPermission:Busy, desc="Invalid, moving to M, waiting for acks";
    S,      AccessPermission:Read_Only, desc="Shared. Read-only, other caches may have the block";
    . . .
}
AccessPermission: Used for functional accesses
IS_D - Read: “Invalid transitioning to Shared waiting for Data”
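The link between states and access permissions can be sketched as a lookup table in Python. The states and permissions are the ones listed on this slide; the can_functional_read rule is a simplifying assumption for illustration, not a description of gem5's functional access code.

```python
from enum import Enum


class AccessPermission(Enum):
    Invalid = "Invalid"
    Busy = "Busy"
    Read_Only = "Read_Only"


# State -> permission, copied from the declarations on the slide.
STATE_PERMISSION = {
    "I": AccessPermission.Invalid,
    "IS_D": AccessPermission.Invalid,
    "IM_AD": AccessPermission.Invalid,
    "IM_A": AccessPermission.Busy,
    "S": AccessPermission.Read_Only,
}


def can_functional_read(state):
    """Toy rule (assumption): only a block with read permission holds usable data."""
    return STATE_PERMISSION[state] is AccessPermission.Read_Only


readable_states = [s for s in STATE_PERMISSION if can_functional_read(s)]
```

In this toy table only S is readable; the transient states moving out of I do not hold valid data yet.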

Event declarations
enumeration(Event, desc="Cache events") {
    // From the processor/sequencer/mandatory queue
    Load,        desc="Load from processor";
    Store,       desc="Store from processor";
    // Internal event (only triggered from processor requests)
    Replacement, desc="Triggered when block is chosen as victim";
    // Forwarded request from other cache via dir on the forward network
    FwdGetS,     desc="Directory sent us a request to satisfy GetS. We must have the block in M to respond to this.";
    FwdGetM,     desc="Directory sent us a request to satisfy GetM. ";
    . . .

Other structures and functions
Entry: Declare the data structure for each entry
    Block data, block state, sometimes others (e.g., tokens)
TBE/TBETable: Transient Buffer Entry
    Like an MSHR, but not exactly (allocated more often)
    Holds data for blocks in transient states
get/set State, AccessPermissions, functional read/write
    Required to implement AbstractController
    Usually just copy-paste from examples

Ports/Message buffers
Not gem5 ports!
out_port: “Rename” the message buffer and declare message type
in_port: Much of the SLICC “magic” here.
    Called every cycle
    Look at head message
    Trigger events
Switch!

In ports
Weird syntax!
in_port(forward_in, RequestMsg, forwardToCache) {
    if (forward_in.isReady(clockEdge())) {
        // peek() automatically populates “in_msg” in the following block
        peek(forward_in, RequestMsg) {
            Entry cache_entry := getCacheEntry(in_msg.addr);
            TBE tbe := TBEs[in_msg.addr];
            if (in_msg.Type == CoherenceRequestType:GetS) {
                trigger(Event:FwdGetS, in_msg.addr, cache_entry, tbe);
            } else
            . . .
trigger() looks for a transition. It also ensures resources are available.
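The control flow of an in port can be mimicked in a few lines of Python. This is a toy sketch, not generated SLICC code: forward_in_port, EVENT_FOR_TYPE, and the dictionary-based messages are hypothetical, and the trigger callback here pops the message itself, standing in for a pop action in the transition.

```python
from collections import deque

# Hypothetical mapping from message type to the event it triggers.
EVENT_FOR_TYPE = {"GetS": "FwdGetS", "GetM": "FwdGetM"}


def forward_in_port(buffer, cache, tbes, trigger):
    """One wakeup of a toy in port: look at the head message, trigger an event."""
    if not buffer:                      # isReady() is false: nothing to do
        return False
    in_msg = buffer[0]                  # peek(): the head stays in the buffer
    cache_entry = cache.get(in_msg["addr"])
    tbe = tbes.get(in_msg["addr"])
    trigger(EVENT_FOR_TYPE[in_msg["type"]], in_msg["addr"], cache_entry, tbe)
    return True


forward_buffer = deque([
    {"type": "GetS", "addr": 0x40},
    {"type": "GetM", "addr": 0x80},
])
cache = {0x40: {"state": "M"}}  # only 0x40 is present in the cache
tbes = {}
triggered = []


def trigger(event, addr, cache_entry, tbe):
    """Record the event, then pop the message (stand-in for a pop action)."""
    state = None if cache_entry is None else cache_entry["state"]
    triggered.append((event, addr, state))
    forward_buffer.popleft()


# Call the in port repeatedly, as if it were woken up every cycle.
while forward_in_port(forward_buffer, cache, tbes, trigger):
    pass
```

The address, cache entry, and TBE passed to trigger are the same three values that later appear as implicit variables inside actions.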

Actions
action(sendGetM, "gM", desc="Send GetM to the directory") {
    // enqueue() is like “peek”, but populates out_msg
    enqueue(request_out, RequestMsg, 1) {
        out_msg.addr := address;
        out_msg.Type := CoherenceRequestType:GetM;
        out_msg. . . . MachineType:Directory));
        out_msg.MessageSize := MessageSizeType:Control;
        out_msg.Requestor := machineID;
    }
}
Some variables are implicit in actions. These are passed in via trigger() in the in_port:
    address, cache_entry, tbe
Switch!

Transitions
transition(I, Store, IM_AD) {
    . . .
    popMandatoryQueue;
}
    Arguments: begin state (I), on event (Store), end state (IM_AD)
transition({IM_AD, SM_AD}, {DataDirNoAcks, DataOwner}, M) {
    . . .
    popResponseQueue;
}
    Either state ({IM_AD, SM_AD}), either event ({DataDirNoAcks, DataOwner})
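How a transition with sets of states and events expands can be sketched as a lookup table in Python. ToyStateMachine is a hypothetical illustration, not what the SLICC compiler generates, and each action list holds only a single illustrative pop action.

```python
from itertools import product


def as_set(value):
    """Treat a single name as a one-element set."""
    return set(value) if isinstance(value, (set, frozenset, list, tuple)) else {value}


class ToyStateMachine:
    """Toy transition table keyed by (state, event) (not gem5 code)."""

    def __init__(self):
        self.table = {}

    def transition(self, begin_states, events, end_state, actions):
        # A set of states and a set of events expands to every combination.
        for state, event in product(as_set(begin_states), as_set(events)):
            self.table[(state, event)] = (end_state, list(actions))

    def trigger(self, state, event):
        """Look up the transition; an undeclared pair is an error."""
        if (state, event) not in self.table:
            raise KeyError(f"invalid transition: state={state} event={event}")
        return self.table[(state, event)]


machine = ToyStateMachine()
machine.transition("I", "Store", "IM_AD", ["popMandatoryQueue"])
machine.transition({"IM_AD", "SM_AD"}, {"DataDirNoAcks", "DataOwner"}, "M",
                   ["popResponseQueue"])

store_result = machine.trigger("I", "Store")
data_result = machine.trigger("SM_AD", "DataOwner")
```

The second declaration covers four (state, event) pairs, so the table ends up with five entries in total.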

Complete x.html

More details at http://learning.gem5.org/book/part3

Ruby config scripts
Don’t follow gem5 style closely :(
Require lots of boilerplate

Ruby config scripts
1. Instantiate the controllers
    Here is where you pass all of the options from the *.sm file
2. Create a Sequencer for each CPU
    More details in a moment
3. Create and connect all of the network routers

Creating the topology
Usually hidden in “create_topology” (see configs/topologies)
Problem: These make assumptions about controllers
    Inappropriate for non-default protocols
Point-to-point example

# One router per controller
self.routers = [Switch(router_id = i) for i in range(len(controllers))]

# An “external” link between the controller and the network
self.ext_links = [SimpleExtLink(link_id = i, ext_node = c,
                                int_node = self.routers[i])
                  for i, c in enumerate(controllers)]

# An “internal” link between each of the routers to every other router
link_count = 0
self.int_links = []
for ri in self.routers:
    for rj in self.routers:
        if ri == rj: continue # Don't connect a router to itself!
        link_count += 1
        self.int_links.append(SimpleIntLink(link_id = link_count,
                                            src_node = ri,
                                            dst_node = rj))
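The same point-to-point construction can be run outside gem5 to check how many links it produces. In this sketch Switch, SimpleExtLink, and SimpleIntLink are plain dataclasses standing in for the gem5 SimObjects of the same name (an assumption made so the example is self-contained), and build_point_to_point is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Switch:                 # stand-in for the gem5 router object
    router_id: int


@dataclass(frozen=True)
class SimpleExtLink:          # stand-in: controller <-> router link
    link_id: int
    ext_node: object
    int_node: Switch


@dataclass(frozen=True)
class SimpleIntLink:          # stand-in: one-directional router -> router link
    link_id: int
    src_node: Switch
    dst_node: Switch


def build_point_to_point(controllers):
    """One router per controller, fully connected in both directions."""
    routers = [Switch(router_id=i) for i in range(len(controllers))]
    ext_links = [SimpleExtLink(link_id=i, ext_node=c, int_node=routers[i])
                 for i, c in enumerate(controllers)]
    int_links = []
    link_count = 0
    for ri in routers:
        for rj in routers:
            if ri == rj:
                continue  # don't connect a router to itself
            link_count += 1
            int_links.append(SimpleIntLink(link_id=link_count,
                                           src_node=ri, dst_node=rj))
    return routers, ext_links, int_links


# Controllers are just labels here; in gem5 they would be controller objects.
routers, ext_links, int_links = build_point_to_point(
    ["l1_0", "l1_1", "l1_2", "dir"])
```

With n controllers the loop creates n external links and n * (n - 1) internal links, so the internal link count grows quadratically with the number of controllers.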

Ports - Ruby interface
[Figure: CPUs, DMA, and other devices connect to Ruby through “classic” ports; Ruby connects to the DRAM controllers through “classic” ports.]

Ruby - Memory
[Figure: Ruby connected to the DRAM controllers.]
Any controller can connect its “memory” port. Usually, only “directory” controllers do.
You can send messages on this port in SLICC with queueMemoryRead/Write
Responses come on a special message buffer (responseFromMemory)

CPU - Ruby: Sequencers
Confusing: Two names, same thing: RubyPort and Sequencer
Sequencer is a MemObject (classic ports)
Converts gem5 packets to RubyRequests
New messages delivered to the “MandatoryQueue”
[Figure: CPU, Sequencer, Ruby.]
Switch!

Where is . . . ?
Configuration of network models
Default cache topologies
Protocol config and Ruby config: configs/ruby
Ruby config: configs/ruby/Ruby.py
    Entry point for Ruby configs and helper functions
    Selects the right protocol config “automatically”

Where is . . . ?
SLICC
src/mem/slicc
    Code for the compiler
src/mem/ruby/slicc_interface
    Structures used only in generated code
    AbstractController
Don’t be afraid to dig into the compiler! It’s often necessary.

Where is . . . ?
src/mem/ruby/structures
    Structures used in Ruby (e.g., cache memory, replacement policy)
src/mem/ruby/system
    Ruby wrapper code and entry point
    RubyPort/Sequencer
    RubySystem: Centralized information, checkpointing, etc.

Where is . . . ?
src/mem/ruby/common
    General data structures, etc.
src/mem/ruby/filters
    Bloom filters, etc.
src/mem/ruby/network
    Network model
src/mem/ruby/profiler
    Profiling for coherence protocols

Current protocols (src/mem/protocol)
GPU_RfO (Read for ownership GPU-CPU protocol)
GPU_VIPER (“Realistic” GPU-CPU protocol)
GPU_VIPER_Region (HSA paper)
Garnet_standalone (No coherence, just traffic injection)
MESI_Three_Level (like two level, but with L0 cache)
MESI_Two_Level (private L1s, shared L2)
MI_example (Example: Do not use for performance)
MOESI_AMD (?)
MOESI_CMP_directory
MOESI_CMP_token
MOESI_hammer (Like the AMD Hammer protocol for Opteron/HyperTransport)

Things not covered
Writing a coherence protocol
    Virtual networks
    Stalling requests
    Extra transient states
Debugging a coherence protocol
    RubyRandomTester
    ProtocolTrace
    Other Ruby debug flags also useful

Questions?
We covered
Ruby’s design
SLICC state machine files
    parameters, message buffers, ports, events, states, actions, transitions
How to configure Ruby
Standard protocols and topologies

More org/SLICC
http://gem5.org/Ruby
