SingleCycle - Computer Science

Transcription

2/9/17AGENDAIntro to Microarchitecture:Single-Cycle Review from last lecture ISA tradeoffsCS 3330 Single-cycle MicroarchitectureSamira KhanUniversity of VirginiaFeb 9, 20172Review: ISAReview: ISA vs. Microarchitecture ISA (Instruction Set Architecture) Agreed upon interface between software andhardware SW/compiler assumes, HW promises What the software writer needs to know to write anddebug system/user programs Microarchitecture Specific implementation of an ISA Not visible to the software Microprocessor ISA, uarch, circuits “Architecture” ISA microarchitecture InstructionsProblem Opcodes, Addressing M odes, Data Types Instruction Types and Formats Registers, Condition CodesAlgorithm MemoryProgram Address space, Addressability, Alignment Virtual m em ory m anagem entISAMicroarchitectureCircuitsTransistors3 Call, Interrupt/Exception Handling Access Control, Priority/Privilege I/O: memory-mapped vs. instr. Task/thread Management Power and Thermal Management Multi-threading support, Multiprocessor support41

2/9/17MicroarchitectureProperty of ISA vs. Uarch? Implementation of the ISA under specific design constraints and goals Anything done in hardware without exposure to software ADD instruction’s opcode Number of general purpose registers Number of ports to the register file Number of cycles to execute the MUL instruction Whether or not the machine employs pipelined instruction executionPipelining (will see later)Clock gatingCaching? Levels, size, associativity, replacement policyPrefetching?Voltage/frequency scaling?Error correction? Remember Microarchitecture: Implementation of the ISA under specific design constraintsand goals56Design PointDesign Point A set of design considerations and their importance A set of design considerations and their importance Considerations Considerations leads to tradeoffs in both ISA and uarch leads to tradeoffs in both ISA and uarchCostPerformanceMaximum power consumptionEnergy consumption (battery life)AvailabilityReliability and CorrectnessTime to Market Design point determined by the “Problem” space (application space),the intended users/marketCostPerformanceMaximum power consumptionEnergy consumption (battery life)AvailabilityReliability and CorrectnessTime to Market Design point determined by the “Problem” space (application space),the intended users/marketLook Forward & Up782

2/9/17ROLE OF THE (COMPUTER) ARCHITECTROLE OF THE (COMPUTER) ARCHITECT Look backward (to the past) Understand tradeoffs and designs, upsides/downsides, pastworkloads. Analyze and evaluate the past Look forward (to the future) Be the dreamer and create new designs. Listen to dreamers Push the state of the art. Evaluate new design choices Look up (towards problems in the computing stack) Understand important problems and their nature Develop architectures and ideas to solve important problems Look down (towards device/circuit technology)from Yale Patt’s lecture notes9Application Space Understand the capabilities of the underlying technology Predict and adapt to the future of technology (you are designing forN years ahead). Enable the future technology10Tradeoffs: Soul of Computer Architecture ISA-level tradeoffs Dream, and they will appear Microarchitecture-level tradeoffs System and Task-level tradeoffs How to divide the labor between hardware and software Computer architecture is the science and art of makingthe appropriate trade-offs to meet a design point Why art?11123

2/9/17Many Different ISAs Over Decades ISA Principles and Tradeoffsx86PDP-x: Programmed Data Processor (PDP-11)VAXIBM 360CDC 6600SIMD ISAs: CRAY-1, Connection MachineVLIW ISAs: Multiflow, Cydrome, IA-64 (EPIC)PowerPC, POWERRISC ISAs: Alpha, MIPS, SPARC, ARM What are the fundamental differences? E.g., how instructions are specified and what they do E.g., how complex are the pe15164

2/9/17What Are the Elements of An ISA?Data Type Tradeoffs Instructions What is the benefit of having more or high-level data types in the ISA? What is the disadvantage? Opcode Operand specifiers (addressing modes) How to obtain the operand?Why are there different addressing modes? Think compiler/programmer vs. microarchitect Data types Definition: Representation of information for which there areinstructions that operate on the representation Integer, floating point, character, binary, decimal, BCD Doubly linked list, queue, string, bit vector, stack Concept of semantic gap Data types coupled tightly to the semantic level, or complexity of instructions VAX: INSQUEUE and REMQUEUE instructions on a doubly linked list orqueue; FINDFIRST Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977. X86: SCAN opcode operates on character strings; PUSH/POP Example: Early RISC architectures vs. Intel 432 Early RISC: Only integer data type Intel 432: Object data type, capability based machine17Complex vs. Simple Instructions18Complex vs. Simple Instructions Complex instruction: An instruction does a lot of work,e.g. many operations Advantages of Complex instructions Denser encoding à smaller code size à better memoryutilization, saves off-chip bandwidth, better cache hit rate(better packing of instructions) Simpler compiler: no need to optimize small instructions asmuch Insert in a doubly linked list Compute FFT String copy Simple instruction: An instruction does small amount ofwork, it is a primitive using which complex operationscan be built Disadvantages of Complex Instructions- Larger chunks of work à compiler has less opportunity tooptimize (limited in fine-grained optimizations it can do)- More complex hardware à translation from a high level tocontrol signals and optimization needs to be done byhardware Add XOR Multiply19205

2/9/17ISA-level Tradeoffs: Semantic GapISA-level Tradeoffs: Semantic Gap Where to place the ISA? Semantic gap Some tradeoffs (for you to think about) Closer to high-level language (HLL) à Small semantic gap, complexinstructions Closer to hardware control signals? à Large semantic gap, simple instructions Simple compiler, complex hardware vs. complex compiler, simplehardware RISC vs. CISC machines RISC: Reduced instruction set computer CISC: Complex instruction set computer Burden of backward compatibility FFT, QUICKSORT, POLY, FP instructions? VAX INDEX instruction (array access with bounds checking) Performance? Energy Consumption? Optimization opportunity: Example of VAX INDEX instruction: who (compilervs. hardware) puts more effort into optimization? Instruction size, code size2122ISA-level Tradeoffs: Instruction LengthSmall versus Large Semantic Gap Fixed length: Length of all instructions the same CISC vs. RISC Complex instruction set computer à complex instructions Easier to decode single instruction in hardware Easier to decode m ultiple instructions concurrently-- Wasted bits in instructions (W hy is this bad?)-- Harder-to-extend ISA (how to add new instructions?) Initially motivated by “not good enough” code generation Reduced instruction set computer à simple instructions John Cocke, mid 1970s, IBM 801 Variable length: Length of instructions different (determined byopcode and sub-opcode) Goal: enable better com piler control and optim ization RISC motivated by Com pact encoding (W hy is this good?)Intel 432: 6 to 321 bit instructions.-- M ore logic to decode a single instruction-- Harder to decode multiple instructions concurrently Memory stalls (no work done in a complex instruction whenthere is a memory stall?) When is this correct? Tradeoffs Simplifying the hardware à lower cost, higher frequency Enabling the compiler to optimize the code better Code size (m em ory space, bandwidth, latency) vs. hardware com plexity ISA extensibility and expressiveness vs. hardware complexity Perform ance? Energy? Sm aller code vs. ease of decode Find fine-grained parallelism to reduce stalls23246

2/9/17ISA-level Tradeoffs: Uniform DecodeISA-level Tradeoffs: Number of Registers Uniform decode: Same bits in each instructioncorrespond to the same meaning Affects: Number of bits used for encoding register address Number of values kept in fast storage (register file) (uarch) Size, access time, power consumption of register file Opcode is always in the same location Ditto operand specifiers, immediate values, Many “RISC” ISAs: Alpha, MIPS, SPARC Easier decode, simpler hardware Enables parallelism: generate target address before knowing theinstruction is a branch-- Restricts instruction format (fewer instructions?) or wastes space Large number of registers: Enables better register allocation (and optimizations) by compiler àfewer saves/restores-- Larger instruction size-- Larger register file size Non-uniform decode E.g., opcode can be the 1st-7th byte in x86 More compact and powerful instruction format-- More complex decode logic25ISA-level Tradeoffs: Addressing Modes26A Note on RISC vs. CISC Addressing mode specifies how to obtain an operand of aninstruction Usually, Register Immediate Memory (displacement, register indirect, indexed, absolute,memory indirect, autoincrement, autodecrement, ) RISC More modes: help better support programming constructs (arrays, pointerbased accesses)-- make it harder for the architect to design-- too many choices for the compiler?Simple instructionsFixed lengthUniform decodeFew addressing modes CISC Many ways to do the same thing complicates compiler design Wulf, “Compilers and Computer Architecture,” IEEE Computer 198127Complex instructionsVariable lengthNon-uniform decodeMany addressing modes287

2/9/17Y86-64 Instruction Set #1Food for Thought for You How would you design a new ISA? Where would you place it? What design choices would you make in terms of ISAproperties? What would be the first question you ask in thisprocess? “What is my design point?”Look Forward & Up29Byte0halt00nop10cmovXX rA, rB2fnrA rBirmovq V , rB30FrBVrmmovq rA, D (rB)40rA rBDmrmovq D (rB), rA50rA rBDOPq rA, rB6fnrA rBjXX D est7fnD estcall D est80D estret90pushq rAA0rA Fpopq rAB0rA F12345678930Now That We Have an ISA How do we implement it?Implementing the ISA:Microarchitecture Basics i.e., how do we design a system that obeys thehardware/software interface?318

2/9/17The “Process instruction” StepHow Does a Machine Process Instructions? ISA specifies abstractly what AS’ should be, given an instruction andAS What does processing an instruction mean? Remember the von Neumann model It defines an abstract finite state machine whereAS Architectural (programmer visible) state before an instruction is processed State program m er-visible state Next-state logic instruction execution specification From ISA point of view, there are no “intermediate states” between AS and AS’during instruction execution One state transition per instructionProcess instruction Microarchitecture implements how AS is transformed to AS’AS’ Architectural (programmer visible) state after an instruction is processed Processing an instruction: Transforming AS to AS’ according to the ISAspecification of the instruction There are many choices in implementation We can have programmer-invisible state to optimize the speed of instructionexecution: multiple state transitions per instruction Choice 1: AS à AS’ (transform AS to AS’ in a single clock cycle) Choice 2: AS à AS M S1 à AS M S2 à AS M S3 à AS’ (take m ultiple clock cycles totransform AS to AS’)33A Very Basic Instruction Processing Engine34A Very Basic Instruction Processing Engine Each instruction takes a single clock cycle to execute Only combinational logic is used to implement instruction execution Single-cycle machine No intermediate, programmer-invisible state updatesAS Architectural (programmer visible) stateat the beginning of a clock cycleCombinationalLogicAS’ (State)ASProcess instruction in one clock cycleAS’ Architectural (programmer visible) stateat the end of a clock cycle What is the clock cycle time determined by? What is the critical path of the combinational logicdetermined by?35369

2/9/17Single-cycle vs. Multi-cycle MachinesAssembly/Machine Code ViewCPURegistersPCConditionCodesAddresses Single-cycle machinesM emory Each instruction takes a single clock cycle All state updates made at the end of an instruction’s execution Big disadvantage: The slowest instruction determines cycle time à long clockcycle e State Multi-cycle machines PC: Program counter Address of next instruction Called “RIP” (x86-64) Register file Byte addressable array Code and user data Condition codes Stack to support procedures Heavily used program data Memory Store status inform ation about m ostrecent arithm etic or logical operation Used for conditional branchingnInstruction processing broken into multiple cycles/stagesState updates can be made during an instruction’s executionArchitectural state updates made only at the end of an instruction’s executionAdvantage over single-cycle: The slowest “stage” determines cycle timeBoth single-cycle and multi-cycle machines literally follow thevon Neumann model at the microarchitecture levelInstructions (and programs) specify how to transformthe values of programmer visible state37Instruction Processing “Stage”38Instruction Processing “Cycle” vs. Machine Clock Cycle Instructions are processed under the direction of a “controlunit” step by step. Single-cycle machine: All phases of the instruction processing cycle take a singlemachine clock cycle to complete Instruction stage: Sequence of steps to process an instruction Fundamentally, there are five phases: Multi-cycle machine: Fetch All six phases of the instruction processing cycle can takemultiple machine clock cycles to complete In fact, each phase can take multiple clock cycles to complete Decode Evaluate Address/Fetch Operands Execute Store Result Not all instructions require all stages394010

2/9/17Single-cycle vs. Multi-cycle: Control & DataInstruction Processing Viewed Another Way Single-cycle machine: Instructions transform Data (AS) to Data’ (AS’) This transformation is done by functional units Control signals are generated in the same clock cycle as theone during which data signals are operated on Everything related to an instruction happens in one clock cycle(serialized processing) Units that “operate” on data These units need to be told what to do to the data An instruction processing engine consists of two components Multi-cycle machine: Datapath: Consists of hardware elements that deal with and transformdata signals functional units that operate on data hardware structures (e.g. wires and muxes) that enable the flow of data intothe functional units and registers storage units that store data (e.g., registers) Control logic: Consists of hardware elements that determine controlsignals, i.e., signals that specify what the datapath elements should doto the data Control signals needed in the next cycle can be generated inthe current cycle Latency of control processing can be overlapped with latencyof datapath operation (more parallelism)4142Flash-Forward: Performance AnalysisMany Ways of Datapath and Control Design There are many ways of designing the data path and control logic Execution time of an instruction Single-cycle, multi-cycle, pipelined datapath and control Execution time of a program {CPI} x {clock cycle time} Sum over all instructions [{CPI} x {clock cycle time}] {# of instructions} x {Average CPI} x {clock cycle time} Hardwired/combinational vs. microcoded/microprogrammed control Control signals generated by combinational logic versus Control signals stored in a memory structure Single cycle microarchitecture performance CPI 1 Clock cycle time long Multi-cycle microarchitecture performance CPI different for each instruction Average CPI à hopefully small Clock cycle time short43Now, we havetwo degrees of freedomto optim ize independently4411

2/9/17Remember A Single-CycleMicroarchitectureA Closer Look Single-cycle machineCombinationalLogicAS’(State)AS46Let’s Start with the State ElementsFor Now, We Will AssumeRegWrite Data and control inputs “Magic” memory and register filevalAsrcAPCARegisterfilevalBsrcB0valWM UXW dstWBM UXSelectM iteData Synchronous write1DataMemM emRead the selected register is updated on the positive edge clocktransition when write enable is asserted Cannot affect read output in between clock edgesOperationReadDataABALU474812

2/9/17Instruction ProcessingInstruction Processing 6 (5) generic steps IFnew PCEX/AGID/RFInstrPC 6 (5) generic stepsInstruction fetch (IF)Instruction decode and register operand fetch (ID/RF)Execute/Evaluate memory address (EX/AG)Memory operand fetch (MEM)Store/writeback result (WB)PC UpdateAddr EMWriteDataRegisterfileWBReadDataAddress Instruction fetch (IF)Instruction decode and register operand fetch (ID/RF)Execute/Evaluate memory address (EX/AG)Memory operand fetch (MEM)Store/writeback result (WB)PC UpdateIFnew PCInstrAddr iteDataWBDataMem49Instruction ProcessingInstruction Processing 6 (5) generic steps new PCPC 6 (5) generic stepsInstruction fetch (IF)Instruction decode and register operand fetch (ID/RF)Execute/Evaluate memory address (EX/AG)Memory operand fetch (MEM)Store/writeback result (WB)PC UpdateIFEX/AGID/RFInstrAddr gisterfileALUMEMWriteDataWBReadDataAddress IFnew PCDataMemInstruction fetch (IF)Instruction decode and register operand fetch (ID/RF)Execute/Evaluate memory address (EX/AG)Memory operand fetch (MEM)Store/writeback result (WB)PC UpdatePCInstrAddr WBDataMem5213

2/9/17Executing Arith./Logical OperationSingle-Cycle Datapath forArithmetic and LogicalInstructionsOPq rA, rB6 Fetchfn rA rB Memory Read 2 bytes Do nothing Write back Decode Read operand registers Update register PC Update Execute Perform operation Increment PC by 2 Set condition codes54Stage Computation: Arith/Log. OpsFetchDecodeExecuteMemoryWriteOPq rA, rBicode:ifun M1 [PC]rA:rB M1 [PC 1]valP PC 2valA R[rA]valB R[rB]valE valB OP valASet CCR[rB] valEbackPC update PC valPALU DatapathRead instruction byteRead register byteCompute next PCRead operand ARead operand BPerform ALU operationSet condition coderegisterWrite back resultPCInstrAddr InstructionInstructionMem2Update PC Formulate instruction execution as sequence of simplesteps Use same general form for all UReadDataAddressRegisterfileWriteDataDataMemEXMEM WBADDif MEM[PC] OPq rA, rBR[rB] R[rB] op R[rA]PC PC 2**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]IFIDPCCombinationalstate update logic5614

2/9/17ALU DatapathPCInstrAddr EValEALUOPALUWriteDataRegisterfileWe did not cover these slidesin the classReadDataAddressDataMemWill learn about these in the next classThey are here for your benefitADDif MEM[PC] OPq rA, rBR[rB] R[rB] op R[rA]PC PC 2**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]IFIDEXMEM WBPCCombinationalstate update logic57Executing mrmovq (Load from Mem to Reg)Single-Cycle Datapath forData Movement Instructionsmrmovq D(rB) ,rA6fn rA rB D Fetch Memory Read 10 bytes Read from memory Decode Write back Read operand registers Write to Register Execute PC Update Compute effective Increment PC by 10address6015

2/9/17Stage Computation: mrmovqFetchDecodemrmovq D(rB), rAicode:ifun M1[PC]rA:rB M1[PC 1]valC M8[PC 2]valP PC 10Ld DatapathRead instruction byteRead register byteRead displacement DCompute next PCPCvalB R[rB]valE valB valCRead operand BCompute effective addressMemoryWritevalM M8[valE]R[rA] valMWrite value to memorybackPC updatePC valPUpdate PCExecuteInstructionMem10PCInstrAddr InstructionInstructionMemrB MrA dressWriteDataDataMemMEM WBPCFrom ALUFrom M emM UXSelectIFDataMemInstructionMemIDrB MrA UXrBvalBrAValADestEValEEXCombinationalstate update essWriteDataDataMemMEM WBPCDMUXFrom ALUFrom M em10ADDM UXSelectif MEM[PC] mrmovq Disp (rB), rAEA Disp R[rB]R[rA] MEM[EA]PC PC 10ALUOPRegisterfileMUXInstrAddr InstructionDADDrALd DatapathALUOPMvalBif MEM[PC] mrmovq Disp (rB), rAEA Disp R[rB]R[rA] MEM[EA]PC PC 1061RegWriterB MrA UXrBD Use ALU for address computationLd DatapathInstrAddr InstructionRegWriteIFIDEXMEM WBCombinationalstate update logicPC63if MEM[PC] mrmovq Disp (rB), rAEA Disp R[rB]R[rA] MEM[EA]PC PC 10MUXFrom ALUFrom M emM UXSelectIFIDEXCombinationalstate update logic6416

2/9/17Ld DatapathInstrAddr InstructionPCInstructionMemRegWriterB MrA UXrBvalBrAValADestEValELd eDataPCInstrAddr InstructionDataMemInstructionMemDADD10RegWriterB MrA UXrBvalBrAValADestEValEFrom ALUFrom M em10ADDM UXIFIDEXMEM WBCombinationalstate update logicPC65Executing rmmovq (St from reg to Memory)rmmovq rA, riteDataDataMemMEM WBPCDMUXif MEM[PC] mrmovq Disp (rB), rAEA Disp R[rB]R[rA] MEM[EA]PC PC 10ALUOPIFIDEXCombinationalstate update logic66Stage Computation: rmmovqDecodeExecute Fetch Memory Read 10 bytes Write to memory Decode Write back Read operand registers Do nothing Execute PC Update Compute effective address Increment PC by 10M UXSelectif MEM[PC] mrmovq Disp (rB), rAEA Disp R[rB]R[rA] MEM[EA]PC PC 10FetchrA rB DFrom ALUFrom M emMemoryWritebackPC updatermmovq rA, D(rB)icode:ifun M1[PC]rA:rB M1[PC 1]valC M8[PC 2]valP PC 10valA R[rA]valB R[rB]valE valB valCRead instruction byteRead register byteRead displacement DCompute next PCRead operand ARead operand BCompute effective addressM8[valE] valAWrite value to memoryPC valPUpdate PC Use ALU for address computation676817

2/9/17St DatapathPCInstrAddr InstructionInstructionMemRegWriterB MrA UXrBvalBrAValADestEValEALUMUXRegisterfileSt DatapathM emWriteALUOPReadDataAddressWriteDataPCInstrAddr InstructionDataMemInstructionMemD10MUXInstrAddr InstructionInstructionMemFrom ALUFrom M emM UXSelect10IFIDRegWriterB MrA UXrBvalBrAValADestEValEEXMEM WBPCCombinationalstate update logicSt UXRegisterfile69ReadDataAddressWriteDataDataMemMEM WBPCReadDataPCInstrAddr InstructionDataMemInstructionMemADDif MEM[PC] rnmovq rA, Disp (rB)EA Disp R[rB]MEM[EA] R[rA]PC PC 10From ALUFrom M emM UXSelectIFIDRegWriterB MrA UXrBvalBrAValADestEValEEXCombinationalstate update logicSt DatapathAddressWriteDataMUXif MEM[PC] rnmovq rA, Disp (rB)EA Disp R[rB]MEM[EA] R[rA]PC PC 10M emWriteALUOPADDD10rB MrA UXrBM emWriteALUOPDADDif MEM[PC] rmmovq rA, Disp (rB)EA Disp R[rB]MEM[EA] R[rA]PC PC 10PRegWriteM teDataDataMemMEM WBPCDMUXM UXSelectFrom ALUFrom M emIF10IDEXMEM WBCombinationalstate update logicPC71ADDif MEM[PC] rnmovq rA, Disp (rB)EA Disp R[rB]MEM[EA] R[rA]PC PC 10MUXFrom ALUFrom M emM UXSelectIFIDEXCombinationalstate update logic7218

2/9/17Stage Computation: immovqExecuting irmovq (Move imm to Reg)irmovq V, rB03FFetchrB V Fetch Read 10 bytes Decode Read operand registers Execute Add 0 to VMemoryWrite Memory Do nothing Write back Write V to rB PC Update Increment PC by 10backPC updateInstructionMemrB MrA UX10if MEM[PC] irmovq V, rBR[rB] VPC PC 10Write value to memoryPC valPUpdate PC7374rBvalBM ALUU OPXrAValAMDestEValE0UXRegisterfileRegWriteM emWriteALUReadDataAddressWriteDataPCInstrAddr InstructionDataMemInstructionMemDADDCompute effective addressR[rB] valAIRMov Datapath: Option 1RegWriteInstrAddr InstructionvalE 0 valC Use ALU for address computationIRMov Datapath: Option 1CRead instruction byteRead register byteRead displacement DCompute next PCDecodeExecutePirmovq V, rBicode:ifun M1[PC]rA:rB M1[PC 1]valC M8[PC 2]valP PC 10rB MrA UXrBvalBrAValADestEValEM ALUU OPX0MUXRegisterfileM emWriteALUReadDataAddressWriteDataDataMemMEM WBPCDMUXM UXSelectFrom ALU10From M emIFIDEXMEM WBCombinationalstate update logicPC75ADDif MEM[PC] irmovq V, rBR[rB] V 0PC PC 10MUXM UXSelectFrom ALUFrom M emIFIDEXCombinationalstate update logic7619

2/9/17IRMov Datapath: Option 1IRMov Datapath: Option 1RegWritePCInstrAddr InstructionInstructionMemrB MrA UXrBvalBM ALUU OPXrAValAMDestEValE0UXRegisterfileRegWriteM emWriteALUReadDataAddressWriteDataPCInstrAddr InstructionDataMemInstructionMemD10From ALUMUX10From M emM UXSelectIFIDEXMEM WBCombinationalstate update logicInstructionMemrB MrA UX10if MEM[PC] irmovq V, rBR[rB] V 0PC PC 10ValErBvalBM ALUU dressWriteDataDataMemMEM WBPCFrom ALUMUXFrom M emM UXSelectIFIDEXCombinationalstate update logicUXRegisterfileRegWriteM emWriteALUReadDataAddressWriteDataPCInstrAddr v Datapath: Option 2RegWriteInstrAddr InstructionValAif MEM[PC] irmovq V, rBR[rB] V 0PC PC 10PCIRMov Datapath: Option 1CvalBrAM emWriteDADDif MEM[PC] irmovq V, rBR[rB] V 0PC PC 10PrB MrA UXrBM ALUU OPX0rB MrA UXrBvalBrAValADestEValEM DataDataMemMEM WBPCDMUXM UXSelectFrom ALU10From M emIFIDEXMEM WBCombinationalstate update logicPC79ADDif MEM[PC] irmovq V, rBR[rB] VPC PC 10MUXM UXSelectFrom ALUFrom M emIFIDEXCombinationalstate update logic8020

2/9/17IRMov Datapath: Option 2RegWriteInstrAddr InstructionPCInstructionMemrB MrA UXrBvalBrAValADestEValEM DataDataMemMEM WBPC Tradeoffs between option 1 and option 2?DMADD10UXif MEM[PC] irmovq V, rBR[rB] VPC PC 10From ALUFrom M emM UXSelectIFIDEXCombinationalstate update logic81Stage Computation: rrmovqExecuting rrmovq (Move from Reg to Reg)rrmovq rA, rB2082FetchrA rBDecodeExecute Fetch Read 2 bytes Decode Read operand register rA Execute Add 0 to val rA Memory Do nothing Write back Write val rA to rB PC Update Increment PC by 2rrmovq rA, rBicode:ifun M1[PC]rA:rB M1[PC 1]valP PC 2Read instruction byteRead register byteRead displacement DCompute next PCValA R[rA]valE 0 valACompute effective addressMemoryWriteR[rB] ß valEbackPC updatePC valPWrite value to memoryUpdate PC Use ALU for address computation838421

2/9/17rrMov Datapath: Option 1rrmov Datapath: Option 1RegWritePCInstrAddr InstructionInstructionMemrB MrA UXrBvalBM ALUU OPXrAValAMDestEValE0UXRegisterfileRegWriteM emWriteALUReadDataAddressWriteDataPCInstrAddr InstructionDataMemInstructionMemD2From ALUMUX2From M emM UXSelectIFIDEXMEM WBCombinationalstate update logicInstructionMemrB MrA UX2if MEM[PC] rrmovq rA, rBR[rB] R[rA]PC PC 2ValErBvalBM ALUU ressWriteDataDataMemMEM WBPCFrom ALUMUXFrom M emM UXSelectIFIDEXCombinationalstate update logic0UXRegisterfileRegWriteM emWriteALUReadDataAddressWriteDataPCInstrAddr v Datapath: Option 1RegWriteInstrAddr InstructionValAif MEM[PC] rrmovq rA, rBR[rB] R[rA]PC PC 2PCrrmov Datapath: Option 1CvalBrAM emWriteDADDif MEM[PC] rrmovq rA, rBR[rB] R[rA]PC PC 2PrB MrA UXrBM ALUU OPX0rB MrA UXrBvalBrAValADestEValEM ALUU OPX0MUXRegisterfileM emWriteALUReadDataAddressWriteDataDataMemMEM WBPCDMUXM UXSelectFrom ALU2From M emIFIDEXMEM WBCombinationalstate update logicPC87ADDif MEM[PC] rrmovq rA, rBR[rB] R[rA]PC PC 2MUXFrom ALUFrom M emM UXSelectIFIDEXCombinationalstate update logic8822

2/9/17rrmov Datapath: Option 2RegWritePCInstrAddr InstructionInstructionMemrB MrA UXrBvalBrAValADestEValE10ADDif MEM[PC] rrmovq rA, rBR[rB] R[rA]PC PC 2?MUXALUMUXRegisterfileDM emWriteALUOPReadDataAddressWriteDataIntro to Microarchitecture:Single-CycleCS 3330DataMemSamira KhanUniversity of VirginiaFeb 9, 2017From ALUFrom M emM UXSelectIFIDEXMEM WBCombinationalstate update logicPC8923

RISC ISAs: Alpha, MIPS, SPARC, ARM What are the fundamental differences? E.g., how instructions are specified and what they do E.g., how complex are the instructions 14 MIPS opcode 6-bit rs 5-bit rt 5-bit immediate 16-bit I-type 0 R-type 6-bit rs 5-bit rt 5-bit rd 5-bit shamt 5-bit funct 6-bit opcode 6-bit immediate 26-bit J .