Computer Architecture J. Daniel García Sánchez . PDF Free Download

2y ago

104 Views

1 Downloads

328.88 KB

66 Pages

Report/dmca

Download PDF

Transcription

Introduction to instruction level parallelismIntroduction to instruction level parallelismComputer ArchitectureJ. Daniel García Sánchez (coordinator)David Expósito SinghFrancisco Javier García BlasARCOS GroupComputer Science and Engineering DepartmentUniversity Carlos III of Madridcb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es1/66

Introduction to instruction level parallelismIntroduction to pipelining1Introduction to pipelining2Hazards3Multi-cycle operations4Conclusioncb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es2/66

Introduction to instruction level parallelismIntroduction to pipeliningPipelineImplementation technique to execute multiple instructionsoverlapped in time.A costly operation is divided into simple sub-operations.Sub-operations are executed into stages.Effects:Increases throughput.Latency is not decreased.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es3/66

Introduction to instruction level parallelismIntroduction to pipeliningPipelineIF1Cycle 2IF2ID1IF3ID2EX1Cycle 4IF4ID3EX2M1Cycle 5IF5ID4EX3M2WB1Cycle 6IF6ID5EX4M3WB2IF7ID6EX5M4WB3IF8ID7EX6M5WB4Cycle 7Cycle 8cb e d –NormalCycle 3Pipeline fillingCycle 1Latency:Computer Architecture–One instructionrequires 5 stages.5 cycles.Throughput:ARCOS Group–One instructionfinalized per cycle(once pipeline isfull).1 instruction percycle.http://www.arcos.inf.uc3m.es4/66

Introduction to instruction level parallelismIntroduction to pipeliningPipeline stagesSimplified model (MIPS):5 stages.More realistic models require more stagesStages:Instruction Fetch.Instruction Decode.Execution.Memory.Write-back.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es5/66

Introduction to instruction level parallelismIntroduction to pipeliningInstruction FetchNext PC4MUXSumSend PC to memory.cb e d –Update PC.IRMemoryPC ADDRRead instruction.Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es6/66

Introduction to instruction level parallelismIntroduction to pipeliningInstruction DecodeNext PCR1RDRegistersIRR2Decode instruction.MUXRead registers.ALUSign extend offsets.MUXInmcb e d –Compute possible branchaddress.SignExtComputer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es7/66

Introduction to instruction level parallelismIntroduction to pipeliningExecutionNext PCNext PCMUXR1CMPZeroNext PC or R1ALU operation onregisters.Alternatively, computeeffective address.Dir orDataALUInm or R2cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es8/66

Introduction to instruction level parallelismIntroduction to pipeliningMemoryDirDataMemoryRead from or write intomemory.DataDatacb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es9/66

Introduction to instruction level parallelismIntroduction to pipeliningWrite backData Mem.MUXWrite result into registerfile.RegsALUcb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es10/66

Introduction to instruction level parallelismIntroduction to pipeliningGeneral architectureMUX4Next PCSumCMPZeroR1RDMUXRegistersIRMemoryPC cb e d –Computer ArchitectureDecode–ARCOS sWrite-back11/66

Introduction to instruction level parallelismIntroduction to pipeliningPipeline effectsAn n depth pipeline multiplies by n the needed bandwidthcompared to a non-pipelined version with the same clockrate.Caching, caching, . . .Separation among data and instructions cachessuppresses some memory conflictsInstructions in the pipeline should not try to use the sameresource at the same time.Pipelining registers between successive stages.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es12/66

Introduction to instruction level parallelismIntroduction to pipeliningStages communicationNext PCNext 2RegistersIF/IDMemoryPC ADDRR1MUXMUXInm SignExtRDRD RDWB datacb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es13/66

Introduction to instruction level parallelismIntroduction to pipeliningPipeline over timeTime (cycles)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle LUDMInstrIMemRegRegister read in second half of cycle.Register write in first half of cycle.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es14/66

Introduction to instruction level parallelismIntroduction to pipeliningExampleNon-pipelined processor.Clock cycle: 1 ns.40% ALU operations 4 cycles.20% branch operations 4 cycles.40% memory operations 5 cycles.Pipeline overhead 0.2 ns.Which is the pipeline speedup?torig cycleclock CPI 1ns (0.6 4 0.4 5) 4.4nstnew 1ns 0.2ns 1.2nsS cb e d –4.4ns 3.71.2nsComputer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es15/66

Introduction to instruction level parallelismHazards1Introduction to pipelining2Hazards3Multi-cycle operations4Conclusioncb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es16/66

Introduction to instruction level parallelismHazardsHazardA hazard is a situation preventing next instruction to startat the expected clock cycle.Hazards reduce performance in pipelined architectures.Types of hazards:Structural hazard.Data hazard.Control hazard.Simple approach to hazards:Stall the instruction flow.Already started instructions will continue.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es17/66

Introduction to instruction level parallelismHazardsStructural hazards2HazardsStructural hazardsData hazardsControl hazardscb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es18/66

Introduction to instruction level parallelismHazardsStructural hazardsStructural hazardHappens when hardware cannot support all possibleinstruction sequences.In the same cycle two pipeline stages need to use thesame resource.Reasons:Functional units that are not fully pipelined.Functional units which are not duplicated.These hazards can be avoided at the cost of a moreexpensive hardware.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es19/66

Introduction to instruction level parallelismHazardsStructural hazardsPipeline speedupSpeedup:tnonpipelined : Average instruction time in non-pipelinedarchitecture.tpipelined : Average instruction time in pipelined architecture.S tnonpipelinedCPInonpipelined cyclenonpipelined tpipelinedCPIpipelined cyclepipelinedIdeal case: pipelined CPI is 1.Need to add stall cycles per instruction.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es20/66

Introduction to instruction level parallelismHazardsStructural hazardsPipeline speedupIn case of non-pipelined processor:CPI 1, with cyclenonpipelined cyclepipelined .cyclenonpipelined N cyclepipelined .N Pipeline depth.SpeedupS cb e d –N1 stalls per instructionComputer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es21/66

Introduction to instruction level parallelismHazardsStructural hazardsStructural hazards: exampleTime (cycles)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8LOADInstr egALUMemInstr 2Instr 3RegAssuming single port memory.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es22/66

Introduction to instruction level parallelismHazardsStructural hazardsStructural hazards: exampleTime (cycles)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle llstallstallstallstallMemRegInstrInstr 1RegInstr 2Instr 3Instr 4Assuming single port memory.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es23/66

Introduction to instruction level parallelismHazardsStructural hazardsExampleTwo alternative designs:A: No structural hazards.Clock cycle 1nsB: With structural hazards.Clock cycle 0.9nsData access instructions with hazards 30%.Which one is the fastest alternative?tinst (A) CPI cycle 1 1ns 1nstinst (B) CPI cycle (0.7 1 0.3 (1 1)) 0.9ns 1.17nscb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es24/66

Introduction to instruction level parallelismHazardsData hazards2HazardsStructural hazardsData hazardsControl hazardscb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es25/66

Introduction to instruction level parallelismHazardsData hazardsData hazardA data hazard happens when the pipeline changes theread/write access to operands ordering.I2 reads R1 before than I1modifies it.ExampleI1 :I2 :I3 :I4 :I5 :dadddsubandorxorI3 reads R1 before than I1modifies it.I4 gets the right value. 1 , 2 , 3 4 , 1 , 5 6 , 1 , 7 8 , 1 , 9 10 , 1 , 11Register file is read insecond half of cycle.I5 gets the right value.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es26/66

Introduction to instruction level parallelismHazardsData hazardsData hazardsTime (cycles)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8dadd 1, 2, 3dsub 4, 1, egALUMemRegMemRegALUMemand 6, 1, 7or 8, 1, 9xor 10, 1, 11cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.esReg27/66

Introduction to instruction level parallelismHazardsData hazardsStalls in data hazardsTime (cycles)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8dadd 1, 2, 3dsub 4, 1, MemMemRegALUMemRegand 6, 1, 7or 8, 1, 9xor 10, 1, 11cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es28/66

Introduction to instruction level parallelismHazardsData hazardsData hazards: RAWRead After Write.Instruction i 1 tries to read a datum before instruction iwrites it.If there is a data dependency, instructions:ExampleCan neither be executed in parallel noroverlap.Instruction sub needs value from 1produced by instruction add.i:add 1, 2, 3i 1: sub 4, 1, 3Solutions:Hardware detection.Compiler control.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es29/66

Introduction to instruction level parallelismHazardsData hazardsData hazards: WARWrite After Read.Instruction i 1 modifies operand before instruction i readsit.Also known as anti-dependence incompiler technology.Examplei : sub 4, 1, 3i 1: add 1, 2, 3i 2: mul 6, 1, 7Name reuse.Instruction add modifies 1 before subreads it.Cannot happen in a MIPS with 5-stages pipeline.All instructions with 5 stages.Reads always happen in stage 2.Writes always happen in stage 5.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es30/66

Introduction to instruction level parallelismHazardsData hazardsData hazards: WAWWrite After Write.Instruction i 1 modifies operand before instruction imodifies it.Also known as output dependency incompiler technologyExamplei : sub 1, 4, 3i 1: add 1, 2, 3i 2: mul 6, 1, 7Name reuse.Instruction add modifies 1 before submodifies it.Cannot happen in a MIPS with 5-stages pipeline.All instructions with 5 stages.Writes always happen in stage 5.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es31/66

Introduction to instruction level parallelismHazardsData hazardsSolutions to data hazardsRAW dependencies:Forwarding.WAR and WAW dependencies:Register renaming.Done statically by compiler.Done dynamically by hardware.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es32/66

Introduction to instruction level parallelismHazardsData hazardsForwardingTechnique to avoid some data stalls.Basic idea:No need to wait until result is written into register file.Result is already in pipeline registers.Use that value instead of the one from the register file.Implementation:Results from EX and MEM stages written into ALU inputregisters.Forwarding logic selects between real input and forwardingregister.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es33/66

Introduction to instruction level parallelismHazardsData hazardsForwardingdadd 1, 2, RegIMRegALUDMdsub 4, 1, 5and 6, 1, 7or 8, 1, 9xor 10, 1, 11cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.esReg34/66

Introduction to instruction level parallelismHazardsData hazardsforwarding limitationsNot every hazard can be avoided with forwarding.You cannot travel backwards in time!ExampleI1 :I2 :I3 :I4 :I5 :lwdsubandorxorIM 1 , ( 0 ) 2 4 , 1 , 5 6 , 1 , 7 8 , 1 , 9 10 , 1 , 11RegALUDMRegIMRegALUDMRegIf hazard cannot be avoided, a stall must be introduced.cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es35/66

Introduction to instruction level parallelismHazardsData hazardsStalls in memory accesseslw 1, 0( gstallIMRegALUDMRegIMRegALUDMdsub 4, 1, 5and 6, 1, 7or 8, 1, 9xor 10, 1, 11cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.esReg36/66

Introduction to instruction level parallelismHazardsControl hazards2HazardsStructural hazardsData hazardsControl hazardscb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es37/66

Introduction to instruction level parallelismHazardsControl hazardsControl hazardA control hazard happens in a PC modification instruction.Terminology:Taken branch: If PC is modified.Not-taken branch: if PC is not modified.Problem:Pipelining assumes that branch will not be taken.What if, after ID, we find out that branch needs to be taken?cb e d –Computer Architecture–ARCOS Group–http://www.arcos.inf.uc3m.es38/66

Introduction to instruction level parallelismHazardsControl hazardsAlternatives in control hazardsCompile time: Fixed asumption for the full programexecution.Software ma

Instr Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 LOAD Instr 1 Instr 2 Instr 3 Instr 4 Mem Reg ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU Mem Reg stall stall stall stall stall Mem Reg Assuming single port memory. – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 23/66