Advanced Computer Architecture
  The Architecture of Parallel Computers
Computer Systems
  No Component Can be Treated in Isolation from the Architecture
Hardware Issues
  Number and Type of Processors
  Processor Control
  Memory Hierarchy
  I/O Devices and Peripherals
  Operating System Support
  Applications Software Compatibility
Operating System Issues
  Allocating and Managing Resources
  Access to Hardware Features
    – Multi-Processing
    – Multi-Threading
  I/O Management
  Access to Peripherals
  Efficiency
Applications Issues
  Compiler/Linker Support
  Programmability
  OS/Hardware Feature Availability
  Compatibility
  Parallel Compilers
    – Preprocessor
    – Precompiler
    – Parallelizing Compiler
Architecture Evolution
  Scalar Architecture
  Prefetch Fetch/Execute Overlap
  Multiple Functional Units
  Pipelining
  Vector Processors
  Lock-Step Processors
  Multi-Processor
Flynn’s Classification
  Consider Instruction Streams and Data Streams Separately.
  SISD - Single Instruction, Single Data Stream
  SIMD - Single Instruction, Multiple Data Streams
  MIMD - Multiple Instruction, Multiple Data Streams
  MISD - (rare) Multiple Instruction, Single Data Stream
SISD
  Conventional Computers
  Pipelined Systems
  Multiple-Functional Unit Systems
  Pipelined Vector Processors
  Includes most computers encountered in everyday life
SIMD
  Multiple Processors Execute a Single Program
  Each Processor operates on its own data
  Vector Processors
  Array Processors
  PRAM Theoretical Model
MIMD
  Multiple Processors cooperate on a single task
  Each Processor runs a different program
  Each Processor operates on different data
  Many Commercial Examples Exist
MISD
  A Single Data Stream passes through multiple processors
  Different operations are triggered on different processors
  Systolic Arrays
  Wave-Front Arrays
Programming Issues
  Parallel Computers are Difficult to Program
  Automatic Parallelization Techniques are only Partially Successful
  Programming languages are few, not well supported, and difficult to use.
  Parallel Algorithms are difficult to design.
Performance Issues
  Clock Rate / Cycle Time: τ
  Cycles Per Instruction (Average): CPI
  Instruction Count: Ic
  Time: T = Ic × CPI × τ
  p = Processor Cycles, m = Memory Cycles, k = Memory/Processor cycle ratio
  T = Ic × (p + m × k) × τ
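A worked example may make the timing formula easier to check by hand. The sketch below assumes the usual reading of the slide, in which the average CPI splits into p processor cycles plus m memory accesses that each cost k processor cycles. The function name and the workload numbers are hypothetical, chosen only for illustration.

```python
def execution_time(ic, p, m, k, tau):
    """Return T = Ic * (p + m * k) * tau, in seconds.

    ic  : instruction count
    p   : processor cycles per instruction
    m   : memory cycles (accesses) per instruction
    k   : memory/processor cycle ratio
    tau : processor cycle time in seconds
    """
    cpi = p + m * k  # effective average cycles per instruction
    return ic * cpi * tau


# Hypothetical workload: 2 million instructions on a 40 MHz clock (tau = 25 ns).
t = execution_time(ic=2_000_000, p=2, m=1, k=4, tau=25e-9)
print(f"T = {t:.3f} s")  # CPI = 2 + 1 * 4 = 6, so T = 2e6 * 6 * 25e-9 = 0.300 s
```

Halving k (a faster memory hierarchy) in this example lowers the CPI from 6 to 4 without touching the processor or the compiler.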
Performance Issues II
  Ic & p affected by processor design and compiler technology.
  m affected mainly by compiler technology
  τ affected by processor design
  k affected by memory hierarchy structure and design
Other Measures
  MIPS rate - Millions of instructions per second
  Clock Rate for similar processors
  MFLOPS rate - Millions of floating point operations per second.
  These measures are not necessarily directly comparable between different types of processors.
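The rates above follow directly from instruction count, run time, and clock rate. The sketch below uses the standard definitions (MIPS = Ic / (T × 10^6), equivalently clock rate / (CPI × 10^6)); the function names and sample numbers are hypothetical.

```python
def mips_rate(ic, t_seconds):
    """Millions of instructions per second from instruction count and run time."""
    return ic / (t_seconds * 1e6)


def mips_from_clock(clock_hz, cpi):
    """Same rate expressed through clock rate and average CPI."""
    return clock_hz / (cpi * 1e6)


def mflops_rate(flop_count, t_seconds):
    """Millions of floating point operations per second."""
    return flop_count / (t_seconds * 1e6)


# Hypothetical run: 2 million instructions in 0.3 s on a 40 MHz clock with CPI = 6.
print(round(mips_rate(2_000_000, 0.3), 3))   # about 6.667
print(round(mips_from_clock(40e6, 6), 3))    # the same figure from clock rate and CPI
```

Because CPI depends on the instruction set, two processors with the same clock rate can report very different MIPS figures for the same work, which is one reason these numbers are hard to compare across processor types.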
Parallelizing Code
  Implicitly
    – Write Sequential Algorithms
    – Use a Parallelizing Compiler
    – Rely on compiler to find parallelism
  Explicitly
    – Design Parallel Algorithms
    – Write in a Parallel Language
    – Rely on Human to find Parallelism
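The contrast between the two approaches can be sketched in a few lines. Python has no parallelizing compiler, so the implicit route is represented here only by the sequential function; the explicit version shows a human choosing the decomposition. The function names are hypothetical, and a thread pool is used purely to keep the sketch short (in CPython, threads illustrate the structure rather than deliver a speedup for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum_of_squares(values):
    """Implicit route: plain sequential code, parallelism left to the tools."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(values, workers=4):
    """Explicit route: the programmer partitions the data and combines results."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sequential_sum_of_squares, parts))
    return sum(partials)  # final reduction of the partial sums


data = list(range(1000))
print(sequential_sum_of_squares(data) == parallel_sum_of_squares(data))
```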
Multi-Processors
  Multi-Processors generally share memory, while multi-computers do not.
    – Uniform Memory Model
    – Non-Uniform Memory Model
    – Cache-Only MIMD Machines
Multi-Computers
  Independent Computers that Don’t Share Memory.
  Connected by High-Speed Communication Network
  More tightly coupled than a collection of independent computers
  Cooperate on a single problem
Vector Computers
  Independent Vector Hardware
  May be an attached processor
  Has both scalar and vector instructions
  Vector instructions operate in highly pipelined mode
  Can be Memory-to-Memory or Register-to-Register
SIMD Computers
  One Control Processor
  Several Processing Elements
  All Processing Elements execute the same instruction at the same time
  Interconnection network between PEs determines memory access and PE interaction
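The lock-step behaviour described above can be imitated in software. This is a toy simulation, not a model of any real machine: the class name `SimdMachine`, its methods, and the ring topology are all assumptions made for illustration.

```python
import operator


class SimdMachine:
    """Toy SIMD array: one control processor, one local value per PE."""

    def __init__(self, data):
        self.local = list(data)  # local[i] is the data held by PE i

    def broadcast(self, op, operand):
        # The control processor issues one instruction;
        # every PE applies it to its own local data in the same step.
        self.local = [op(x, operand) for x in self.local]

    def shift_ring(self):
        # Assumed ring interconnect: each PE passes its value to the
        # PE on its right, and the last PE wraps around to the first.
        self.local = self.local[-1:] + self.local[:-1]


machine = SimdMachine([1, 2, 3, 4])
machine.broadcast(operator.add, 10)  # [11, 12, 13, 14]
machine.broadcast(operator.mul, 2)   # [22, 24, 26, 28]
machine.shift_ring()                 # [28, 22, 24, 26]
print(machine.local)
```

A different interconnect (mesh, hypercube) would change only which PE can exchange data with which, and that is the sense in which the network determines PE interaction.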
The PRAM Model
  SIMD Style Programming
  Uniform Global Memory
  Local Memory in Each PE
  Memory Conflict Resolution
    – CRCW - Concurrent Read, Concurrent Write
    – CREW - Concurrent Read, Exclusive Write
    – EREW - Exclusive Read, Exclusive Write
    – ERCW - (rare) Exclusive Read, Concurrent Write
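The four memory-access variants differ only in whether several PEs may touch the same global address in one step. A minimal sketch of that rule follows; the function name `step_allowed` and the representation of a step as two address lists are assumptions for illustration.

```python
from collections import Counter


def step_allowed(model, reads, writes):
    """Check one PRAM step against a variant: "EREW", "CREW", "CRCW" or "ERCW".

    reads  : global addresses read by the PEs in this step
    writes : global addresses written by the PEs in this step
    """
    shared_read = any(n > 1 for n in Counter(reads).values())
    shared_write = any(n > 1 for n in Counter(writes).values())
    read_ok = model[:2] == "CR" or not shared_read     # "ER" forbids shared reads
    write_ok = model[2:] == "CW" or not shared_write   # "EW" forbids shared writes
    return read_ok and write_ok


# Two PEs read address 0 in the same step and write to distinct addresses.
print(step_allowed("EREW", reads=[0, 0], writes=[2, 3]))  # False
print(step_allowed("CREW", reads=[0, 0], writes=[2, 3]))  # True
```

The sketch only detects conflicts; a full CRCW model would also need a policy for deciding which of several simultaneous writes takes effect.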
The VLSI Model
  Implement Algorithm as a mostly combinational circuit
  Determine the area required for implementation
  Determine the depth of the circuit
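As a small illustration of the area/depth trade-off, consider combining n inputs with 2-input gates (for example, an n-way AND). The sketch below uses gate count as a rough stand-in for area, which is an assumption; real VLSI area also includes wiring. The function names are hypothetical.

```python
def chain_cost(n):
    """n inputs combined one after another: n - 1 gates, depth n - 1."""
    return {"gates": n - 1, "depth": n - 1}


def tree_cost(n):
    """n inputs combined by a balanced binary tree: n - 1 gates, depth ceil(log2 n)."""
    depth = (n - 1).bit_length()  # integer form of ceil(log2(n)) for n >= 1
    return {"gates": n - 1, "depth": depth}


for n in (8, 9, 1024):
    print(n, chain_cost(n), tree_cost(n))
```

Both layouts use the same number of gates, but the tree's depth grows logarithmically, so the signal passes through far fewer gate delays.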