Advanced Computer Architecture - Baylor University

Advanced Computer Architecture
The Architecture of Parallel Computers

Computer Systems
No Component Can be Treated In Isolation From the Architecture

Hardware Issues
Number and Type of Processors
Processor Control
Memory Hierarchy
I/O Devices and Peripherals
Operating System Support
Applications Software Compatibility

Operating System Issues
Allocating and Managing Resources
Access to Hardware Features
  – Multi-Processing
  – Multi-Threading
I/O Management
Access to Peripherals
Efficiency

Applications Issues
Compiler/Linker Support
Programmability
OS/Hardware Feature Availability
Compatibility
Parallel Compilers
  – Preprocessor
  – Precompiler
  – Parallelizing Compiler

Architecture Evolution
Scalar Architecture
Prefetch Fetch/Execute Overlap
Multiple Functional Units
Pipelining
Vector Processors
Lock-Step Processors
Multi-Processor

Flynn’s Classification
Consider Instruction Streams and Data Streams Separately.
SISD - Single Instruction, Single Data Stream
SIMD - Single Instruction, Multiple Data Streams
MIMD - Multiple Instruction, Multiple Data Streams
MISD - (rare) Multiple Instruction, Single Data Stream

SISD
Conventional Computers
Pipelined Systems
Multiple-Functional Unit Systems
Pipelined Vector Processors
Includes most computers encountered in everyday life

SIMD
Multiple Processors Execute a Single Program
Each Processor operates on its own data
Vector Processors
Array Processors
PRAM Theoretical Model

MIMD
Multiple Processors cooperate on a single task
Each Processor runs a different program
Each Processor operates on different data
Many Commercial Examples Exist

MISD
A Single Data Stream passes through multiple processors
Different operations are triggered on different processors
Systolic Arrays
Wave-Front Arrays

Programming Issues
Parallel Computers are Difficult to Program
Automatic Parallelization Techniques are only Partially Successful
Programming languages are few, not well supported, and difficult to use.
Parallel Algorithms are difficult to design.

Performance Issues
Clock Rate / Cycle Time, $\tau$
Cycles Per Instruction (Average), $CPI$
Instruction Count, $I_c$
Time, $T = I_c \times CPI \times \tau$
$p$ Processor Cycles, $m$ Memory Cycles, $k$ Memory/Processor cycle ratio
$T = I_c \times (p + m \times k) \times \tau$
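The timing relation on this slide, T = Ic × (p + m × k) × τ, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `execution_time` and every numeric value are assumptions chosen for the example, not figures from the slides.

```python
def execution_time(ic, p, m, k, tau):
    """Return T = Ic * (p + m * k) * tau, in the same time unit as tau."""
    cpi = p + m * k  # effective cycles per instruction
    return ic * cpi * tau

# Hypothetical workload: all values below are assumed for illustration.
ic = 1_000_000  # instruction count
p = 2           # processor cycles per instruction
m = 1           # memory cycles (memory references) per instruction
k = 4           # ratio of memory cycle time to processor cycle time
tau = 10e-9     # processor cycle time in seconds (a 100 MHz clock)

cpi = p + m * k
t = execution_time(ic, p, m, k, tau)
print(f"CPI = {cpi}, T = {t:.6f} s")
```

With these assumed numbers the effective CPI is 6. Halving k, for instance with a faster memory hierarchy, would lower CPI to 4 without touching the processor at all.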

Performance Issues II
$I_c$ & $p$ affected by processor design and compiler technology.
$m$ affected mainly by compiler technology
$\tau$ affected by processor design
$k$ affected by memory hierarchy structure and design

Other Measures
MIPS rate - Millions of instructions per second
Clock Rate for similar processors
MFLOPS rate - Millions of floating point operations per second.
These measures are not necessarily directly comparable between different types of processors.
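A small calculation shows why a MIPS rating can mislead when processors differ in type. The sketch uses the standard relation MIPS = clock rate / (CPI × 10^6). The two processors, their CPI values, and their instruction counts are hypothetical, chosen only to make the point.

```python
def mips_rate(clock_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)

def run_time(instruction_count, cpi, clock_hz):
    """Seconds needed to execute the given number of instructions."""
    return instruction_count * cpi / clock_hz

clock_hz = 100e6  # both hypothetical processors run at 100 MHz

# Processor A: simple instructions, low CPI, but the program needs more of them.
a_mips = mips_rate(clock_hz, 2.0)
a_time = run_time(3_000_000, 2.0, clock_hz)

# Processor B: complex instructions, high CPI, fewer instructions for the same program.
b_mips = mips_rate(clock_hz, 5.0)
b_time = run_time(1_000_000, 5.0, clock_hz)

print(f"A: {a_mips:.1f} MIPS, {a_time:.3f} s")
print(f"B: {b_mips:.1f} MIPS, {b_time:.3f} s")
```

Under these assumptions processor A has the higher MIPS rate, yet processor B finishes the same program sooner, because the instruction counts are not the same.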

Parallelizing Code
Implicitly
  – Write Sequential Algorithms
  – Use a Parallelizing Compiler
  – Rely on compiler to find parallelism
Explicitly
  – Design Parallel Algorithms
  – Write in a Parallel Language
  – Rely on Human to find Parallelism
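The two approaches can be contrasted on a toy problem. In the sketch below, `sequential_sum_of_squares` is the kind of code one would hand to a parallelizing compiler, while `parallel_sum_of_squares` shows the explicit route, where the programmer decides how to split the work. Python's standard `concurrent.futures` thread pool stands in for a parallel language here; that substitution, the function names, and the chunking scheme are all assumptions for illustration. In CPython, threads will not speed up this CPU-bound loop, so the example shows only the structure of the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential_sum_of_squares(values):
    """Sequential algorithm: any parallelism is left for a tool to discover."""
    return sum(v * v for v in values)

def parallel_sum_of_squares(values, workers=4):
    """Explicit algorithm: the programmer partitions the data and combines results."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker computes a partial sum over its own slice of the data.
        partials = list(pool.map(sequential_sum_of_squares, parts))
    # Final combination step performed by the coordinating thread.
    return sum(partials)

data = list(range(1000))
seq_result = sequential_sum_of_squares(data)
par_result = parallel_sum_of_squares(data)
print(seq_result, par_result)
```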

Multi-Processors
Multi-Processors generally share memory, while multi-computers do not.
  – Uniform Memory Model
  – Non-Uniform Memory Model
  – Cache-Only MIMD Machines

Multi-Computers
Independent Computers that Don’t Share Memory.
Connected by High-Speed Communication Network
More tightly coupled than a collection of independent computers
Cooperate on a single problem

Vector Computers
Independent Vector Hardware
May be an attached processor
Has both scalar and vector instructions
Vector instructions operate in highly pipelined mode
Can be Memory-to-Memory or Register-to-Register

SIMD Computers
One Control Processor
Several Processing Elements
All Processing Elements execute the same instruction at the same time
Interconnection network between PEs determines memory access and PE interaction
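The lock-step behavior described on this slide can be mimicked in a few lines. This is a minimal simulation, not a model of any real machine: the class `ProcessingElement`, the function `run_simd`, and the two-opcode instruction set are invented for illustration, and the interconnection network is not modeled at all.

```python
class ProcessingElement:
    """One PE holding a single local value in its accumulator."""

    def __init__(self, value):
        self.acc = value

    def execute(self, opcode, operand):
        if opcode == "ADD":
            self.acc += operand
        elif opcode == "MUL":
            self.acc *= operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")

def run_simd(program, data):
    """Play the control processor: broadcast each instruction to every PE."""
    pes = [ProcessingElement(v) for v in data]
    for opcode, operand in program:
        # Conceptually all PEs execute this instruction at the same time;
        # the inner loop only simulates that on a sequential machine.
        for pe in pes:
            pe.execute(opcode, operand)
    return [pe.acc for pe in pes]

result = run_simd([("MUL", 2), ("ADD", 1)], [1, 2, 3, 4])
print(result)  # [3, 5, 7, 9]
```

There is one instruction stream (the `program` list) and several data streams (one value per PE), which is the defining property of SIMD.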

The PRAM Model
SIMD Style Programming
Uniform Global Memory
Local Memory in Each PE
Memory Conflict Resolution
  – CRCW - Concurrent Read, Concurrent Write
  – CREW - Concurrent Read, Exclusive Write
  – EREW - Exclusive Read, Exclusive Write
  – ERCW - (rare) Exclusive Read, Concurrent Write
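The four conflict-resolution variants differ only in whether several PEs may touch the same global memory cell in one step. The sketch below checks a single step against a given variant. The function `step_allowed` and its representation of a step as two address lists are assumptions made for this example. It reports only whether a step is legal and does not model how a concurrent-write machine picks the value that is stored.

```python
from collections import Counter

def step_allowed(model, reads, writes):
    """Check one PRAM step against a model name such as "EREW" or "CRCW".

    reads and writes are lists of global memory addresses, one entry per
    PE access made during the step.
    """
    concurrent_read = any(n > 1 for n in Counter(reads).values())
    concurrent_write = any(n > 1 for n in Counter(writes).values())
    # Letter 0 of the name is the read policy, letter 2 the write policy;
    # "C" permits simultaneous access to one cell, "E" forbids it.
    read_ok = model[0] == "C" or not concurrent_read
    write_ok = model[2] == "C" or not concurrent_write
    return read_ok and write_ok

# Two PEs read cell 0 in the same step.
print(step_allowed("EREW", reads=[0, 0], writes=[]))  # False
print(step_allowed("CREW", reads=[0, 0], writes=[]))  # True

# Two PEs write cell 5 in the same step.
print(step_allowed("CREW", reads=[], writes=[5, 5]))  # False
print(step_allowed("CRCW", reads=[], writes=[5, 5]))  # True
```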

The VLSI Model
Implement Algorithm as a mostly combinational circuit
Determine the area required for implementation
Determine the depth of the circuit
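As a rough illustration of the two quantities named on this slide, consider combining n inputs with a balanced tree of two-input gates (for example, an n-way AND or OR). The sketch counts gates and levels. Treating the gate count as a stand-in for area is a simplifying assumption made here, since real layout area also depends on wiring, and the function name `reduction_tree_cost` is hypothetical.

```python
def reduction_tree_cost(n_inputs):
    """Return (gates, depth) for a balanced tree of two-input gates."""
    if n_inputs < 1:
        raise ValueError("need at least one input")
    gates = 0
    depth = 0
    width = n_inputs  # signals still to be combined at the current level
    while width > 1:
        pairs = width // 2
        gates += pairs               # one gate per pair at this level
        width = pairs + (width % 2)  # an unpaired signal passes through
        depth += 1
    return gates, depth

for n in (1, 5, 8):
    print(n, reduction_tree_cost(n))
```

For n inputs the tree uses n - 1 gates and has depth ceil(log2 n), so the crude area estimate grows linearly while the depth grows only logarithmically.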
