Parallel Algorithms

Transcription

Introduction to Parallel Computing (CMSC416 / CMSC818X)Parallel AlgorithmsAbhinav Bhatele, Department of Computer Science

Announcements Assignment 1 is due on Oct 11Assignment 2 will be posted on Oct 11 and due on Oct 18Abhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING2

Matrix multiplicationfor (i 0; i M; i )for (j 0; j N; j )for (k 0; k L; k )C[i][j] ix multiplicationAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING3

Matrix multiplicationfor (i 0; i M; i )for (j 0; j N; j )for (k 0; k L; k )C[i][j] A[i][k]*B[k][j];Any performance issues for large arrays?https://en.wikipedia.org/wiki/Matrix multiplicationAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING3

Blocking to improve cache performance Create smaller blocks that fit in cache: leads to cache reuseC12 A10 * B02 A11 * B12 A12 * B22 A13 * C33Abhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING4

Parallel matrix multiply Store A and B in a distributed mannerCommunication between processes to get the right sub-matrices to each processEach process computes a portion of CAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING5

Cannon’s 2D matrix 15A30A31A32A33B30B31B32B332D process gridAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING6

Cannon’s 2D matrix 15A30A31A32A33B30B31B32B332D process gridInitial skewAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING6

Cannon’s 2D matrix 1A3332B30B0131B1232B23332D process gridInitial skewAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING6

Cannon’s 2D matrix 030B110131B221232B33232D process gridShift-by-1Abhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING6

Agarwal’s 3D matrix multiply Copy A to all i-k planes and B to all j-k planesk93D process jk012345678iA00A01A02A10A11A12A20A21A22Abhinav Bhatele (CMSC416 / 0B01B11B21LIVE RECORDING7

Agarwal’s 3D matrix multiply Perform a single matrix multiply to calculate partial CAllreduce along i-j planes to calculate final B00k0B10B21B10B20B01B11B21B00B10B20B01B11C01C01A10 * B01A11 * B11A12 * B21C11C21C00C10C20B21Abhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING8

Communication algorithms ReductionAll-to-allAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING9

Types of reduction Scalar reduction: every process contributes one number Perform some commutative associate operationVector reduction: every process contributes an array of numbersAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING10

Parallelizing reductionAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING11

Parallelizing reduction Naive algorithm: every process sends to the rootAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING11

Parallelizing reduction Naive algorithm: every process sends to the rootSpanning tree: organize processes in a k-ary treeAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING11

Parallelizing reduction Naive algorithm: every process sends to the rootSpanning tree: organize processes in a k-ary treeStart at leaves and send to parentsIntermediate nodes wait to receive data from all their childrenAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING11

Parallelizing reduction Naive algorithm: every process sends to the rootSpanning tree: organize processes in a k-ary treeStart at leaves and send to parentsIntermediate nodes wait to receive data from all their childrenNumber of phases: logkpAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING11

All-to-all Each process sends a distinct message to every other processNaive algorithm: every process sends the data pair-wise to all other terAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING12

Virtual topology: 2D mesh Phase 1: every process sends to its row neighborsPhase 2: every process sends to column neighborsAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING13

Virtual topology: hypercube Hypercube is an n-dimensional analog of a square (n 2) and cube (n 3)Special case of k-ary d-dimensional meshhttps://en.wikipedia.org/wiki/HypercubeAbhinav Bhatele (CMSC416 / CMSC818X)LIVE RECORDING14

Abhinav Bhatele5218 Brendan Iribe Center (IRB) / College Park, MD 20742phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu

Parallel Algorithms Abhinav Bhatele, Department of Computer Science Introduction to Parallel Computing (CMSC416 / CMSC818X) Abhinav Bhatele (CMSC416