Martin D. Weinberg, UMass Astronomy, mdw@umass.edu

Transcription

A850 – Super-quick intro to parallel programming
Martin D. Weinberg, UMass Astronomy, mdw@umass.edu
January 23, 2019
Lec 01, 01/21/20 – slide 1

Topics for today

  A few comments on next week's lab
  OpenMP and MPI contrasted
  Brief intro to using OpenMP
  Brief intro to using MPI
  Brief intro to using Supercloud

OpenMP quick summary

Uses compiler directives to specify how and where to parallelize

Fortran – source code comments:

  !$omp parallel
  ! Do stuff in parallel
  !$omp end parallel

C/C++ – #pragmas:

  #pragma omp parallel
  {
    // Do stuff in parallel here
  }

OpenMP quick summary (continued)

Uses compiler directives to specify how and where to parallelize
Good for a small set of bottlenecks on a single node

Degree-of-parallelism specification:

  export OMP_NUM_THREADS=4

or

  #include <omp.h>   // omp_set_num_threads(int n);
  omp_set_num_threads(4);

MPI quick summary

Parallelization between nodes
Explicit calls to a communication library
Language bindings:

  C/C++:   int MPI_Init(int *argc, char ***argv)
  Fortran: MPI_INIT(INTEGER ierr)

Python wrappers: mpi4py
Quite complex (100+ subroutines) but only a small number are used frequently
User-defined parallel distribution

Comparison

MPI
  Pro: Ported to many architectures; many tune-up options for parallel execution
  Con: Complex to code; slow data communication

OpenMP
  Pro: Easy to code; fast data exchange; fast memory access
  Con: Limited usability; limited user control

Some OMP Library Functions

The OMP library provides a number of useful routines. Some of the most commonly used:

  omp_get_thread_num():  current thread index (0, 1, ...)
  omp_get_num_threads(): size of the active team
  omp_get_max_threads(): maximum number of threads
  omp_get_num_procs():   number of processors available

Some OMP Library Functions (continued)

An example code snippet:

  #include <iostream>
  #include <omp.h>

  int main()
  {
  #pragma omp parallel
    {
      int n = omp_get_thread_num();
      std::cout << "Greetings from thread " << n << "\n";
    }
    std::cout << "Greetings from the main thread!\n";
    return 0;
  }

Parallel Loops in OpenMP

OpenMP provides directives to support parallel loops.

The full version:

  #pragma omp parallel
  #pragma omp for
  for (i = 0; i < n; i++)
    ...

Abbreviated version:

  #pragma omp parallel for
  for (i = start; i < end; i++)
    ...


Parallel Loops in OpenMP (continued)

There are some restrictions on the loop, including:
  The loop has to be of this simple form, with start and end computable before the loop
  A simple comparison test
  A simple increment or decrement expression
  Exits with break, goto, or return are not allowed.

A full OpenMP example

Consider the following trivial program:

  #include <cmath>

  int main()
  {
    const int n = 200;
    float a[n], b[n];

    // Initialize
    for (int i = 0; i < n; i++) a[i] = sqrt(0.5*i);

    // Sum adjacent elements
    for (int i = 1; i < n; i++)
      b[i] = (a[i] + a[i-1]) / 2.0;

    return 0;
  }

A full OpenMP example (continued)

This is trivially parallelized as follows:

  #include <cmath>

  int main()
  {
    const int n = 200;
    float a[n], b[n];

    for (int i = 0; i < n; i++) a[i] = sqrt(0.5*i);

  #pragma omp parallel for
    for (int i = 1; i < n; i++)   // i is private by default
      b[i] = (a[i] + a[i-1]) / 2.0;

    return 0;
  }

Shared and Private Variables

Variables declared before a parallel block can be shared or private
  Shared variables are shared among all threads
  Private variables are local to each thread
    On entry, values of private variables are undefined
    On exit, values of private variables are undefined
By default, all variables declared outside a parallel block are shared, except the loop index variable, which is private
Variables declared in a parallel block are always private
Variables can be explicitly declared shared or private

Shared and Private Variables (continued)

A simple example:

  #pragma omp parallel for
  for (i = 0; i < n; i++)
    x[i] = x[i] + y[i];

Here x, y, and n are shared and i is private in the parallel loop.

We can make the attributes explicit with

  #pragma omp parallel for shared(x,y,n) private(i)
  for (i = 0; i < n; i++)
    x[i] = x[i] + y[i];

or

  #pragma omp parallel for default(shared) private(i)
  for (i = 0; i < n; i++)
    x[i] = x[i] + y[i];

The value of i is undefined after the loop.

MPI commands in a nutshell

Six basic MPI commands:

Start and stop:
  MPI_Init(...)
  MPI_Finalize(...)
Know yourself and others:
  MPI_Comm_rank(...)
  MPI_Comm_size(...)
Message passing:
  MPI_Send(...)
  MPI_Recv(...)

MPI basic organization

Data representation is standardized: MPI data types (e.g. MPI_INT, MPI_FLOAT)
Harnessing processes for a task: MPI communicators (e.g. MPI_COMM_WORLD)
How do we know what to do with the message? MPI tags
How many processes and processors? mpirun option: -np N

A full OpenMPI example

Consider the following program to sum elements in a loop:

  #include <iostream>
  #include <cstdlib>

  int main(int argc, char** argv)
  {
    const int max_rows = 10000000;
    int array[max_rows];
    int num_rows;

    std::cout << "please enter the number of numbers to sum: ";
    std::cin >> num_rows;

    if (num_rows > max_rows) {
      std::cerr << "Too many rows.\n";
      exit(1);
    }

    // initialize an array
    for (int i = 0; i < num_rows; i++) array[i] = i;

    // compute sum
    double sum = 0;
    for (int i = 0; i < num_rows; i++) sum += array[i];

    std::cout << "The grand total is: " << sum << "\n";

    return 0;
  }

A full OpenMPI example (continued)

This is parallelized as follows:

  #include <iostream>
  #include <cstdlib>
  #include <mpi.h>

  int main(int argc, char** argv)
  {
    const int max_rows = 100000;        // Maximum size as before
    const int send_data_tag   = 2001;   // MPI message tag
    const int return_data_tag = 2002;   // MPI message tag

    int array[max_rows];
    int array2[max_rows];

    int ierr = MPI_Init(&argc, &argv);  // Initialize MPI
    int root_process = 0;               // Define the root process

    int myid, numprocs;
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if (myid == root_process) {

      int num_rows;
      std::cout << "please enter the number of numbers to sum: ";
      std::cin >> num_rows;

      if (num_rows > max_rows) {
        std::cout << "Too many numbers.\n";
        exit(1);
      }

A full OpenMPI example (continued)

      // Initialize as before
      for (int i = 0; i < num_rows; i++) array[i] = i + 1;

      int avg_rows_per_process = num_rows / numprocs;

      // Send a portion of the vector to each child process
      for (int nid = 1; nid < numprocs; nid++) {

        int start_row = nid * avg_rows_per_process + 1;
        int end_row   = (nid + 1) * avg_rows_per_process;

        if (num_rows - end_row < avg_rows_per_process)
          end_row = num_rows - 1;

        int num_rows_to_send = end_row - start_row + 1;

        ierr = MPI_Send(&num_rows_to_send, 1, MPI_INT,
                        nid, send_data_tag, MPI_COMM_WORLD);

        ierr = MPI_Send(&array[start_row], num_rows_to_send, MPI_INT,
                        nid, send_data_tag, MPI_COMM_WORLD);
      }

      long int sum = 0, partial_sum;
      for (int i = 0; i < avg_rows_per_process + 1; i++) sum += array[i];

      MPI_Status status;
      for (int nid = 1; nid < numprocs; nid++) {
        ierr = MPI_Recv(&partial_sum, 1, MPI_LONG, MPI_ANY_SOURCE,
                        return_data_tag, MPI_COMM_WORLD, &status);

A full OpenMPI example (continued)

        sum += partial_sum;
      }

      std::cout << "The grand total is: " << sum << "\n";

    } else {

      int num_rows_to_receive, num_rows_received;
      MPI_Status status;

      ierr = MPI_Recv(&num_rows_to_receive, 1, MPI_INT,
                      root_process, send_data_tag, MPI_COMM_WORLD, &status);

      ierr = MPI_Recv(&array2, num_rows_to_receive, MPI_INT,
                      root_process, send_data_tag, MPI_COMM_WORLD, &status);

      num_rows_received = num_rows_to_receive;

      long int partial_sum = 0;
      for (int i = 0; i < num_rows_received; i++) partial_sum += array2[i];

      ierr = MPI_Send(&partial_sum, 1, MPI_LONG, root_process,
                      return_data_tag, MPI_COMM_WORLD);
    }

    ierr = MPI_Finalize();

    return 0;
  }

HPC with Python

Threads
  No threads in Python code because of the GIL (Global Interpreter Lock)
  C/Fortran functions can be threaded and called

MPI
  Several libraries use MPI under the hood; the most popular is mpi4py
  Approximate MPI function compatibility, but slower communication because of the extra overhead

Dask: http://dask.org
  Attempt to implement the NumPy and Pandas API for HPC applications
