Introduction To The MATLAB Parallel Toolbox

Transcription

Introduction to the MATLAB Parallel Toolbox
Marinus Pennings, October 10, 2017
Texas A&M University, High Performance Research Computing – https://hprc.tamu.edu

Outline:
- Multi-threading in MATLAB
- Parallel pools
- parfor
- spmd
- distributed arrays
- GPU computing
- Cluster profiles
- MATLAB batch command
- Remote job submission

Short course home page: https://hprc.tamu.edu/training/matlab_parallel_toolbox.html
MATLAB source codes:
- On the course home page
- On ada: /scratch/training/MATLAB-PCT/matlab.zip
- On terra: /scratch/training/MATLAB-PCT/matlab.zip

Multi-threading
[diagram: a single computer/node]

Multi-threading
MATLAB automatically executes a large number of operators multi-threaded:
- Transparent to the user
- Array/matrix operations
- Elementwise operators
Set the number of threads with feature('NumThreads',N); the older interface is maxNumCompThreads(N).
HPRC plug: the average desktop/laptop has 4 to 8 cores. The HPRC cluster terra has 28 cores per node (20 on ada; some nodes have 40 cores).
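The thread controls above can be sketched as follows; the matrix sizes are illustrative assumptions, not from the slides:

```matlab
% Sketch of controlling MATLAB's implicit multi-threading.
nOld = maxNumCompThreads;      % query the current thread count
maxNumCompThreads(4);          % limit implicit parallelism to 4 threads

A = rand(2000);                % example sizes
B = rand(2000);
C = A * B;                     % dense matrix multiply runs multi-threaded

maxNumCompThreads(nOld);       % restore the previous setting
```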

Parallel Pool
[diagram: the main MATLAB process]

Parallel Pool
A parallel pool is a set of worker MATLAB processes controlled by the main MATLAB session:
N = 3;
p = parpool(N);

Parallel Pool
Shut the pool down with delete(p), or with delete(gcp); gcp ("get current pool") is a special function that returns the current pool object.
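A minimal pool lifecycle combining the two slides above (the worker count is an example):

```matlab
% Sketch: create a parallel pool, use it, and shut it down.
N = 3;                     % example worker count
p = parpool(N);            % start N workers (uses the default profile)

% ... parallel work (parfor, spmd, ...) goes here ...

delete(gcp('nocreate'));   % close the current pool if one exists
```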

parfor
A serial loop and its parfor equivalent (the workers are idle until given work):
for i = 1:100
    d = d + i;
    B(i) = R(i) + c;
end

parfor i = 1:100
    d = d + i;
    B(i) = R(i) + c;
end

parfor, step 1: the main MATLAB sends data and code to the workers. (Before sending data, the main MATLAB needs to classify all the variables in the loop.) Each of the four workers receives d, c, and its slice of R (R(1:25), R(26:50), R(51:75), R(76:100)), along with its share of the iterations.

parfor, step 2: the workers execute their assigned iterations (i = 1:25, 26:50, 51:75, 76:100) while the main MATLAB waits for the results.

parfor, step 3: the workers send their results back. Each returns its slice of B (B(1:25), B(26:50), B(51:75), B(76:100)) and its partial d; the main MATLAB combines the slices into B(1:100) and reduces the partial values of d into d.

parfor, step 4: the main MATLAB gets control back and continues executing the statements after the parfor. The workers are idle again, waiting for more work to do.
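The four steps above, put together as a runnable sketch (the variable names d, c, R, B come from the slides; the initial values are assumptions):

```matlab
% Sketch of the parfor pattern described in the slides.
c = 1;                   % broadcast variable (assumed value)
d = 0;                   % reduction variable
R = rand(1, 100);        % sliced input
B = zeros(1, 100);       % sliced output

parfor i = 1:100
    d = d + i;           % reduced across workers by MATLAB
    B(i) = R(i) + c;     % each worker fills its own slice of B
end
% After the loop, d == sum(1:100) == 5050 and B is fully assembled.
```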

SPMD
An spmd block is a construct in which all the workers execute the code in the block concurrently. labindex and numlabs are special variables holding each worker's index and the total worker count:
spmd
    id = labindex;
    tot = numlabs;
    a = ones(tot)*id;
end
a1 = a{1};
(Before the block, the workers are idle, waiting for more work to do.)

SPMD, step 1: the main MATLAB sends the code block to the workers. With four workers, worker 1 will compute id = 1, tot = 4, a = ones(4)*1; worker 2 computes id = 2, tot = 4, a = ones(4)*2; and so on through worker 4.

SPMD, step 2: the workers execute the code in the spmd block (each evaluates a = ones(4)*id) while the main MATLAB waits.

SPMD, step 3: the main MATLAB gets control back. Each worker keeps id, tot, and a in its own workspace; in the main workspace they appear as composite variables, indexed per worker (e.g. a{1}).
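The spmd example from the slides as a self-contained sketch (the pool size is an example):

```matlab
% Sketch: spmd block; each worker builds its own matrix.
parpool(4);              % example: four workers
spmd
    id  = labindex;      % this worker's index (1..numlabs)
    tot = numlabs;       % total number of workers
    a   = ones(tot)*id;  % 4x4 matrix filled with this worker's index
end
a1 = a{1};               % a is a Composite; a1 is worker 1's matrix (all ones)
delete(gcp('nocreate'));
```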

SPMD Communication
- labSend(var,id) sends variable "var" to worker "id"
- var = labReceive(id) receives data from worker "id" and assigns it to "var"
- vf = labSendReceive(it,if,vt) sends "vt" to worker "it", receives data from worker "if", and assigns it to "vf"
Example: the last worker sends its index to worker 1 (workers 2 and 3 do nothing; the main MATLAB waits):
spmd
    id = labindex;
    n = numlabs;
    if (id == n)
        labSend(id,1);       % worker n: send its index to worker 1
    elseif (id == 1)
        id2 = labReceive(n); % worker 1: receive from worker n
    end
end
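labSendReceive, mentioned above, is convenient for ring shifts; a hedged sketch (the pool size and neighbor scheme are illustrative, not from the slides):

```matlab
% Sketch: each worker passes its index one step around a ring.
parpool(4);
spmd
    right = mod(labindex, numlabs) + 1;           % neighbor to send to
    left  = mod(labindex - 2, numlabs) + 1;       % neighbor to receive from
    got = labSendReceive(right, left, labindex);  % got is the left neighbor's index
end
delete(gcp('nocreate'));
```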

Distributed Arrays
Conceptually very similar to a regular array, and many regular matrix operators are available for distributed arrays:
- Elements can be of any type
- Elements are distributed over the workers
- MATLAB automatically uses the parallel version of an operator if an operand is a distributed variable
List the available operators with methods('distributed').

Distributed Arrays
Example:
b = rand(1000);
a = distributed(b);
Here b lives in the main workspace, while the elements of a are spread over the workers' workspaces.
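A hedged sketch of working with a distributed array, extending the example above (sizes and the chosen operator are examples):

```matlab
% Sketch: create a distributed array and operate on it in parallel.
parpool(4);
b = rand(1000);          % ordinary array in the main workspace
a = distributed(b);      % elements spread across the workers
s = a + a;               % parallel version of '+' is used automatically
sLocal = gather(s);      % bring the result back to the main workspace
delete(gcp('nocreate'));
```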

GPU Programming
What is a GPU?
- Accelerator card
- Thousands of small computing cores
- Dedicated high-speed memory

GPU Programming
MATLAB provides GPU versions for a large number of MATLAB operators:
- Completely transparent to the user
- MATLAB automatically uses the GPU version of an operator if an operand is a GPU variable
List the available operators with methods('gpuArray').

Copy to/from the GPU:
- var2 = gpuArray(var1) copies variable "var1" on the host to the GPU and names it "var2"
- var2 = gather(var1) copies variable "var1" on the GPU to the host and assigns it to "var2"
Use the convenience functions to create data directly on the GPU:
a = zeros(100);
ag = gpuArray(a);        % a stays in the host workspace; ag is on the GPU
bg = gpuArray.rand(100); % created directly on the GPU
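A round trip through the GPU, combining the calls above (requires a supported GPU; the sizes and the multiply are illustrative):

```matlab
% Sketch: move data to the GPU, compute there, gather the result back.
a  = rand(1000);          % host array
ag = gpuArray(a);         % copy to GPU memory
bg = gpuArray.rand(1000); % created directly on the GPU
cg = ag * bg;             % matrix multiply runs on the GPU
c  = gather(cg);          % copy the result back to the host
```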

parpools revisited
What if you want more workers than a computer/node has cores? Remember that when creating the parpool we did not provide a cluster profile, so the default 'local' profile was used.

parpools revisited
...or what if you want to distribute the workers over multiple nodes (e.g. so each worker can use more threads)? The default 'local' profile cannot do that either.

parpools revisited
Local profile:
- Workers must run on the same computer/node as the main MATLAB
- Limited to the number of cores on a computer/node: 28 workers on terra (20 on ada)
- Part of the MATLAB Parallel Toolbox

Cluster Profile (only on HPRC)
- Number of workers only limited by the MDCS license (currently 96, shared)
- Integrates with the batch scheduler (e.g. SLURM and LSF): it will actually submit LSF/SLURM jobs
- Workers will be running on the compute nodes

Importing a Cluster Profile (only on HPRC)
profile = parallel.importProfile( PATH );
- You only need to import the cluster profile once
- A pre-created profile is located in MATLABDIR/profiles/TAMU
HPRC convenience function: tamu_import_TAMU_clusterprofile();
- Wrapper around parallel.importProfile()
- No need to provide the location of the pre-created profile
- Creates a directory in your scratch directory to store metadata

Cluster Properties (only on HPRC, HPRC-developed)
How do you attach properties (e.g. workers/threads/time)?
Defining cluster properties:
help TAMUClusterProperties   % to see all options
tp = TAMUClusterProperties();
tp.workers(4);
tp.walltime('02:00');
Attaching the properties to the cluster profile:
profile = tamu_set_profile_properties(tp);

MATLAB batch function
Offloads a script or function to worker(s); control is returned immediately and a Job object is returned:
j = batch(cluster_obj,'myscript','Pool',N);  % offloads a script (starts a pool)
j = batch(cluster_obj,@myfunc,N,{x1,x2});    % offloads a function
j = tamu_run_batch(tp,'myscript');           % HPRC-developed
Retrieving job info and results:
r = j.State;            % query the job state
j.wait();               % block until the job finishes
j.load();               % load a script job's workspace variables
res = j.fetchOutputs(); % fetch a function job's outputs
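An end-to-end sketch of the batch workflow above (the use of the default profile and the pool size are assumptions; 'myscript' is the placeholder name from the slides):

```matlab
% Sketch: offload a script with a pool of workers via batch, then collect results.
c = parcluster;                       % cluster object from the default profile
j = batch(c, 'myscript', 'Pool', 4);  % run myscript.m with a 4-worker pool
j.wait();                             % block until the job finishes
j.load();                             % pull the script's variables into this workspace
delete(j);                            % clean up the job
```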

Remote batch submission (HPRC-developed)
Submit jobs from the user's local MATLAB session (laptop/desktop), from the MATLAB command line or using the MATLAB app:
tp = TAMUClusterProperties();
tp.hostname('terra.tamu.edu');
tp.user('<netid>');
j1 = tamu_run_batch(tp,'mytest');
% or run a function
cp = tamu_set_profile_properties(tp);
j2 = batch(cp,@myfun,1,{a,b});
Download the framework and app at https://hprc.tamu.edu/wiki/SW:Matlab#Running_.28parallel.29_Matlab_Scripts_on_HPRC_compute_nodes

Questions?
For additional information/help:
- See the wiki: https://hprc.tamu.edu/wiki/SW:Matlab
- Send email: help@hprc.tamu.edu
