Hardware and Software. Lecture 6: Deep Learning Hardware.

Transcription

Lecture 6: Hardware and Software. Deep Learning Hardware, Dynamic & Static Computational Graphs, PyTorch & TensorFlow. Fei-Fei Li, Ranjay Krishna, Danfei Xu. April 15, 2021.

Administrative: Assignment 1 is due tomorrow, April 16th, at 11:59 pm. Assignment 2 will be out tomorrow, due April 30th at 11:59 pm. The project proposal is due Monday, April 19.

Administrative: Friday's section topic is the course project.

Two more layers to go: POOL / FC.

Convolution Layers (continued from last time): a 56x56x32 input passes through a 3x3 CONV layer with stride 1, padding 1, and 64 filters, producing a 56x56x64 output.

Pooling layer: makes the representations smaller and more manageable; operates over each activation map independently.

Max pooling. Single depth slice (x):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Max pooling with 2x2 filters and stride 2 gives (y):
6 8
3 4

Pooling layer: summary. Assume the input is W1 x H1 x C. The pooling layer needs 2 hyperparameters: the spatial extent F and the stride S. This produces an output of size W2 x H2 x C, where:
W2 = (W1 - F)/S + 1
H2 = (H1 - F)/S + 1
Number of parameters: 0.
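For instance, a 2x2 max pool with stride 2 on a 56x56x64 volume gives (56 - 2)/2 + 1 = 28, i.e. a 28x28x64 output. A minimal sketch checking this in PyTorch (the sizes here are just an example):

```python
import torch
import torch.nn as nn

# Hypothetical input: batch of 1, 64 channels, 56x56 spatial size
x = torch.randn(1, 64, 56, 56)

# 2x2 max pooling with stride 2: (56 - 2)/2 + 1 = 28
pool = nn.MaxPool2d(kernel_size=2, stride=2)
y = pool(x)

print(y.shape)  # torch.Size([1, 64, 28, 28]): spatial size halves, channel count is unchanged
```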

Fully Connected Layer (FC layer): contains neurons that connect to the entire input volume, as in ordinary neural networks.

Lecture 6: Hardware and Software. Deep Learning Hardware, Dynamic & Static Computational Graphs, PyTorch & TensorFlow.

Where we are now: computational graphs. [Figure: inputs x and W feed a multiply node producing scores s, followed by the hinge loss; together with the regularization term R this gives the loss L.]

Where we are now: Convolutional Neural Networks. Illustration of LeCun et al. 1998, from CS231n 2017 Lecture 1.

Where we are now (more on optimization in Lecture 8): learning network parameters through optimization. Landscape image is CC0 1.0 public domain; walking man image is CC0 1.0 public domain.

Today:
- Deep learning hardware: CPU, GPU
- Deep learning software: PyTorch and TensorFlow
- Static and dynamic computation graphs

Deep Learning Hardware

Inside a computer

Spot the CPU! (central processing unit) This image is licensed under CC-BY 2.0.

Spot the GPUs! (graphics processing unit) This image is licensed under CC-BY 2.0.

CPU vs GPU
CPU (Intel Core i9-7900k): 10 cores, 4.3 GHz clock speed, uses system RAM, $385, ~640 GFLOPS FP32.
GPU (NVIDIA RTX 3090): 10496 cores, 1.6 GHz clock speed, 24 GB GDDR6X memory, $1499, 35.6 TFLOPS FP32.
CPU: fewer cores, but each core is much faster and much more capable; great at sequential tasks.
GPU: more cores, but each core is much slower and "dumber"; great for parallel tasks.

Example: Matrix Multiplication. Multiplying an A x B matrix by a B x C matrix gives an A x C result; on the GPU this maps onto cuBLAS GEMM (GEneral Matrix-to-matrix Multiply).
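A quick sketch of dispatching the same GEMM to cuBLAS through PyTorch (the matrix sizes are illustrative, and a CUDA-capable machine is assumed for the GPU branch):

```python
import torch

A, B, C = 4096, 4096, 4096  # hypothetical matrix sizes

x_cpu = torch.randn(A, B)
y_cpu = torch.randn(B, C)

if torch.cuda.is_available():
    # On the GPU, torch.matmul is backed by cuBLAS GEMM kernels
    x_gpu, y_gpu = x_cpu.cuda(), y_cpu.cuda()
    z_gpu = torch.matmul(x_gpu, y_gpu)   # (A x B) @ (B x C) -> (A x C)
    torch.cuda.synchronize()             # wait for the kernel to finish before timing/printing
else:
    z_cpu = torch.matmul(x_cpu, y_cpu)   # same math, run on the CPU
```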

CPU vs GPU in practice: the GPU is roughly 64x to 76x faster (66x, 67x, 71x, 64x, 76x across the benchmarked networks; the CPU performance is not well-optimized, so the comparison is a little unfair). Data from https://github.com/jcjohnson/cnn-benchmarks.

CPU vs GPU in practice: cuDNN is much faster than "unoptimized" CUDA, roughly 2.8x to 3.4x (2.8x, 3.0x, 3.1x, 3.4x, 2.8x). Data from https://github.com/jcjohnson/cnn-benchmarks.


NVIDIA vs AMD

CPU vs GPU (extended)
CPU (Intel Core i7-7700k): 10 cores, 4.3 GHz, system RAM, $385, ~640 GFLOPS FP32.
GPU (NVIDIA RTX 3090): 10496 cores, 1.6 GHz, 24 GB GDDR6X, $1499, 35.6 TFLOPS FP32.
GPU (data center, NVIDIA A100): 6912 CUDA cores + 432 Tensor cores, 1.5 GHz, 40/80 GB HBM2, ~$3/hr (GCP), 9.7 TFLOPS FP64, 20 TFLOPS FP32, 312 TFLOPS FP16.
TPU (Google Cloud TPU v3): 2 Matrix Units (MXUs) per core, 4 cores, clock speed not listed, 128 GB HBM, ~$8/hr (GCP), 420 TFLOPS (non-standard FP).
CPU: fewer cores, but each core is much faster and much more capable; great at sequential tasks.
GPU: more cores, but each core is much slower and "dumber"; great for parallel tasks.
TPU: specialized hardware for deep learning.

Programming GPUs:
- CUDA (NVIDIA only): write C-like code that runs directly on the GPU. Optimized APIs: cuBLAS, cuFFT, cuDNN, etc.
- OpenCL: similar to CUDA, but runs on anything; usually slower on NVIDIA hardware.
- HIP (https://github.com/ROCm-Developer-Tools/HIP): new project that automatically converts CUDA code to something that can run on AMD GPUs.
- See Stanford CS 149: http://cs149.stanford.edu/fall20/

CPU / GPU Communication: the model is here (on the GPU); the data is here (on disk / in system memory).

CPU / GPU Communication: the model lives on the GPU while the data lives on disk. If you aren't careful, training can bottleneck on reading data and transferring it to the GPU! Solutions:
- Read all data into RAM
- Use an SSD instead of an HDD
- Use multiple CPU threads to prefetch data
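A common way to get the prefetching and fast host-to-device copies in PyTorch is the built-in DataLoader; a minimal sketch (the dataset and sizes here are hypothetical):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset held in CPU memory
dataset = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,      # multiple CPU worker processes prefetch batches in the background
    pin_memory=True,    # page-locked host memory speeds up CPU -> GPU copies
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for images, labels in loader:
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward / backward pass here ...
```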

Deep Learning Software

A zoo of frameworks!
- Caffe (UC Berkeley) -> Caffe2 (Facebook); most features have been absorbed by PyTorch
- Torch (NYU / Facebook) -> PyTorch (Facebook)
- Chainer (Preferred Networks); the company has officially migrated its research infrastructure to PyTorch
- Theano (U Montreal) -> TensorFlow (Google)
- JAX (Google)
- PaddlePaddle (Baidu)
- MXNet (Amazon): developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS
- CNTK (Microsoft)
- And others.

A zoo of frameworks! (Same diagram as above.) We'll focus on PyTorch and TensorFlow.

Recall: Computational Graphs. [Figure: x and W feed a multiply node producing scores s, followed by the hinge loss and regularization R, giving the loss L.]

Recall: Computational Graphs. [Figure: a convolutional network from input image and weights to the loss.] Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

Recall: Computational Graphs. [Figure: a very deep network from input image to loss.] Figure reproduced with permission from a Twitter post by Andrej Karpathy.

The point of deep learning frameworks:
(1) Quick to develop and test new ideas
(2) Automatically compute gradients
(3) Run it all efficiently on a GPU (wrap cuDNN, cuBLAS, OpenCL, etc.)

Computational Graphs: Numpy. [Graph: a = x * y, b = a + z, c = sum(b).]


Computational Graphs: Numpy. [Same graph: a = x * y, b = a + z, c = sum(b).]
Good: clean API, easy to write numeric code.
Bad: we have to compute our own gradients, and it can't run on a GPU.
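The slide's code isn't captured in this transcript; a sketch in the same spirit, building the graph a = x*y, b = a + z, c = sum(b) in numpy and backpropagating by hand:

```python
import numpy as np

np.random.seed(0)
N, D = 3, 4

x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

# Forward pass
a = x * y
b = a + z
c = np.sum(b)

# Backward pass: gradients computed by hand
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))
grad_a = grad_b.copy()
grad_z = grad_b.copy()
grad_x = grad_a * y
grad_y = grad_a * x
```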

Computational Graphs: Numpy vs. PyTorch. The PyTorch version looks exactly like numpy!

Computational Graphs: Numpy vs. PyTorch. PyTorch handles gradients for us!

Computational Graphs: Numpy vs. PyTorch. It is trivial to run on a GPU: just construct the arrays on a different device!
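A sketch of the PyTorch equivalent of the numpy graph above, with autograd and optional GPU placement (sizes are illustrative):

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
N, D = 3, 4

# Same graph as before, but tensors can live on the GPU and track gradients
x = torch.randn(N, D, requires_grad=True, device=device)
y = torch.randn(N, D, requires_grad=True, device=device)
z = torch.randn(N, D, requires_grad=True, device=device)

a = x * y
b = a + z
c = torch.sum(b)

c.backward()                      # PyTorch computes all the gradients for us
print(x.grad, y.grad, z.grad)
```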

PyTorch (more details)

PyTorch: Fundamental Concepts
- torch.Tensor: like a numpy array, but can run on a GPU
- torch.autograd: package for building computational graphs out of Tensors and automatically computing gradients
- torch.nn.Module: a neural network layer; may store state or learnable weights

PyTorch: Versions. For this class we are using PyTorch version 1.7. There was a major API change in release 1.0, so be careful if you are looking at older PyTorch code (< 1.0)!

PyTorch: Tensors. Running example: train a two-layer ReLU network on random data with an L2 loss.

PyTorch: Tensors. Create random tensors for the data and weights.

PyTorch: Tensors. Forward pass: compute predictions and the loss.

PyTorch: Tensors. Backward pass: manually compute gradients.

PyTorch: Tensors. Gradient descent step on the weights.

PyTorch: Tensors. To run on a GPU, just use a different device!
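The running example's code appears on the slides as images; a sketch of the same idea, with the sizes usually used for this example (here they are just placeholders):

```python
import torch

device = torch.device('cpu')  # use torch.device('cuda') to run on a GPU instead

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random tensors for data and weights
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predictions and L2 loss
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: manually compute gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step on the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```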

PyTorch: Autograd. Creating Tensors with requires_grad=True enables autograd. Operations on Tensors with requires_grad=True cause PyTorch to build a computational graph.

PyTorch: Autograd. The forward pass looks exactly the same as before, but we don't need to track intermediate values ourselves: PyTorch keeps track of them for us in the graph.

PyTorch: Autograd. Compute the gradient of the loss with respect to w1 and w2.

PyTorch: Autograd. Make a gradient step on the weights, then zero the gradients. torch.no_grad() means "don't build a computational graph for this part".

PyTorch: Autograd. PyTorch methods that end in an underscore modify the Tensor in-place; methods that don't return a new Tensor.
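A sketch of the autograd variant of the running example described above:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# requires_grad=True tells autograd to build a graph through these tensors
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: no need to keep intermediate values ourselves
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: gradient of the loss with respect to w1 and w2
    loss.backward()

    # Update the weights without building a graph, then zero the gradients in-place
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```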

PyTorch: New Autograd Functions. Define your own autograd functions by writing forward and backward functions for Tensors. Use the ctx object to "cache" values for the backward pass, just like the cache objects from A2.

PyTorch: New Autograd Functions. Define a helper function to make it easy to use the new function.

PyTorch: New Autograd Functions. We can use our new autograd function in the forward pass.

PyTorch: New Autograd Functions. In practice you almost never need to define new autograd functions! Only do it when you need a custom backward; in this case we could just use a normal Python function.
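A sketch of a custom ReLU written this way, plus the plain-Python alternative the slide recommends:

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)       # "cache" values for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0              # gradient flows only where the input was positive
        return grad_x

def my_relu(x):
    # Helper so the new function is as easy to call as a normal op
    return MyReLU.apply(x)

# In most cases a plain Python function is enough; autograd handles the backward for us:
def my_relu_simple(x):
    return x.clamp(min=0)
```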

PyTorch: nn. A higher-level wrapper for working with neural nets. Use this! It will make your life easier.

PyTorch: nn. Define our model as a sequence of layers; each layer is an object that holds learnable weights.

PyTorch: nn. Forward pass: feed data to the model and compute the loss.

PyTorch: nn. torch.nn.functional has useful helpers like loss functions.

PyTorch: nn. Backward pass: compute gradients with respect to all model weights (they have requires_grad=True).

PyTorch: nn. Make a gradient step on each model parameter (with gradients disabled).
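A sketch of the nn version of the running example (sizes and learning rate are illustrative):

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Model as a sequence of layers; each layer object holds its own learnable weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

learning_rate = 1e-2
for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)   # loss helper from torch.nn.functional

    loss.backward()   # gradients for all model weights (they have requires_grad=True)

    # Gradient step on each parameter, with graph building disabled
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()
```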

PyTorch: optim. Use an optimizer for different update rules.

PyTorch: optim. After computing gradients, use the optimizer to update the parameters and zero the gradients.
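A sketch using torch.optim (the choice of Adam and the learning rate are illustrative):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
x, y = torch.randn(64, 1000), torch.randn(64, 10)

# The optimizer implements the update rule (here: Adam)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()        # update all parameters using their .grad
    optimizer.zero_grad()   # zero gradients for the next iteration
```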

PyTorch: nn (Define new Modules). A PyTorch Module is a neural net layer; it inputs and outputs Tensors. Modules can contain weights or other Modules. You can define your own Modules using autograd!

PyTorch: nn (Define new Modules). Define our whole model as a single Module.

PyTorch: nn (Define new Modules). The initializer sets up two children (Modules can contain Modules).

PyTorch: nn (Define new Modules). Define the forward pass using the child modules. No need to define backward: autograd will handle it.

PyTorch: nn (Define new Modules). Construct and train an instance of our model.
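A sketch of the two-layer model written as a Module subclass and trained with an optimizer:

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        # Initializer sets up two children (Modules can contain Modules)
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Forward pass uses the child modules; no backward needed, autograd handles it
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

x, y = torch.randn(64, 1000), torch.randn(64, 10)
model = TwoLayerNet(1000, 100, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for t in range(500):
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```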

PyTorch: nn (Define new Modules). It is very common to mix and match custom Module subclasses and Sequential containers.

PyTorch: nn (Define new Modules). Define a network component as a Module subclass.

PyTorch: nn (Define new Modules). Stack multiple instances of the component in a Sequential container.
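A sketch of this pattern; the ParallelBlock component here is a hypothetical example, not necessarily the one on the slide:

```python
import torch

class ParallelBlock(torch.nn.Module):
    """Hypothetical component: two parallel linear layers whose outputs are multiplied."""
    def __init__(self, D_in, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, D_out)
        self.linear2 = torch.nn.Linear(D_in, D_out)

    def forward(self, x):
        h1 = self.linear1(x)
        h2 = self.linear2(x)
        return (h1 * h2).clamp(min=0)

# Stack multiple instances of the custom component inside a Sequential container
model = torch.nn.Sequential(
    ParallelBlock(1000, 100),
    ParallelBlock(100, 100),
    torch.nn.Linear(100, 10),
)
```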

PyTorch: Pretrained Models. It is super easy to use pretrained models with torchvision.
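For example (the model choices are arbitrary; the pretrained=True flag matches the torchvision API of this era):

```python
import torch
import torchvision

# Download and load pretrained weights for standard architectures
alexnet = torchvision.models.alexnet(pretrained=True)
resnet101 = torchvision.models.resnet101(pretrained=True)

# Use them like any other Module
x = torch.randn(1, 3, 224, 224)
scores = resnet101(x)   # shape (1, 1000): ImageNet class scores
```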

PyTorch: torch.utils.tensorboard. A Python wrapper around TensorFlow's web-based visualization tool. (This image is licensed under CC-BY 4.0; no changes were made to the image.)
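A minimal sketch of logging with it (the log directory and logged values are placeholders):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/example')   # hypothetical log directory

for step in range(100):
    fake_loss = 1.0 / (step + 1)                 # placeholder value; log your real loss here
    writer.add_scalar('train/loss', fake_loss, step)

writer.close()
# Then view the curves with:  tensorboard --logdir runs
```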

PyTorch: Computational Graphs. [Figure: a very deep network from input image to loss.] Figure reproduced with permission from a Twitter post by Andrej Karpathy.

PyTorch: Dynamic Computation Graphs

PyTorch: Dynamic Computation Graphs. Create Tensor objects x, w1, w2, y.

PyTorch: Dynamic Computation Graphs. Build the graph data structure AND perform computation: mm, clamp, mm produce y_pred.

PyTorch: Dynamic Computation Graphs. Build the graph data structure AND perform computation: pow and sum produce the loss.

PyTorch: Dynamic Computation Graphs. Search for the path between the loss and w1, w2 (for backprop) AND perform the computation.

PyTorch: Dynamic Computation Graphs. Throw away the graph and backprop path, and rebuild them from scratch on every iteration.

PyTorch: Dynamic Computation Graphs. The same steps repeat on the next iteration: build the graph and compute y_pred and the loss, then search for the backprop path between the loss and w1, w2.

PyTorch: Dynamic Computation Graphs. Building the graph and computing the graph happen at the same time. This seems inefficient, especially if we are building the same graph over and over again.

Static Computation Graphs. Alternative: static graphs.
Step 1: Build a computational graph describing our computation (including finding paths for backprop).
Step 2: Reuse the same graph on every iteration.

TensorFlow

TensorFlow Versions.
Pre-2.0 (1.14 is the latest): static graph by default, optionally a dynamic graph (eager mode).
2.0+: dynamic graph by default, optionally a static graph.
We use 2.4 in this class.

TensorFlow: Neural Net (Pre-2.0). (Assume imports at the top of each snippet.)

TensorFlow: Neural Net (Pre-2.0). First define the computational graph, then run the graph many times.

TensorFlow: 2.0 vs. pre-2.0. TensorFlow 2.0+ uses "eager" mode by default (assert(tf.executing_eagerly())), shown side by side with the equivalent TensorFlow 1.13 code.


TensorFlow: Neural Net. Convert the input numpy arrays to TF tensors. Create the weights as tf.Variable.

TensorFlow: Neural Net. Use a tf.GradientTape() context to build a dynamic computation graph.

TensorFlow: Neural Net. All forward-pass operations in the context (including function calls) get traced for computing the gradient later.

TensorFlow: Neural Net. Forward pass.

TensorFlow: Neural Net. tape.gradient() uses the traced computation graph to compute the gradients for the weights.

TensorFlow: Neural Net. Backward pass.

TensorFlow: Neural Net. Train the network: run the training step over and over, and use the gradients to update the weights.

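A sketch of the whole TensorFlow 2.x training step, with the sizes from the running example (the code on the slides is not captured in this transcript):

```python
import numpy as np
import tensorflow as tf

N, D_in, H, D_out = 64, 1000, 100, 10

# Convert input numpy arrays to TF tensors; create the weights as tf.Variable
x = tf.convert_to_tensor(np.random.randn(N, D_in), dtype=tf.float32)
y = tf.convert_to_tensor(np.random.randn(N, D_out), dtype=tf.float32)
w1 = tf.Variable(tf.random.normal((D_in, H)))
w2 = tf.Variable(tf.random.normal((H, D_out)))

learning_rate = 1e-6
for t in range(500):
    # Operations inside the tape context are traced for gradient computation
    with tf.GradientTape() as tape:
        h = tf.maximum(tf.matmul(x, w1), 0.0)   # forward pass: hidden layer with ReLU
        y_pred = tf.matmul(h, w2)
        loss = tf.reduce_sum((y_pred - y) ** 2)

    # Backward pass: use the traced graph to get gradients for the weights
    grad_w1, grad_w2 = tape.gradient(loss, [w1, w2])

    # Gradient descent step
    w1.assign_sub(learning_rate * grad_w1)
    w2.assign_sub(learning_rate * grad_w2)
```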

TensorFlow: Optimizer. You can use an optimizer to compute gradients and update the weights.

TensorFlow: Loss. Use predefined loss functions.
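A sketch combining an optimizer and a predefined loss (the specific optimizer and loss chosen here are illustrative):

```python
import tensorflow as tf

w1 = tf.Variable(tf.random.normal((1000, 100)))
w2 = tf.Variable(tf.random.normal((100, 10)))
x = tf.random.normal((64, 1000))
y = tf.random.normal((64, 10))

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-6)   # implements the update rule
loss_fn = tf.keras.losses.MeanSquaredError()              # predefined loss function

for t in range(500):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(tf.maximum(tf.matmul(x, w1), 0.0), w2)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, [w1, w2])
    optimizer.apply_gradients(zip(grads, [w1, w2]))        # update weights from gradients
```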

Keras: High-Level Wrapper. Keras is a layer on top of TensorFlow that makes common things easy to do. (It used to be third-party, but is now merged into TensorFlow.)

Keras: High-Level Wrapper. Define the model as a sequence of layers. Get the output by calling the model. Apply the gradients to all trainable variables (weights) in the model.

Keras: High-Level Wrapper. Keras can handle the training loop for you!
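A sketch of letting Keras build the model and run the training loop (layer sizes, optimizer, and epoch count are illustrative):

```python
import tensorflow as tf

N, D_in, H, D_out = 64, 1000, 100, 10
x = tf.random.normal((N, D_in))
y = tf.random.normal((N, D_out))

# Define the model as a sequence of layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(H, activation='relu', input_shape=(D_in,)),
    tf.keras.layers.Dense(D_out),
])

# Keras runs the whole training loop via compile() and fit()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              loss=tf.keras.losses.MeanSquaredError())
model.fit(x, y, epochs=50, batch_size=N, verbose=0)
```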

TensorFlow: High-Level Wrappers.
- Keras (https://www.tensorflow.org/api_docs/python/tf/keras)
- tf.estimator (https://www.tensorflow.org/api_docs/python/tf/estimator)
- Sonnet (https://sonnet.readthedocs.io/en/latest/)

@tf.function: compile a static graph. The tf.function decorator (implicitly) compiles Python functions to a static graph for better performance.
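A sketch of wrapping a training step this way (the model and step function here are hypothetical):

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal((1000, 10)))

def step_eager(x, y):
    # One gradient descent step on a linear model, written as ordinary eager code
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum((tf.matmul(x, w) - y) ** 2)
    grad = tape.gradient(loss, w)
    w.assign_sub(1e-6 * grad)
    return loss

# Same function, compiled to a static graph on first call
step_compiled = tf.function(step_eager)   # equivalent to using the @tf.function decorator

x = tf.random.normal((64, 1000))
y = tf.random.normal((64, 10))
loss = step_compiled(x, y)
```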

@tf.function: compile a static graph. Here we compare the forward-pass time of the same model under dynamic graph mode and static graph mode. (Ran on Google Colab, April 2020.)

@tf.function: compile a static graph. A static graph is in theory faster than a dynamic graph, but the performance gain depends on the type of model / layer / computation graph. (Ran on Google Colab, April 2020.)


Static vs Dynamic: Optimization. With static graphs, the framework can optimize the graph for you before it runs! [Figure: the graph you wrote, with alternating Conv and ReLU ops, is rewritten into an equivalent graph with fused Conv+ReLU operations.]

Static PyTorch: ONNX Support. You can export a PyTorch model to ONNX (Open Neural Network Exchange): run the graph on a dummy input and save the graph to a file. This will only work if your model doesn't actually make use of the dynamic graph, i.e. it must build the same graph on every forward pass.
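A sketch of the export call (model, sizes, and file name are illustrative):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# Run the graph on a dummy input and save the traced graph to a file
dummy_input = torch.randn(64, 1000)
torch.onnx.export(model, dummy_input, 'two_layer_net.onnx', verbose=True)
```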

Static PyTorch: ONNX Support. After exporting to ONNX, you can run the PyTorch model in Caffe2. The exported graph looks like:

graph(%0 : Float(64, 1000)
      %1 : Float(100, 1000)
      %2 : Float(100)
      %3 : Float(10, 100)
      %4 : Float(10)) {
  %5 : Float(64, 100) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%0, %1, %2), scope: Sequential/Linear[0]
  %6 : Float(64, 100) = onnx::Relu(%5), scope: Sequential/ReLU[1]
  %7 : Float(64, 10) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%6, %3, %4), scope: Sequential/Linear[2]
  return (%7);
}

Static PyTorch: ONNX Support. ONNX is an open-source standard for neural network models. Goal: make it easy to train a network in one framework, then run it in another framework. Supported by PyTorch, Caffe2, Microsoft CNTK, and Apache MXNet (third-party support for TensorFlow). https://github.com/onnx/onnx

Static PyTorch: TorchScript. Build a static graph with torch.jit.trace. The traced graph looks like:

graph(%self.1 : __torch__.torch.nn.modules.module.___torch_mangle_4.Module,
      %input : Float(3, 4),
      %h : Float(3, 4)):
  %19 : __torch__.torch.nn.modules.module.___torch_mangle_3.Module = prim::GetAttr[name="linear"](%self.1)
  %21 : Tensor = prim::CallMethod[name="forward"](%19, %input)
  %12 : int = prim::Constant[value=1]()
  %13 : Float(3, 4) = aten::add(%21, %h, %12)
  %14 : Float(3, 4) = aten::tanh(%13)
  %15 : (Float(3, 4), Float(3, 4)) = prim::TupleConstruct(%14, %14)
  return (%15)
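A sketch of tracing a model into TorchScript (the model here is a simple stand-in, not the one from the graph dump above):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

example_input = torch.randn(64, 1000)
traced = torch.jit.trace(model, example_input)   # records the executed ops into a static graph

print(traced.graph)             # inspect the TorchScript IR (similar to the dump above)
traced.save('model_traced.pt')  # the serialized graph can run without the Python code that built it
```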

PyTorch vs TensorFlow, Static vs Dynamic.
PyTorch: dynamic graphs by default; static via ONNX or TorchScript.
TensorFlow: dynamic via eager mode; static via @tf.function.

Static vs Dynamic: Serialization.
Static: once the graph is built, you can serialize it and run it without the code that built the graph!
Dynamic: graph building and execution are intertwined, so you always need to keep the code around.

Dynamic Graph Applications: recurrent networks. Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015. Figure copyright IEEE, 2015; reproduced for educational purposes.

Dynamic Graph Applications: recurrent networks; recursive networks (e.g. parsing "The cat ate a big rat").

Dynamic Graph Applications: recurrent networks; recursive networks; modular networks. Figure copyright Justin Johnson, 2017; reproduced with permission. Andreas et al., "Neural Module Networks", CVPR 2016. Andreas et al., "Learning to Compose Neural Networks for Question Answering", NAACL 2016. Johnson et al., "Inferring and Executing Programs for Visual Reasoning", ICCV 2017.

Dynamic Graph Applications:
- Recurrent networks
- Recursive networks
- Modular networks
- (Your creative idea here)

Model Parallel vs. Data Parallel.
Model parallelism: split the computation graph into parts and distribute them across GPUs / nodes.
Data parallelism: split the minibatch into chunks and distribute them across GPUs / nodes.

PyTorch: Data Parallel.
- nn.DataParallel. Pro: easy to use (just wrap the model and run the training script as normal). Con: single process and single node; can be bottlenecked by the CPU with a large number of GPUs (8+).
- nn.DistributedDataParallel. Pro: multi-node and multi-process training. Con: need to hand-designate devices and manually launch the training script for each process / node.
- Horovod (https://github.com/horovod/horovod): supports both PyTorch and TensorFlow.
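A sketch of the single-process option from the list above:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# DataParallel replicates the model across all visible GPUs and
# splits each minibatch among them during the forward pass
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')

# The rest of the training script is unchanged: just call model(x) as usual
```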

TensorFlow: Data Parallel. See https://www.tensorflow.org/tutorials/distribute/keras

PyTorch vs. TensorFlow: research adoption. Data from https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/


PyTorch vs. TensorFlow: Industry (2021).
- There is no official survey / study on the comparison.
- A quick search on a job posting website turns up 2389 results for TensorFlow and 1366 for PyTorch.
- The trend is unclear; industry is also known to be slower at adopting new frameworks.
- TensorFlow mostly dominates mobile deployment / embedded systems.

My Advice:
PyTorch is my personal favorite. The clean API and native dynamic graphs make it very easy to develop and debug. You can build the model using the default API and then compile a static graph using the JIT. Lots of research repositories are built on PyTorch.
TensorFlow's syntax became a lot more intuitive after 2.0. It's not perfect, but it has a huge community and wide usage. You can use the same framework for research and production. You'll probably want to use a higher-level wrapper (Keras, Sonnet, etc.).

Next Time: Training Neural Networks
