Large-Scale Deep Learning with TensorFlow, Google Brain Team - Matroid

Transcription

Large-Scale Deep Learning with TensorFlow. Jeff Dean, Google Brain team, g.co/brain. In collaboration with many other people at Google.

What is the Google Brain Team? Research team focused on long-term artificial intelligence research. Mix of computer systems and machine learning research expertise. Pure ML research, and research in the context of emerging ML application areas: robotics, language understanding, healthcare, ... g.co/brain

We Disseminate Our Work in Many Ways: By publishing our work (see papers at research.google.com/pubs/BrainTeam.html). By releasing TensorFlow, our core machine learning research system, as an open-source project. By releasing implementations of our research models in TensorFlow. By collaborating with product teams at Google to get our research into real products.

What Do We Really Want? Build artificial intelligence algorithms and systems that learn from experience. Use those to solve difficult problems that benefit humanity.

What do I mean by understanding?

What do I mean by understanding? Query: [ car parts for sale ]

What do I mean by understanding? Query: [ car parts for sale ]. Document 1: "car parking available for a small fee. parts of our floor model inventory for sale." Document 2: "Selling all kinds of automobile and pickup truck parts, engines, and transmissions."

Example Needs of the Future: Which of these eye images shows symptoms of diabetic retinopathy? Find me all rooftops in North America. Describe this video in Spanish. Find me all documents relevant to reinforcement learning for robotics and summarize them in German. Find a free time for everyone in the Smart Calendar project to meet and set up a videoconference. Robot, please fetch me a cup of tea from the snack kitchen.

Growing Use of Deep Learning at Google (chart: # of directories containing model description files over time). Across many products/areas: Android, Apps, drug discovery, Gmail, image understanding, Maps, natural language understanding, Photos, robotics research, speech, translation, YouTube, and many others.

Important Property of Neural Networks: Results get better with more data, bigger models, and more computation. (Better algorithms, new insights, and improved techniques always help, too!)

Aside: Many of the techniques that are successful now were developed 20-30 years ago. What changed? We now have: sufficient computational resources, and large enough interesting datasets. Use of large-scale parallelism lets us look ahead many generations of hardware improvements, as well.

What do you want in a machine learning system? Ease of expression: for lots of crazy ML ideas/algorithms. Scalability: can run experiments quickly. Portability: can run on a wide variety of platforms. Reproducibility: easy to share and reproduce research. Production readiness: go from research to real products.

Open, standard software for general machine learning. Great for deep learning. github.com/tensorflow/tensorflow. First released Nov 2015. Apache 2.0 license.

http://tensorflow.org/whitepaper2015.pdf

Preprint: arxiv.org/abs/1605.08695. Updated version will appear in OSDI 2016.

Strong External Adoption (chart comparing GitHub activity since each project's launch: Nov. 2015, Sep. 2013, Jan. 2012, Jan. 2008). 50,000 binary installs in 72 hours, 500,000 since November 2015. Most forked new repo on GitHub in 2015 (despite only being available in Nov. '15).

http://tensorflow.org/

Motivations: DistBelief (our 1st system) was the first scalable deep learning system, but not as flexible as we wanted for research purposes. Better understanding of the problem space allowed us to make some dramatic simplifications. Define the industrial standard for machine learning. Short-circuit the MapReduce/Hadoop inefficiency.

TensorFlow: Expressing High-Level ML Computations. Core in C++: very low overhead. Different front ends for specifying/driving the computation: Python and C++ today, easy to add more. (Diagram: Python front end and C++ front end on top of the Core TensorFlow Execution System, which runs on CPU, GPU, Android, iOS, ...)

Computation is a dataflow graph. Graph of nodes, also called operations or ops. (Figure: example graph with inputs biases, weights, examples, and labels flowing through MatMul, Add, ReLU, and Xent ops.)

Computation is a dataflow graph. Edges are N-dimensional arrays: Tensors.

Example TensorFlow fragment: build a graph computing a neural net inference.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder("float", shape=[None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

Computation is a dataflow graph, with state. 'Biases' is a variable. Some ops compute gradients and update the biases. (Figure: biases and learning rate feeding Mul and Add ops that produce the updated biases.)

Symbolic Differentiation: Automatically add ops to calculate symbolic gradients of variables w.r.t. the loss function. Apply these gradients with an optimization algorithm.

y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
opt = tf.train.GradientDescentOptimizer(0.01)
train_op = opt.minimize(cross_entropy)

Define graph and then execute it repeatedly: launch the graph and run the training op in a loop.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_op, feed_dict={x: batch_xs, y_: batch_ys})

Computation is a dataflow graph, distributed. (Figure: the graph partitioned across devices, e.g. biases, Add, Mul, and AssignSub ops split between GPU 0 and the CPU.)

Assign Devices to Ops: TensorFlow inserts Send/Recv ops to transport tensors across devices; Recv ops pull data from Send ops. (Figure: Send/Recv pairs inserted on the edges that cross between GPU 0 and the CPU.)
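
As a rough illustration of this (a minimal sketch, not from the talk; the device strings, shapes, and ops here are just examples), pinning ops to devices with tf.device is enough for TensorFlow to insert the cross-device transfers automatically:

import tensorflow as tf

with tf.device("/cpu:0"):
  # Variable kept on the CPU in this sketch.
  biases = tf.Variable(tf.zeros([10]))

with tf.device("/gpu:0"):
  # Compute on the GPU; TensorFlow adds the Send/Recv pair for the
  # biases tensor crossing from CPU to GPU behind the scenes.
  activations = tf.random_normal([32, 10]) + biases

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
  sess.run(tf.initialize_all_variables())
  print(sess.run(activations).shape)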

(Slides: screenshots of GitHub repository activity in November 2015, December 2015, February 2016, April 2016, and June 2016, showing growing community activity.)

Experiment Turnaround Time and Research Productivity: Minutes, hours: interactive research! Instant gratification! 1-4 days: tolerable; interactivity replaced by running many experiments in parallel. 1-4 weeks: high-value experiments only; progress stalls. > 1 month: don't even try.

Data Parallelism (figure, built up across several slides): parameter servers hold the model parameters p; many model replicas, each reading a different shard of the data, fetch the current parameters, compute updates on their mini-batches, and send them back; the parameter servers apply the updates, producing p', then p'', and so on.
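
A minimal sketch of how this looks with TensorFlow's distributed runtime (assumptions: the host:port addresses, the one-ps/two-worker cluster shape, and the tiny model are made up for illustration; tf.train.ClusterSpec, tf.train.Server, and tf.train.replica_device_setter are the relevant TF 1.x primitives):

import tensorflow as tf

# Hypothetical cluster: one parameter server and two workers.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process runs one of these servers (job_name/task_index differ per process).
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables are placed on the parameter server(s); ops run on this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
  x = tf.placeholder(tf.float32, [None, 784])
  y_ = tf.placeholder(tf.float32, [None, 10])
  W = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.nn.softmax(tf.matmul(x, W) + b)
  loss = -tf.reduce_sum(y_ * tf.log(y))
  train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Each worker then runs its own training loop against the shared parameters
# (asynchronous data parallelism):
#   with tf.Session(server.target) as sess:
#     sess.run(train_op, feed_dict=...)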

Distributed training mechanisms: graph structure and low-level graph primitives (queues) allow us to play with synchronous vs. asynchronous update algorithms.
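
For example (a sketch under the assumption that tf.train.SyncReplicasOptimizer is used for the synchronous case; the replica counts are illustrative and loss refers to the sketch above), switching between asynchronous and synchronous updates can be as small a change as wrapping the optimizer:

opt = tf.train.GradientDescentOptimizer(0.01)

# Asynchronous: each replica applies its gradients independently.
async_train_op = opt.minimize(loss)

# Synchronous: aggregate gradients from N replicas before applying them,
# optionally with a few extra "backup" replicas so stragglers don't stall a step.
sync_opt = tf.train.SyncReplicasOptimizer(
    opt,
    replicas_to_aggregate=50,   # gradients averaged per step (illustrative)
    total_num_replicas=55)      # 5 backup workers (illustrative)
sync_train_op = sync_opt.minimize(loss)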

Cross-process communication is the same! Communication across machines over the network is abstracted identically to cross-device communication (the same Send/Recv ops, just over the network). No specialized parameter server subsystem!

Image Model Training Time (chart: hours to train with 1 GPU, 10 GPUs, and 50 GPUs). 2.6 hours vs. 79.3 hours (30.5X).

Sync converges faster (time to accuracy): 40 hours vs. 50 hours. Synchronous updates (with backup workers) train to higher accuracy faster. Better scaling to more workers (less loss of accuracy). Revisiting Distributed Synchronous SGD, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz, ICLR Workshop 2016, arxiv.org/abs/1604.00981

General Computations: Although we originally built TensorFlow for our uses around deep neural networks, it's actually quite flexible. A wide variety of machine learning and other kinds of numeric computations are easily expressible in the computation graph model.

Runs on a Variety of Platforms: phones, single machines (CPU and/or GPUs), distributed systems of 100s of machines and/or GPU cards, custom ML hardware.

Trend: Much More Heterogeneous Hardware. General-purpose CPU performance scaling has slowed significantly. Specialization of hardware for certain workloads will be more important.

Tensor Processing Unit: custom machine learning ASIC. In production use for 16 months: used on every search query, used for the AlphaGo match, ... See Google Cloud Platform blog: "Google supercharges machine learning tasks with TPU custom chip," by Norm Jouppi, May 2016.

Long Short-Term Memory (LSTMs): Make Your Memory Cells Differentiable [Hochreiter & Schmidhuber, 1997]. (Figure: a memory cell M with input X and output Y, where sigmoid gates W, R, and F control WRITE?, READ?, and FORGET? operations.)

Example: LSTM [Hochreiter et al., 1997] [Gers et al., 1999]. Enables long-term dependencies to flow.

Example: LSTM

for i in range(20):
  m, c = LSTMCell(x[i], mprev, cprev)
  mprev = m
  cprev = c

Example: Deep LSTM

for i in range(20):
  for d in range(4):  # d is depth
    input = x[i] if d == 0 else m[d-1]
    m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
    mprev[d] = m[d]
    cprev[d] = c[d]

Example: Deep LSTM

for i in range(20):
  for d in range(4):  # d is depth
    with tf.device("/gpu:%d" % d):
      input = x[i] if d == 0 else m[d-1]
      m[d], c[d] = LSTMCell(input, mprev[d], cprev[d])
      mprev[d] = m[d]
      cprev[d] = c[d]
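
The LSTMCell function above is treated as a given. For concreteness, here is a minimal sketch of what such a cell might look like when written directly with TF 1.x-style ops (an assumption for illustration, not the implementation used in the talk; the dimensions and initialization are arbitrary):

import tensorflow as tf

dim = 2000  # per-timestep state size from the example (illustrative)

# One shared weight matrix mapping [input, previous output] to the four gates.
W_lstm = tf.Variable(tf.truncated_normal([2 * dim, 4 * dim], stddev=0.01))
b_lstm = tf.Variable(tf.zeros([4 * dim]))

def LSTMCell(x_t, mprev, cprev):
  # One big linear layer producing all four gates, then slice it apart.
  cell_inputs = tf.concat([x_t, mprev], 1)
  gates = tf.matmul(cell_inputs, W_lstm) + b_lstm
  i = tf.sigmoid(gates[:, 0*dim:1*dim])   # input (write) gate
  f = tf.sigmoid(gates[:, 1*dim:2*dim])   # forget gate
  o = tf.sigmoid(gates[:, 2*dim:3*dim])   # output (read) gate
  g = tf.tanh(gates[:, 3*dim:4*dim])      # candidate cell update
  c = f * cprev + i * g                   # forget old state, write new
  m = o * tf.tanh(c)                      # read gated output
  return m, c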

(Figure, repeated over several animation steps: the sequence model split across GPU1-GPU6. The 80k softmax by 1000 dims is very big, so the softmax is split across 4 GPUs; 1000 LSTM cells, 2000 dims per timestep; 2000 x 4 = 8k dims per sentence.)

What are some ways that deep learning is having a significant impact at Google? All of these examples are implemented using TensorFlow or our predecessor system.

Speech Recognition: acoustic input -> Deep Recurrent Neural Network -> text output ("How cold is it outside?"). Reduced word errors by more than 30%. Google Research Blog - August 2012, August 2015.

The Inception Architecture (GoogLeNet, 2014). "Going Deeper with Convolutions," Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. ArXiv 2014, CVPR 2015.

Neural Nets: Rapid Progress in Image Recognition

Team                               Year   Place   Error (top-5)
XRCE (pre-neural-net explosion)    2011   1st     25.8%
Supervision (AlexNet)              2012   1st     16.4%
Clarifai                           2013   1st     11.7%
GoogLeNet (Inception)              2014   1st     6.66%
Andrej Karpathy (human)            2014   N/A     5.1%
BN-Inception (Arxiv)               2015   N/A     4.9%
Inception-v3                       ...

Google Photos Search: your photo -> Deep Convolutional Neural Network -> automatic tag ("ocean"). Search personal photos without tags. Google Research Blog - June 2013.

Google Photos Search

Reuse same model for completely different problems. Same basic model structure trained on different data, useful in completely different contexts. Example: given an image, predict interesting pixels.

www.google.com/sunroof. We have tons of vision problems: image search, StreetView, satellite imagery, translation, robotics, self-driving cars, ...

MEDICAL IMAGING: Very good results using a similar model for detecting diabetic retinopathy in retinal images.

“Seeing” Go

RankBrain in Google Search Ranking. Query: "car parts for sale", Doc: "Rebuilt transmissions ..." Query & document features -> Deep Neural Network -> score for (doc, query) pair. Launched in 2015. Third most important search ranking signal (of 100s). Bloomberg, Oct 2015: "Google Turning Its Lucrative Web Search Over to AI Machines."

Sequence-to-Sequence Model [Sutskever & Vinyals & Le, NIPS 2014]. (Figure: a deep LSTM reads the input sequence A B C D into a vector v, then emits the target sequence X Y Z Q.)

Sequence-to-Sequence Model: Machine Translation [Sutskever & Vinyals & Le, NIPS 2014]. (Figure, built up over several slides: the model reads the input sentence "Quelle est votre taille?" into a vector v, then emits the target sentence one word at a time, "How", "tall", "are", "you", "?", ending with EOS; each emitted word is fed back in as input for the next step.)

Sequence-to-Sequence Model: Machine Translation [Sutskever & Vinyals & Le, NIPS 2014]. At inference time: beam search to choose the most probable output sequence over possible output sequences.
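
A minimal sketch of what that beam search might look like (illustrative only; the next_word_log_probs function stands in for one step of a trained decoder and is an assumption, not part of the talk):

import heapq

def beam_search(next_word_log_probs, beam_size=4, max_len=20, eos="EOS"):
  """Keep the beam_size most probable partial output sequences at each step."""
  beam = [(0.0, [])]  # (log probability, words so far)
  for _ in range(max_len):
    candidates = []
    for logp, words in beam:
      if words and words[-1] == eos:
        candidates.append((logp, words))  # finished hypothesis, keep as-is
        continue
      for word, word_logp in next_word_log_probs(words):
        candidates.append((logp + word_logp, words + [word]))
    # Prune back down to the best beam_size hypotheses.
    beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    if all(words and words[-1] == eos for _, words in beam):
      break
  return max(beam, key=lambda c: c[0])[1]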

Smart Reply. April 1, 2009: April Fool's Day joke. Nov 5, 2015: launched real product. Feb 1, 2016: 10% of mobile Inbox replies.

Smart Reply: incoming email -> Small Feed-Forward Neural Network -> "Activate Smart Reply?" (yes/no). If yes, a Deep Recurrent Neural Network generates candidate replies. Google Research Blog - Nov 2015.

Image Captioning [Vinyals et al., CVPR 2015]. (Figure: an image model feeds an LSTM that generates a caption word by word: "A young girl asleep ...")

Image Captions Research. Human: "A young girl asleep on the sofa cuddling a stuffed bear." Model: "A close up of a child holding a stuffed animal." Model: "A baby is asleep next to a teddy bear."

Combining Vision with Robotics. "Deep Learning for Robots: Learning from Large-Scale Interaction," Google Research Blog, March 2016. "Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection," Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, Arxiv, arxiv.org/abs/1603.02199.

How Can You Get Started with Machine Learning? Three ways, with varying complexity (more flexible, but more effort required, as you move down the list): (1) Use a Cloud-based API (Vision, Speech, etc.). (2) Use an existing model architecture, and retrain it or fine-tune it on your dataset (see the sketch below). (3) Develop your own machine learning models for new problems.
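
As an illustration of option (2) (a minimal sketch, not from the talk: the 2048-dim "bottleneck" features, the class count, and all names here are assumptions), one common approach is to take feature vectors from a pretrained image model and train only a new final classification layer on your own labels:

import tensorflow as tf

num_classes = 5      # your own image categories (illustrative)
feature_dim = 2048   # size of the pretrained model's bottleneck features (assumed)

# Placeholders for precomputed bottleneck features and your labels.
bottlenecks = tf.placeholder(tf.float32, [None, feature_dim])
labels = tf.placeholder(tf.float32, [None, num_classes])

# Only this new layer is trained; the pretrained network stays frozen.
W = tf.Variable(tf.truncated_normal([feature_dim, num_classes], stddev=0.01))
b = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(bottlenecks, W) + b

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)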

Use Cloud-based APIs: cloud.google.com/speech, cloud.google.com/vision, cloud.google.com/text

Google Cloud Vision API: https://cloud.google.com/vision/
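
A hedged sketch of calling the Cloud Vision API from Python (assumptions: an API key in API_KEY, label detection as the requested feature, and a local photo.jpg; error handling omitted):

import base64
import requests

API_KEY = "YOUR_API_KEY"  # assumed to exist; create one in the Google Cloud console
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "requests": [{
        "image": {"content": image_b64},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
    }]
}
response = requests.post(ENDPOINT, json=request_body)
print(response.json())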

Google Cloud ML: scaled service for training and inference with TensorFlow.

A Few TensorFlow Community Examples (from more than 2,100 results for 'tensorflow' on GitHub):
DQN: github.com/nivwusquorum/tensorflow-deepq
NeuralArt: github.com/woodrush/neural-art-tf
Char RNN: github.com/sherjilozair/char-rnn-tensorflow
Keras ported to TensorFlow: github.com/fchollet/keras
Show and Tell: github.com/jazzsaxmafia/show_and_tell.tensorflow
Mandarin translation: github.com/jikexueyuanwiki/tensorflow-zh

github.com/nivwusquorum/tensorflow-deepq

github.com/woodrush/neural-art-tf

github.com/sherjilozair/char-rnn-tensorflow

github.com/fchollet/keras

github.com/jazzsaxmafia/show_and_tell.tensorflow

github.com/jikexueyuanwiki/tensorflow-zh

What Does the Future Hold? Deep learning usage will continue to grow and accelerate, across more and more fields and problems: robotics, self-driving vehicles, health care, video understanding, dialogue systems, personal assistance, ...

Conclusions: Deep neural networks are making significant strides in understanding: in speech, vision, language, search, robotics, ... If you're not considering how to use deep neural nets to solve your vision or understanding problems, you almost certainly should be.

Further Reading:
Dean, et al., Large Scale Distributed Deep Networks, NIPS 2012, research.google.com/archive/large_deep_networks_nips2012.html
Mikolov, Chen, Corrado & Dean, Efficient Estimation of Word Representations in Vector Space, NIPS 2013, arxiv.org/abs/1301.3781
Sutskever, Vinyals, & Le, Sequence to Sequence Learning with Neural Networks, NIPS 2014, arxiv.org/abs/1409.3215
Vinyals, Toshev, Bengio, & Erhan, Show and Tell: A Neural Image Caption Generator, CVPR 2015, arxiv.org/abs/1411.4555
TensorFlow white paper, tensorflow.org/whitepaper2015.pdf (clickable links in bibliography)
g.co/brain (We're hiring! Also check out the Brain Residency program at eam.html)
Questions?
