
Deep Learning with TensorFlow 2 and Keras
Second Edition

Regression, ConvNets, GANs, RNNs, NLP, and more with TensorFlow 2 and the Keras API

Antonio Gulli
Amita Kapoor
Sujit Pal

BIRMINGHAM - MUMBAI

Deep Learning with TensorFlow 2 and Keras, Second Edition

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editors: Yogesh Deokar, Ben Renow-Clarke
Acquisition Editor – Peer Reviews: Suresh Jain
Content Development Editor: Ian Hough
Technical Editor: Gaurav Gavas
Project Editor: Janice Gonsalves
Proofreader: Safis Editing
Indexer: Rekha Nair
Presentation Designer: Sandip Tadge

First published: April 2017
Second edition: December 2019

Production reference: 2130320

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-83882-341-2

www.packt.com

packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

• Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
• Learn better with Skill Plans built especially for you
• Get a free eBook or video every month
• Fully searchable for easy access to vital information
• Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.Packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.Packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Antonio Gulli has a passion for establishing and managing global technological talent, for innovation and execution. His core expertise is in cloud computing, deep learning, and search engines. Currently, he serves as Engineering Director for the Office of the CTO, Google Cloud. Previously, he served as Google Warsaw Site leader, doubling the size of the engineering site.

So far, Antonio has been lucky enough to gain professional experience in 4 countries in Europe and has managed teams in 6 countries in EMEA and the US: in Amsterdam, as Vice President for Elsevier, a leading scientific publisher; in London, as Engineering Site Lead for Microsoft working on Bing Search; as CTO for Ask.com; and in several co-funded start-ups including one of the first web search companies in Europe.

Antonio has co-invented a number of technologies for search, smart energy, the environment, and AI, with 20 patents issued/applied, and he has published several books about coding and machine learning, also translated into Japanese and Chinese.

Antonio speaks Spanish, English, and Italian, and he is currently learning Polish and French. Antonio is a proud father of 2 boys, Lorenzo, 18, and Leonardo, 13, and a little queen, Aurora, 9.

I want to thank my kids, Aurora, Leonardo, and Lorenzo, for motivating and supporting me during all the moments of my life. Special thanks to my parents, Elio and Maria, for being there when I need it. I'm particularly grateful to the important people in my life: Eric, Francesco, Antonello, Antonella, Ettore, Emanuela, Laura, Magda, and Nina.

I want to thank all my colleagues at Google for their encouragement in writing this and previous books, for the precious time we've spent together, and for their advice: Behshad, Wieland, Andrei, Brad, Eyal, Becky, Rachel, Emanuel, Chris, Eva, Fabio, Jerzy, David, Dawid, Piotr, Alan, and many others. I'm especially appreciative of all my colleagues at OCTO, the Office of the CTO at Google, and I'm humbled to be part of a formidable and very talented team. Thanks, Jonathan and Will.

Thanks to my high school friends and professors who inspired me over many years (D'africa and Ferragina in particular). Thanks to the reviewers for their thoughtful comments and efforts toward improving this book, and my co-authors for their passion and energy.

This book has been written in six different cities: Warsaw, Charlotte Bar; Amsterdam, Cafe de Jaren; Pisa, La Petite; Pisa, Caffe i Miracoli; Lucca, Piazza Anfiteatro, Tosco; London, Said; London, Nespresso; and Paris, Laduree. Lots of travel and lots of good coffee in a united Europe!

Amita Kapoor is an associate professor in the Department of Electronics, SRCASW, University of Delhi, and has been actively teaching neural networks and artificial intelligence for the last 20 years. Coding and teaching are her two passions, and she enjoys solving challenging problems. She is a recipient of the DAAD Sandwich fellowship 2008, and the Best Presentation Award at an international conference, Photonics 2008. She is an avid reader and learner. She has co-authored books on Deep Learning and has more than 50 publications in international journals and conferences. Her present research areas include machine learning, deep reinforcement learning, quantum computers, and robotics.

To my grandmother the late Kailashwati Maini for her unconditional love and affection; and my grandmother the late Kesar Kapoor for her marvelous stories that fueled my imagination; my mother, the late Swarnlata Kapoor, for having trust in my abilities and dreaming for me; and my stepmother, the late Anjali Kapoor, for teaching me every struggle can be a stepping stone.

I am grateful to my teachers throughout life, who inspired me, encouraged me, and most importantly taught me: Prof. Parogmna Sen, Prof. Wolfgang Freude, Prof. Enakshi Khullar Sharma, Dr. S Lakshmi Devi, Dr. Rashmi Saxena and Dr. Rekha Gupta.

I am extremely thankful to the entire Packt team for the work and effort they put in since the inception of this book, and to the reviewers who painstakingly went through the content and verified the codes; their comments and suggestions helped improve the book. I am particularly thankful to my co-authors Antonio Gulli and Sujit Pal for sharing their vast experience with me in the writing of this book.

I would like to thank my college administration, governing body and Principal Dr. Payal Mago for sanctioning my sabbatical leave so that I could concentrate on the book. I would also like to thank my colleagues for the support and encouragement they have provided, with a special mention of Dr. Punita Saxena, Dr. Jasjeet Kaur, Dr. Ratnesh Saxena, Dr. Daya Bhardwaj, Dr. Sneha Kabra, Dr. Sadhna Jain, Mr. Projes Roy, Ms. Venika Gupta and Ms. Preeti Singhal.

I want to thank my family members and friends: my extended family Krishna Maini, Suraksha Maini, the late HCD Maini, Rita Maini, Nirjara Jain, Geetika Jain, Rashmi Singh, and my father Anil Mohan Kapoor. And last but not the least I would like to thank Narotam Singh for his invaluable discussions, inspiration and unconditional support through all phases of my life.

A part of the royalties of the book will go to smilefoundation.org.

Sujit Pal is a Technology Research Director at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His areas of interest include Semantic Search, Natural Language Processing, Machine Learning, and Deep Learning. At Elsevier, he has worked on several machine learning initiatives involving large image and text corpora, and other initiatives around recommendation systems and knowledge graph development. He has previously co-authored another book on Deep Learning with Antonio Gulli and writes about technology on his blog Salmon Run.

I would like to thank both my co-authors for their support and for making this authoring experience a productive and pleasant one, the editorial team at Packt who were constantly there for us with constructive help and support, and my family for their patience. It has truly taken a village, and this book would not have been possible without the passion and hard work from everyone on the team.

About the reviewers

Haesun Park is a machine learning Google Developer Expert. He has been a software engineer for more than 15 years. He has written and translated several books on machine learning. He is an entrepreneur, and currently runs his own business.

Other books Haesun has worked on include the translation of Hands-On Machine Learning with Scikit-Learn and TensorFlow, Python Machine Learning, and Deep Learning with Python.

I would like to thank Suresh Jain who proposed this work to me, and extend my sincere gratitude to Janice Gonsalves, who provided me with a great deal of support in the undertaking of reviewing this book.

Dr. Simeon Bamford has a background in AI. He specializes in neural and neuromorphic engineering, including neural prosthetics, mixed-signal CMOS design for spike-based learning, and machine vision with event-based sensors. He has used TensorFlow for natural language processing and has experience in deploying TensorFlow models on serverless cloud platforms.

Table of Contents

Preface

Chapter 1: Neural Network Foundations with TensorFlow 2.0
  What is TensorFlow (TF)?
  What is Keras?
  What are the most important changes in TensorFlow 2.0?
  Introduction to neural networks
  Perceptron
  A first example of TensorFlow 2.0 code
  Multi-layer perceptron – our first example of a network
  Problems in training the perceptron and their solutions
  Activation function – sigmoid
  Activation function – tanh
  Activation function – ReLU
  Two additional activation functions – ELU and LeakyReLU
  Activation functions
  In short – what are neural networks after all?
  A real example – recognizing handwritten digits
  One-hot encoding (OHE)
  Defining a simple neural network in TensorFlow 2.0
  Running a simple TensorFlow 2.0 net and establishing a baseline
  Improving the simple net in TensorFlow 2.0 with hidden layers
  Further improving the simple net in TensorFlow with Dropout
  Testing different optimizers in TensorFlow 2.0
  Increasing the number of epochs
  Controlling the optimizer learning rate
  Increasing the number of internal hidden neurons
  Increasing the size of batch computation
  Summarizing experiments run for recognizing handwritten charts
  Regularization
  Adopting regularization to avoid overfitting
  Understanding BatchNormalization
  Playing with Google Colab – CPUs, GPUs, and TPUs
  Sentiment analysis
  Hyperparameter tuning and AutoML
  Predicting output
  A practical overview of backpropagation
  What have we learned so far?
  Towards a deep learning approach
  References

Chapter 2: TensorFlow 1.x and 2.x
  Understanding TensorFlow 1.x
  TensorFlow 1.x computational graph program structure
  Computational graphs
  Working with constants, variables, and placeholders
  Examples of operations
  Constants
  Sequences
  Random tensors
  Variables
  An example of TensorFlow 1.x in TensorFlow 2.x
  Understanding TensorFlow 2.x
  Eager execution
  AutoGraph
  Keras APIs – three programming models
  Sequential API
  Functional API
  Model subclassing
  Callbacks
  Saving a model and weights
  Training from tf.data.datasets
  tf.keras or Estimators?
  Ragged tensors
  Custom training
  Distributed training in TensorFlow 2.x
  Multiple GPUs
  MultiWorkerMirroredStrategy
  TPUStrategy
  ParameterServerStrategy
  Changes in namespaces
  Converting from 1.x to 2.x
  Using TensorFlow 2.x effectively
  The TensorFlow 2.x ecosystem
  Language bindings
  Keras or tf.keras?
  Summary

Chapter 3: Regression
  What is regression?
  Prediction using linear regression
  Simple linear regression
  Multiple linear regression
  Multivariate linear regression
  TensorFlow Estimators
  Feature columns
  Input functions
  MNIST using TensorFlow Estimator API
  Predicting house price using linear regression
  Classification tasks and decision boundaries
  Logistic regression
  Logistic regression on the MNIST dataset
  Summary
  References

Chapter 4: Convolutional Neural Networks
  Deep Convolutional Neural Network (DCNN)
  Local receptive fields
  Shared weights and bias
  A mathematical example
  Convnets in TensorFlow 2.x
  Pooling layers
  Max pooling
  Average pooling
  ConvNets summary
  An example of DCNN ‒ LeNet
  LeNet code in TensorFlow 2.0
  Understanding the power of deep learning
  Recognizing CIFAR-10 images with deep learning
  Improving the CIFAR-10 performance with a deeper network
  Improving the CIFAR-10 performance with data augmentation
  Predicting with CIFAR-10
  Very deep convolutional networks for large-scale image recognition
  Recognizing cats with a VGG16 Net
  Utilizing tf.keras built-in VGG16 Net module
  Recycling prebuilt deep learning models for extracting features
  Summary
  References

Chapter 5: Advanced Convolutional Neural Networks
  Computer vision
  Composing CNNs for complex tasks
  Classification and localization
  Semantic segmentation
  Object detection
  Instance segmentation
  Classifying Fashion-MNIST with a tf.keras - estimator model
  Run Fashion-MNIST the tf.keras - estimator model on GPUs
  Deep Inception-v3 Net used for transfer learning
  Transfer learning for classifying horses and humans
  Application Zoos with tf.keras and TensorFlow Hub
  Keras applications
  TensorFlow Hub
  Other CNN architectures
  AlexNet
  Residual networks
  HighwayNets and DenseNets
  Xception
  Answering questions about images (VQA)
  Style transfer
  Content distance
  Style distance
  Creating a DeepDream network
  Inspecting what a network has learned
  Video
  Classifying videos with pretrained nets in six different ways
  Textual documents
  Using a CNN for sentiment analysis
  Audio and music
  Dilated ConvNets, WaveNet, and NSynth
  A summary of convolution operations
  Basic convolutional neural networks (CNN or ConvNet)
  Dilated convolution
  Transposed convolution
  Separable convolution
  Depthwise convolution
  Depthwise separable convolution
  Capsule networks
  So what is the problem with CNNs?
  So what is new with Capsule networks?
  Summary
  References

Chapter 6: Generative Adversarial Networks
  What is a GAN?
  MNIST using GAN in TensorFlow
  Deep convolutional GAN (DCGAN)
  DCGAN for MNIST digits
  Some interesting GAN architectures
  SRGAN
  CycleGAN
  InfoGAN
  Cool applications of GANs
  CycleGAN in TensorFlow 2.0
  Summary
  References

Chapter 7: Word Embeddings
  Word embedding ‒ origins and fundamentals
  Distributed representations
  Static embeddings
  Word2Vec
  GloVe
  Creating your own embedding using gensim
  Exploring the embedding space with gensim
  Using word embeddings for spam detection
  Getting the data
  Making the data ready for use
  Building the embedding matrix
  Define the spam classifier
  Train and evaluate the model
  Running the spam detector
  Neural embeddings – not just for words
  Item2Vec
  node2vec
  Character and subword embeddings
  Dynamic embeddings
  Sentence and paragraph embeddings
  Language model-based embeddings
  Using BERT as a feature extractor
  Fine-tuning BERT
  Classifying with BERT ‒ command line
  Using BERT as part of your own network
  Summary
  References

Chapter 8: Recurrent Neural Networks
  The basic RNN cell
  Backpropagation through time (BPTT)
  Vanishing and exploding gradients
  RNN cell variants
  Long short-term memory (LSTM)
  Gated recurrent unit (GRU)
  Peephole LSTM
  RNN variants
  Bidirectional RNNs
  Stateful RNNs
  RNN topologies
  Example ‒ One-to-Many – learning to generate text
  Example ‒ Many-to-One – Sentiment Analysis
  Example ‒ Many-to-Many – POS tagging
  Encoder-Decoder architecture – seq2seq
  Example ‒ seq2seq without attention for machine translation
  Attention mechanism
  Example ‒ seq2seq with attention for machine translation
  Transformer architecture
  Summary
  References

Chapter 9: Autoencoders
  Introduction to autoencoders
  Vanilla autoencoders
  TensorFlow Keras layers ‒ defining custom layers
  Reconstructing handwritten digits using an autoencoder
  Sparse autoencoder
  Denoising autoencoders
  Clearing images using a Denoising autoencoder
  Stacked autoencoder
  Convolutional autoencoder for removing noise from images
  Keras autoencoder example ‒ sentence vectors
  Summary
  References

Chapter 10: Unsupervised Learning
  Principal component analysis
  PCA on the MNIST dataset
  TensorFlow Embedding API
  K-means clustering
  K-means in TensorFlow 2.0
  Variations in k-means
  Self-organizing maps
  Colour mapping using SOM
  Restricted Boltzmann machines
  Reconstructing images using RBM
  Deep belief networks
  Variational Autoencoders
  Summary
  References

Chapter 11: Reinforcement Learning
  Introduction
  RL lingo
  Deep reinforcement learning algorithms
  Reinforcement success in recent years
  Introduction to OpenAI Gym
  Random agent playing Breakout
  Deep Q-Networks
  DQN for CartPole
  DQN to play a game of Atari
  DQN variants
  Double DQN
  Dueling DQN
  Rainbow
  Deep deterministic policy gradient
  Summary
  References

Chapter 12: TensorFlow and Cloud
  Deep learning on cloud
  Microsoft Azure
  Amazon Web Services (AWS)
  Google Cloud Platform (GCP)
  IBM Cloud
  Virtual machines on cloud
  EC2 on Amazon
  Compute Instance on GCP
  Virtual machine on Microsoft Azure
  Jupyter Notebooks on cloud
  SageMaker
  Google Colaboratory
  Microsoft Azure Notebooks
  TensorFlow Extended for production
  TFX Pipelines
  TFX pipeline components
  TFX libraries
  TensorFlow Enterprise
  Summary
  References

Chapter 13: TensorFlow for Mobile and IoT and TensorFlow.js
  TensorFlow Mobile
  TensorFlow Lite
  Quantization
  FlatBuffers
  Mobile converter
  Mobile optimized interpreter
  Supported platforms
  Architecture
  Using TensorFlow Lite
  A generic example of application
  Using GPUs and accelerators
  An example of application
  Pretrained models in TensorFlow Lite
  Image classification
  Object detection
  Pose estimation
  Smart reply
  Segmentation
  Style transfer
  Text classification
  Question and answering
  A note about using mobile GPUs
  An overview of federated learning at the edge
  TensorFlow FL APIs
  TensorFlow.js
  Vanilla TensorFlow.js
  Converting models
  Pretrained models
  Node.js
  Summary
  References

Chapter 14: An introduction to AutoML
  What is AutoML?
  Achieving AutoML
  Automatic data preparation
  Automatic feature engineering
  Automatic model generation
  AutoKeras
  Google Cloud AutoML
  Using Cloud AutoML ‒ Tables solution
  Using Cloud AutoML ‒ Vision solution
  Using Cloud AutoML ‒ Text Classification solution
  Using Cloud AutoML ‒ Translation solution
  Using Cloud AutoML ‒ Video Intelligence Classification solution
  Cost
  Bringing Google AutoML to Kaggle
  Summary
  References

Chapter 15: The Math Behind Deep Learning
  History
  Some mathematical tools
  Derivatives and gradients everywhere
  Gradient descent
  Chain rule
  A few differentiation rules
  Matrix operations
  Activation functions
  Derivative of the sigmoid
  Derivative of tanh
  Derivative of ReLU
  Backpropagation
  Forward step
  Backstep
  Case 1 – From hidden layer to output layer
  Case 2 ‒ From hidden layer to hidden layer
  Limit of backpropagation
  Cross entropy and its derivative
  Batch gradient descent, stochastic gradient descent, and mini-batch
  Batch Gradient Descent (BGD)
  Stochastic Gradient Descent (SGD)
  Mini-Batch Gradient Descent (MBGD)
  Thinking about backpropagation and convnets
  Thinking about backpropagation and RNNs
  A note on TensorFlow and automatic differentiation
  Summary
  References

Chapter 16: Tensor Processing Unit
  C/G/T processing units
  CPUs and GPUs
  TPUs
  Three generations of TPUs and Edge TPU
  First-generation TPU
  Second-generation TPU
  Third-generation TPU
  Edge TPU
  TPU performance
  How to use TPUs with Colab
  Checking whether TPUs are available
  Loading data with tf.data
  Building a model and loading it into the TPU
  Using pretrained TPU models
  Using TensorFlow 2.1 and nightly build
  Summary
  References

Other Books You May Enjoy

Index

Preface

Deep Learning with TensorFlow 2 and Keras, Second Edition is a concise yet thorough introduction to modern neural networks, artificial intelligence, and deep learning technologies designed especially for software engineers and data scientists. The book is the natural follow-up of the books Deep Learning with Keras [1] and TensorFlow 1.x Deep Learning Cookbook [2] previously written by the same authors.

Mission

This book provides a very detailed panorama of the evolution of learning technologies during the past six years. The book presents dozens of working deep neural networks coded in Python using TensorFlow 2.0, a modular network library based on Keras-like [1] APIs.

You are introduced step-by-step to supervised learning algorithms such as simple linear regression, classical multilayer perceptrons, and more sophisticated deep convolutional networks and generative adversarial networks. In addition, the book covers unsupervised learning algorithms such as autoencoders and generative networks. Recurrent networks and Long Short-Term Memory (LSTM) networks are also explained in detail. The book also includes a comprehensive introduction to deep reinforcement learning and it covers deep learning accelerators (GPUs and TPUs), cloud development, and multi-environment deployment on your desktop, on the cloud, on mobile/IoT devices, and on your browser.

Practical applications include code for text classification into predefined categories, syntactic analysis, sentiment analysis, synthetic generation of text, and parts-of-speech tagging. Image processing is also explored, with recognition of handwritten digit images, classification of images into different categories, and advanced object recognition with related image annotations.

Sound analysis comprises the recognition of discrete speech from multiple speakers. Generation of images using Autoencoders and GANs is also covered. Reinforcement learning is used to build a deep Q-learning network capable of learning autonomously.

Experiments are the essence of the book. Each net is augmented by multiple variants that progressively improve the learning performance by changing the input parameters, the shape of the network, loss functions, and algorithms used for optimizations. Several comparisons between training on CPUs, GPUs, and TPUs are also provided. The book introduces you to the new field of AutoML, where deep learning models are used to learn how to build deep learning models efficiently and automatically. One advanced chapter is devoted to the mathematical foundation behind machine learning.

Machine learning, artificial intelligence, and the deep learning Cambrian explosion

Artificial intelligence (AI) lays the ground for everything this book discusses. Machine learning (ML) is a branch of AI, and Deep learning (DL) is in turn a subset within ML. This section will briefly discuss these three concepts, which you will regularly encounter throughout the rest of this book.

AI denotes any activity where machines mimic intelligent behaviors typically shown by humans. More formally, it is a research field in which machines aim to replicate cognitive capabilities such as learning behaviors, proactive interaction with the environment, inference and deduction, computer vision, speech recognition, problem solving, knowledge representation, and perception. AI builds on elements of computer science, mathematics, and statistics, as well as psychology and other sciences studying human behaviors. There are multiple strategies for building AI. During the 1970s and 1980s, 'expert' systems became extremely popular. The goal of these systems was to solve complex problems by representing the knowledge with a large number of manually defined if–then rules. This approach worked for small problems on very specific domains, but it was not able to scale up for larger problems and multiple domains. Later, AI focused more and more on statistical methods that are part of ML.

ML is a subdiscipline of AI that focuses on teaching computers how to learn without the need to be programmed for specific tasks. The key idea behind ML is that it is possible to create algorithms that learn from, and make predictions on, data. There are three different broad categories of ML:

• Supervised learning, in which the machine is presented with input data and a desired output, and the goal is to learn from those training examples in such a way that meaningful predictions can be made for data that the machine has never observed before.

• Unsupervised learning, in which the machine is presented with input data only, and the machine has to subsequently find some meaningful structure by itself, with no external supervision or input.

• Reinforcement learning, in which the machine acts as an agent, interacting with the environment. The machine is provided with "rewards" for behaving in a desired manner, and "penalties" for behaving in an undesired manner. The machine attempts to maximize rewards by learning to develop its behavior accordingly.

DL took the world by storm in 2012. During that year, the ImageNet 2012 challenge [3] was launched with the goal of predicting the content of photographs using a subset of a large hand-labeled dataset. A deep learning model named AlexNet [4] achieved a top-5 error rate of 15.3%, a significant improvement with respect to previous state-of-the-art results. According to the Economist [5], "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." Since 2012, we have seen constant progress [5] (see Figure 1), with several models classifying ImageNet photographs with an error rate of less than 2%, better than the estimated human error rate of 5.1%:

Figure 1: Top 5 accuracy achieved by different deep learning models on ImageNet 2012

That was only the beginning. Today, DL techniques are successfully applied in heterogeneous domains including, but not limited to: healthcare, environment, green energy, computer vision, text analysis, multimedia, finance, retail, gaming, simulation, industry, robotics, and self-driving cars. In each of these domains, DL techniques can solve problems with a level of accuracy that was not possible using previous methods.

It is worth noting that interest in DL is also increasing. According to the State of Deep Learning H2 2018 Review [9], "Every 20 minutes, a new ML paper is born. The growth rate of machine learning papers has been around 3.5% a month [...] around a 50% growth rate annually." During the past three years, it seems like we are living during a Cambrian explosion for DL, with the number of articles on arXiv growing faster than Moore's Law (see Figure 2). Still, according to the review, this "gives you a sense that people believe that this is where the future value in computing is going to come from":

Figure 2: ML papers on arXiv appear to be growing faster than Moore's Law (source: jor-advances-review.html)

arXiv is a repository of electronic preprints approved for posting after moderation, but not full peer review.

The complexity of deep learning models is also increasing. ResNet-50 is an image recognition model (see Chapters 4 and 5), with about 26 million parameters. Every single parameter is a weight used to fine-tune the model. Transformers, GPT-1, BERT, and GPT-2 [7] are natural language processing models (see Chapter 8, Recurrent Neural Networks) able to perform a variety of tasks on text. These models progressively grew from 340 million to 1.5 billion parameters. Recently, Nvidia claimed that it has been able to train the largest-known model, with 8.3 billion parameters, in just 53 minutes. This training allowed Nvidia to build one of the most powerful models to process textual information.

Figure 3: Growth in number of parameters for various deep learning models
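A minimal sketch, assuming a working TensorFlow 2.x installation, that verifies the parameter count quoted above by instantiating ResNet-50 through tf.keras.applications and printing its total number of parameters; passing weights=None builds the architecture without downloading the pretrained ImageNet weights:

import tensorflow as tf

# Build ResNet-50 with randomly initialized weights and count its parameters.
resnet50 = tf.keras.applications.ResNet50(weights=None)
print(f"ResNet-50 parameters: {resnet50.count_params():,}")  # roughly 25-26 million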

Besides that, computational capacity is significantly increasing. GPUs and TPUs (Chapter 16, Tensor Processing Unit) are deep learning accelerators that have made it possible to train large models in a very short amount of time. TPU3s, announced in May 2018, are about twice as powerful (360 teraflops) as the TPU2s announced in May 2017. A full TPU3 pod can deliver more than 100 petaflops of machine learning performance, while TPU2 pods can get to 11.5 petaflops of performance.

An improvement of 10x per pod (see Figure 4) was achieved in one year only, which allows faster training:

Figure 4: TPU accelerators performance in petaflops

However, DL's growth is not only in terms of better accuracy, more research papers, larger models, and faster accelerators. There are additional trends that have been observed over the last four years.

First, the availability of flexible programming fra
