An Introduction to TensorFlow! - Stanford University

Transcription

An introduction to TensorFlow!
Chip Huyen (chiphuyen@cs.stanford.edu)
CS224N
1/25/2018


Agenda
Why TensorFlow
Graphs and Sessions
Linear Regression
tf.data
word2vec
Structuring your model
Managing experiments

Why TensorFlow?
Flexibility
Scalability
Popularity

import tensorflow as tf

Graphs and Sessions

Data Flow Graphs
TensorFlow separates the definition of computations from their execution.
Graph from TensorFlow for Machine Intelligence

Data Flow Graphs
Phase 1: assemble a graph
Phase 2: use a session to execute operations in the graph
Graph from TensorFlow for Machine Intelligence

Data Flow Graphs
Phase 1: assemble a graph (this might change in the future with eager mode!)
Phase 2: use a session to execute operations in the graph
Graph from TensorFlow for Machine Intelligence

What's a tensor?

What's a tensor?
An n-dimensional array
0-d tensor: scalar (number)
1-d tensor: vector
2-d tensor: matrix
and so on
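For a concrete sense of these ranks, here is a minimal sketch (the values are illustrative, not from the slides):

import tensorflow as tf

scalar = tf.constant(3)                 # 0-d tensor: a number
vector = tf.constant([1, 2, 3])         # 1-d tensor: a vector
matrix = tf.constant([[1, 2], [3, 4]])  # 2-d tensor: a matrix

with tf.Session() as sess:
    print(sess.run([scalar, vector, matrix]))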

Data Flow Graphs

import tensorflow as tf
a = tf.add(3, 5)

Visualized by TensorBoard

Data Flow Graphs

import tensorflow as tf
a = tf.add(3, 5)

Visualized by TensorBoard
Why x, y? TF automatically names the nodes when you don't explicitly name them.
x = 3
y = 5

Data Flow Graphs

import tensorflow as tf
a = tf.add(3, 5)

Nodes: operators, variables, and constants
Edges: tensors
Tensors are data.
TensorFlow = tensor + flow = data flow
(I know, mind blown)

Data Flow Graphs

import tensorflow as tf
a = tf.add(3, 5)
print(a)

>> Tensor("Add:0", shape=(), dtype=int32)
(Not 8)

How to get the value of a?
Create a session, assign it to variable sess so we can call it later.
Within the session, evaluate the graph to fetch the value of a.

How to get the value of a?
Create a session, assign it to variable sess so we can call it later.
Within the session, evaluate the graph to fetch the value of a.

import tensorflow as tf
a = tf.add(3, 5)
sess = tf.Session()
print(sess.run(a))
sess.close()

The session will look at the graph, trying to think: hmm, how can I get the value of a? Then it computes all the nodes that lead to a.

How to get the value of a?
Create a session, assign it to variable sess so we can call it later.
Within the session, evaluate the graph to fetch the value of a.

import tensorflow as tf
a = tf.add(3, 5)
sess = tf.Session()
print(sess.run(a))
sess.close()

>> 8

The session will look at the graph, trying to think: hmm, how can I get the value of a? Then it computes all the nodes that lead to a.

How to get the value of a?
Create a session, assign it to variable sess so we can call it later.
Within the session, evaluate the graph to fetch the value of a.

import tensorflow as tf
a = tf.add(3, 5)
with tf.Session() as sess:
    print(sess.run(a))

>> 8

The with block replaces the explicit sess = tf.Session() and sess.close().

tf.Session()
A Session object encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated.

tf.Session()
A Session object encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated.
A session will also allocate memory to store the current values of variables.

More graphs
Visualized by TensorBoard

x = 2
y = 3
op1 = tf.add(x, y)
op2 = tf.multiply(x, y)
op3 = tf.pow(op2, op1)
with tf.Session() as sess:
    op3 = sess.run(op3)

Subgraphs

x = 2
y = 3
add_op = tf.add(x, y)
mul_op = tf.multiply(x, y)
useless = tf.multiply(x, add_op)
pow_op = tf.pow(add_op, mul_op)
with tf.Session() as sess:
    z = sess.run(pow_op)

Because we only want the value of pow_op, and pow_op doesn't depend on useless, the session won't compute the value of useless. This saves computation.

Subgraphs
It is possible to break graphs into several chunks and run them in parallel across multiple CPUs, GPUs, TPUs, or other devices.
Example: AlexNet
Graph from Hands-On Machine Learning with Scikit-Learn and TensorFlow

Distributed Computation
To put part of a graph on a specific CPU or GPU:

# Creates a graph.
with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='b')
    c = tf.multiply(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.
print(sess.run(c))


Why graphs
1. Save computation. Only run subgraphs that lead to the values you want to fetch.

Why graphs
1. Save computation. Only run subgraphs that lead to the values you want to fetch.
2. Break computation into small, differentiable pieces to facilitate auto-differentiation.

Why graphs
1. Save computation. Only run subgraphs that lead to the values you want to fetch.
2. Break computation into small, differentiable pieces to facilitate auto-differentiation.
3. Facilitate distributed computation: spread the work across multiple CPUs, GPUs, TPUs, or other devices.

Why graphs
1. Save computation. Only run subgraphs that lead to the values you want to fetch.
2. Break computation into small, differentiable pieces to facilitate auto-differentiation (a small sketch follows this list).
3. Facilitate distributed computation: spread the work across multiple CPUs, GPUs, TPUs, or other devices.
4. Many common machine learning models are taught and visualized as directed graphs.
A neural net graph from Stanford's CS224N course
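As a small illustration of point 2, tf.gradients builds the ops that compute derivatives through the graph; this example is not from the slides, and y = x * x is chosen only for demonstration:

import tensorflow as tf

x = tf.Variable(3.0)
y = x * x                   # y = x^2

grads = tf.gradients(y, x)  # ops that compute dy/dx = 2x

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # [6.0]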

TensorBoard

Your first TensorFlow program

import tensorflow as tf
a = tf.constant(2, name='a')
b = tf.constant(3, name='b')
x = tf.add(a, b, name='add')
with tf.Session() as sess:
    print(sess.run(x))

Visualize it with TensorBoard

import tensorflow as tf
a = tf.constant(2, name='a')
b = tf.constant(3, name='b')
x = tf.add(a, b, name='add')

# Create the summary writer after graph definition and before running your session
writer = tf.summary.FileWriter('./graphs', tf.get_default_graph())

with tf.Session() as sess:
    # writer = tf.summary.FileWriter('./graphs', sess.graph)
    print(sess.run(x))
writer.close()  # close the writer when you're done using it

'./graphs' can be any location where you want to keep your event files.

Run it
Go to terminal, run:
python [yourprogram].py
tensorboard --logdir="./graphs" --port 6006
(6006 or any port you want)
Then open your browser and go to: http://localhost:6006/


Constants, Sequences, Variables, Ops

Constants

import tensorflow as tf
a = tf.constant([2, 2], name='a')
b = tf.constant([[0, 1], [2, 3]], name='b')
x = tf.multiply(a, b, name='mul')

with tf.Session() as sess:
    print(sess.run(x))
# >> [[0 2]
#     [4 6]]

Broadcasting similar to NumPy

Tensors filled with a specific value (similar to NumPy)

tf.zeros([2, 3], tf.int32) ==> [[0, 0, 0], [0, 0, 0]]

# input_tensor is [[0, 1], [2, 3], [4, 5]]
tf.zeros_like(input_tensor) ==> [[0, 0], [0, 0], [0, 0]]

tf.fill([2, 3], 8) ==> [[8, 8, 8], [8, 8, 8]]

Constants as sequences

tf.lin_space(start, stop, num, name=None)
tf.lin_space(10.0, 13.0, 4) ==> [10. 11. 12. 13.]

tf.range(start, limit=None, delta=1, dtype=None, name='range')
tf.range(3, 18, 3) ==> [3 6 9 12 15]
tf.range(5) ==> [0 1 2 3 4]

NOT THE SAME AS NUMPY SEQUENCES: Tensor objects are not iterable
for _ in tf.range(4):  # TypeError

Randomly Generated Constants
tf.random_normal
tf.truncated_normal
tf.random_uniform
tf.random_shuffle
tf.random_crop
tf.multinomial
tf.random_gamma

Randomly Generated Constants
tf.set_random_seed(seed)
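A small sketch (not from the slides) of how the graph-level seed makes random ops reproducible across runs of the program:

import tensorflow as tf

tf.set_random_seed(42)      # graph-level seed
a = tf.random_uniform([2])  # op-level seeds are derived from the graph-level seed

with tf.Session() as sess:
    print(sess.run(a))      # the same values every time the script is run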

TF vs NP Data Types
TensorFlow integrates seamlessly with NumPy

tf.int32 == np.int32  # True

Can pass NumPy types to TensorFlow ops
tf.ones([2, 2], np.float32)  # [[1.0 1.0], [1.0 1.0]]

For tf.Session.run(fetches): if the requested fetch is a Tensor, the output will be a NumPy ndarray.

sess = tf.Session()
a = tf.zeros([2, 3], np.int32)
print(type(a))      # <class 'tensorflow.python.framework.ops.Tensor'>
a_out = sess.run(a)
print(type(a_out))  # <class 'numpy.ndarray'>

Use TF DType when possible
Python native types: TensorFlow has to infer the Python type

Use TF DType when possible
Python native types: TensorFlow has to infer the Python type
NumPy arrays: NumPy is not GPU-compatible

What's wrong with constants? Not trainable

Constants are stored in the graph definition

my_const = tf.constant([1.0, 2.0], name="my_const")
with tf.Session() as sess:
    print(sess.graph.as_graph_def())

Constants are stored in the graph definition
This makes loading graphs expensive when constants are big.

Constants are stored in the graph definition
This makes loading graphs expensive when constants are big.
Only use constants for primitive types. Use variables or readers for data that requires more memory.

Variables

# create variables with tf.Variable
s = tf.Variable(2, name="scalar")
m = tf.Variable([[0, 1], [2, 3]], name="matrix")
W = tf.Variable(tf.zeros([784, 10]))

# create variables with tf.get_variable
s = tf.get_variable("scalar", initializer=tf.constant(2))
m = tf.get_variable("matrix", initializer=tf.constant([[0, 1], [2, 3]]))
W = tf.get_variable("big_matrix", shape=(784, 10), initializer=tf.zeros_initializer())

You have to initialize your variables
The easiest way is initializing all variables at once:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

The initializer is an op. You need to execute it within the context of a session.

You have to initialize your variables
The easiest way is initializing all variables at once:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

Initialize only a subset of variables:

with tf.Session() as sess:
    sess.run(tf.variables_initializer([a, b]))

You have to initialize your variables
The easiest way is initializing all variables at once:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

Initialize only a subset of variables:

with tf.Session() as sess:
    sess.run(tf.variables_initializer([a, b]))

Initialize a single variable:

W = tf.Variable(tf.zeros([784, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)

Eval() a variable

# W is a random 700 x 10 variable object
W = tf.Variable(tf.truncated_normal([700, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W)

>> Tensor("Variable/read:0", shape=(700, 10), dtype=float32)
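To actually fetch the value, as the slide title suggests, evaluate the variable inside the session; a minimal follow-up sketch:

W = tf.Variable(tf.truncated_normal([700, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())  # equivalent to print(sess.run(W)); prints the 700 x 10 array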

tf.Variable.assign()

W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())  # ?

tf.Variable.assign()

W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())  # >> 10

Ugh, why?

tf.Variable.assign()

W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())  # >> 10

W.assign(100) creates an assign op. That op needs to be executed in a session to take effect.

tf.Variable.assign()

W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())  # >> 10

-------

W = tf.Variable(10)
assign_op = W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    sess.run(assign_op)
    print(W.eval())  # >> 100

Placeholder

A quick reminder
A TF program often has 2 phases:
1. Assemble a graph
2. Use a session to execute operations in the graph

Placeholders
A TF program often has 2 phases:
1. Assemble a graph
2. Use a session to execute operations in the graph
=> Assemble the graph first without knowing the values needed for computation

Placeholders
A TF program often has 2 phases:
1. Assemble a graph
2. Use a session to execute operations in the graph
=> Assemble the graph first without knowing the values needed for computation

Analogy: define the function f(x, y) = 2 * x + y without knowing the values of x or y. x and y are placeholders for the actual values.

Why placeholders?
We, or our clients, can later supply their own data when they need to execute the computation.

Placeholders

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder for a vector of 3 elements, type tf.float32
a = tf.placeholder(tf.float32, shape=[3])
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c))  # ?

Placeholders

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder for a vector of 3 elements, type tf.float32
a = tf.placeholder(tf.float32, shape=[3])
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c))  # InvalidArgumentError: a doesn't have an actual value

Supply the values to placeholders using a dictionary

Placeholders

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder for a vector of 3 elements, type tf.float32
a = tf.placeholder(tf.float32, shape=[3])
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    # the tensor a is the key, not the string 'a'
    print(sess.run(c, feed_dict={a: [1, 2, 3]}))  # [6, 7, 8]

Placeholders

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder for a vector of 3 elements, type tf.float32
a = tf.placeholder(tf.float32, shape=[3])
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: [1, 2, 3]}))  # [6, 7, 8]

Quirk: shape=None means that a tensor of any shape will be accepted as the value for the placeholder. shape=None makes it easy to construct graphs and is great when you have different batch sizes, but it is nightmarish for debugging.

Placeholders

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder of type float 32-bit, shape is a vector of 3 elements
a = tf.placeholder(tf.float32, shape=[3])
# create a constant of type float 32-bit, shape is a vector of 3 elements
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c, {a: [1, 2, 3]}))  # [6, 7, 8]

Quirk: shape=None also breaks all following shape inference, which makes many ops not work because they expect a certain rank.

Placeholders are valid ops

tf.placeholder(dtype, shape=None, name=None)

# create a placeholder of type float 32-bit, shape is a vector of 3 elements
a = tf.placeholder(tf.float32, shape=[3])
# create a constant of type float 32-bit, shape is a vector of 3 elements
b = tf.constant([5, 5, 5], tf.float32)

# use the placeholder as you would a constant or a variable
c = a + b  # short for tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(c, {a: [1, 2, 3]}))  # [6, 7, 8]

What if we want to feed multiple data points in?
You have to do it one at a time:

with tf.Session() as sess:
    for a_value in list_of_values_for_a:
        print(sess.run(c, {a: a_value}))

Extremely helpful for testing
Feed in dummy values to test parts of a large graph.
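Any feedable tensor, not just a placeholder, can be overridden with feed_dict, which is what makes this useful for testing subgraphs in isolation; a small sketch (not from the slides):

import tensorflow as tf

a = tf.add(2, 5)
b = tf.multiply(a, 3)

with tf.Session() as sess:
    print(sess.run(b))                     # 21, computed from a = 7
    print(sess.run(b, feed_dict={a: 15}))  # 45, a is replaced by a dummy value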

Linear Regression in TensorFlow

Model the linear relationship between:
dependent variable Y
explanatory variables X

Visualization made by Google, based on data collected by the World Bank

World Development Indicators dataset
X: birth rate
Y: life expectancy
190 countries

Want: find a linear relationship between X and Y to predict Y from X

Model
Inference: Y_predicted = w * X + b
Mean squared error: E[(y - y_predicted)^2]

Interactive Coding
birth_life_2010.txt

Interactive Coding
linreg_starter.py
birth_life_2010.txt

Phase 1: Assemble our graph

Step 1: Read in data
I already did that for you

Step 2: Create placeholders for inputs and labels
tf.placeholder(dtype, shape=None, name=None)

Step 3: Create weight and bias

tf.get_variable(name,
                shape=None,
                dtype=None,
                initializer=None, ...)

No need to specify shape if using a constant initializer.

Step 4: Inference
Y_predicted = w * X + b

Step 5: Specify loss function
loss = tf.square(Y - Y_predicted, name='loss')

Step 6: Create optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
optimizer = opt.minimize(loss)

Phase 2: Train our model
Step 1: Initialize variables
Step 2: Run optimizer (use a feed_dict to feed data into the X and Y placeholders)
A minimal end-to-end sketch follows.
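Putting the steps together, here is a minimal sketch of the whole program; it simplifies linreg_starter.py, and the helper utils.read_birth_life_data and the exact hyperparameters are assumptions:

import tensorflow as tf
import utils  # assumed helper module from the course starter code

data, n_samples = utils.read_birth_life_data('birth_life_2010.txt')

# Phase 1: assemble the graph
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')
w = tf.get_variable('weights', initializer=tf.constant(0.0))
b = tf.get_variable('bias', initializer=tf.constant(0.0))
Y_predicted = w * X + b
loss = tf.square(Y - Y_predicted, name='loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

# Phase 2: train the model
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())        # Step 1: initialize variables
    for i in range(100):                                # Step 2: run the optimizer
        for x, y in data:
            sess.run(optimizer, feed_dict={X: x, Y: y})
    print(sess.run([w, b]))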

Write log files using a FileWriter
writer = tf.summary.FileWriter('./graphs/linear_reg', sess.graph)

See it on TensorBoard
Step 1: python linreg_starter.py
Step 2: tensorboard --logdir='./graphs'



tf.data

Placeholder
Pro: puts the data processing outside TensorFlow, making it easy to do in Python
Con: users often end up processing their data in a single thread and creating a data bottleneck that slows execution down

Placeholder

data, n_samples = utils.read_birth_life_data(DATA_FILE)
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')
...
with tf.Session() as sess:
    ...
    # Step 8: train the model
    for i in range(100):  # run 100 epochs
        for x, y in data:
            # Session runs the train op to minimize loss
            sess.run(optimizer, feed_dict={X: x, Y: y})

tf.data
Instead of doing inference with placeholders and feeding in data later, do inference directly with the data.

tf.data
tf.data.Dataset
tf.data.Iterator

Store data in tf.data.Dataset
tf.data.Dataset.from_tensor_slices((features, labels))
tf.data.Dataset.from_generator(gen, output_types, output_shapes)

Store data in tf.data.Dataset
tf.data.Dataset.from_tensor_slices((features, labels))

dataset = tf.data.Dataset.from_tensor_slices((data[:, 0], data[:, 1]))

Store data in tf.data.Dataset
tf.data.Dataset.from_tensor_slices((features, labels))

dataset = tf.data.Dataset.from_tensor_slices((data[:, 0], data[:, 1]))

print(dataset.output_types)   # (tf.float32, tf.float32)
print(dataset.output_shapes)  # (TensorShape([]), TensorShape([]))

Can also create a Dataset from files (filenames)
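For example, the file-based constructors in tf.data can be used like this (a sketch; the file names are placeholders):

# each line of the text files becomes one string element
dataset = tf.data.TextLineDataset(['file1.txt', 'file2.txt'])

# each serialized example in a TFRecord file becomes one element
dataset = tf.data.TFRecordDataset(['file1.tfrecord'])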

tf.data.Iterator
Create an iterator to iterate through the samples in a Dataset.

tf.data.Iterator
iterator = dataset.make_one_shot_iterator()
iterator = dataset.make_initializable_iterator()

tf.data.Iterator
iterator = dataset.make_one_shot_iterator()
Iterates through the dataset exactly once. No need for initialization.
iterator = dataset.make_initializable_iterator()
Iterates through the dataset as many times as we want. Needs to be initialized each epoch.

tf.data.Iterator

iterator = dataset.make_one_shot_iterator()
X, Y = iterator.get_next()  # X is the birth rate, Y is the life expectancy

with tf.Session() as sess:
    print(sess.run([X, Y]))  # >> [1.822, 74.82825]
    print(sess.run([X, Y]))  # >> [3.869, 70.81949]
    print(sess.run([X, Y]))  # >> [3.911, 72.15066]

tf.data.Iterator

iterator = dataset.make_initializable_iterator()
...
for i in range(100):
    sess.run(iterator.initializer)
    total_loss = 0
    try:
        while True:
            sess.run([optimizer])
    except tf.errors.OutOfRangeError:
        pass

Handling data in TensorFlow

dataset = dataset.shuffle(1000)
dataset = dataset.repeat(100)
dataset = dataset.batch(128)
dataset = dataset.map(lambda x: tf.one_hot(x, 10))
# convert each element of the dataset to a one-hot vector

Does tf.data really perform better?

Does tf.data really perform better?
With placeholder: 9.05271519 seconds
With tf.data: 6.12285947 seconds

Should we always use tf.data?
For prototyping, feed_dict can be faster and easier to write (Pythonic).
tf.data is tricky to use when you have complicated preprocessing or multiple data sources.
NLP data is normally just a sequence of integers. In this case, transferring the data over to the GPU is pretty quick, so the speedup from tf.data isn't that large.

How does TensorFlow know what variables to update?

Optimizers

Optimizer

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)
_, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})

Optimizer

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
_, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})

The session looks at all trainable variables that loss depends on and updates them.

Optimizer
The session looks at all trainable variables that the optimizer depends on and updates them.

Trainable variables

tf.Variable(initial_value=None, trainable=True, ...)

trainable specifies if a variable should be trained or not. By default, all variables are trainable.
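For example, a variable you do not want the optimizer to update, such as a frozen pretrained embedding (a hypothetical here), can be created with trainable=False:

import numpy as np
import tensorflow as tf

pretrained = np.random.rand(1000, 50).astype(np.float32)  # stand-in for real embeddings

# minimize() will skip this variable because it is not in the trainable collection
embed_matrix = tf.Variable(pretrained, trainable=False, name='embed_matrix')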

List of optimizers in TF
tf.train.GradientDescentOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.RMSPropOptimizer
...
Usually Adam works better out of the box.

word2vec skip-gram in TensorFlow

Embedding Lookup
Illustration by Chris McCormick

Embedding Lookup

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None,
                       validate_indices=True, max_norm=None)

Illustration by Chris McCormick
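A minimal usage sketch (the sizes are illustrative): each id in the second argument selects the corresponding row of the embedding matrix:

embed_matrix = tf.get_variable('embed_matrix', shape=[10000, 128],
                               initializer=tf.random_uniform_initializer())
center_words = tf.placeholder(tf.int32, shape=[None])       # a batch of word ids
embed = tf.nn.embedding_lookup(embed_matrix, center_words)  # shape [batch_size, 128]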

Negative sampling vs NCE
Negative sampling is a simplified model of Noise Contrastive Estimation (NCE).
NCE guarantees approximation to softmax. Negative sampling doesn't.

NCE Loss

tf.nn.nce_loss(weights,
               biases,
               labels,
               inputs,
               num_sampled,
               num_classes, ...)
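A hedged usage sketch, roughly in the style of the word2vec starter code; VOCAB_SIZE, EMBED_SIZE, and NUM_SAMPLED are assumed constants, and embed and target_words come from the earlier embedding-lookup and data steps:

nce_weight = tf.get_variable('nce_weight', shape=[VOCAB_SIZE, EMBED_SIZE],
                             initializer=tf.truncated_normal_initializer())
nce_bias = tf.get_variable('nce_bias', initializer=tf.zeros([VOCAB_SIZE]))

loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                     biases=nce_bias,
                                     labels=target_words,  # shape [batch_size, 1]
                                     inputs=embed,         # shape [batch_size, EMBED_SIZE]
                                     num_sampled=NUM_SAMPLED,
                                     num_classes=VOCAB_SIZE))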

Interactive Coding
word2vec_utils.py
word2vec_starter.py

Embedding visualization

Interactive Coding
word2vec_visualize.py

Visualize vector representations of anything
Visualization from Chris Olah's blog


Name scope
TensorFlow doesn't know which nodes should be grouped together unless you tell it to.

Name scope
Group nodes together with tf.name_scope(name)

with tf.name_scope(name_of_that_scope):
    # declare op_1
    # declare op_2
    # ...

Name scope

with tf.name_scope('data'):
    iterator = dataset.make_initializable_iterator()
    center_words, target_words = iterator.get_next()

with tf.name_scope('embed'):
    embed_matrix = tf.get_variable('embed_matrix',
                                   shape=[VOCAB_SIZE, EMBED_SIZE], ...)
    embed = tf.nn.embedding_lookup(embed_matrix, center_words)

with tf.name_scope('loss'):
    nce_weight = tf.get_variable('nce_weight', shape=[VOCAB_SIZE, EMBED_SIZE], ...)
    nce_bias = tf.get_variable('nce_bias', initializer=tf.zeros([VOCAB_SIZE]))
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias, ...))

with tf.name_scope('optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)

TensorBoard

Variable scope
Name scope vs variable scope
tf.name_scope() vs tf.variable_scope()

Variable scope
Name scope vs variable scope
Variable scope facilitates variable sharing.

Variable sharing: The problem

def two_hidden_layers(x):
    w1 = tf.Variable(tf.random_normal([100, 50]), name='h1_weights')
    b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
    h1 = tf.matmul(x, w1) + b1
    w2 = tf.Variable(tf.random_normal([50, 10]), name='h2_weights')
    b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
    logits = tf.matmul(h1, w2) + b2
    return logits

Variable sharing: The problem

def two_hidden_layers(x):
    w1 = tf.Variable(tf.random_normal([100, 50]), name='h1_weights')
    b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
    h1 = tf.matmul(x, w1) + b1
    w2 = tf.Variable(tf.random_normal([50, 10]), name='h2_weights')
    b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
    logits = tf.matmul(h1, w2) + b2
    return logits

What will happen if we make these two calls?

logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)

Variable sharing: The problem
Two sets of variables are created. You want all your inputs to use the same weights and biases!

tf.get_variable()
tf.get_variable(<name>, <shape>, <initializer>)
If a variable with <name> already exists, reuse it.
If not, initialize it with <shape> using <initializer>.

tf.get_variable()

def two_hidden_layers(x):
    assert x.shape.as_list() == [200, 100]
    w1 = tf.get_variable("h1_weights", [100, 50], initializer=tf.random_normal_initializer())
    b1 = tf.get_variable("h1_biases", [50], initializer=tf.constant_initializer(0.0))
    h1 = tf.matmul(x, w1) + b1
    assert h1.shape.as_list() == [200, 50]
    w2 = tf.get_variable("h2_weights", [50, 10], initializer=tf.random_normal_initializer())
    b2 = tf.get_variable("h2_biases", [10], initializer=tf.constant_initializer(0.0))
    logits = tf.matmul(h1, w2) + b2
    return logits

logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)

tf.get_variable()

def two_hidden_layers(x):
    assert x.shape.as_list() == [200, 100]
    w1 = tf.get_variable("h1_weights", [100, 50], initializer=tf.random_normal_initializer())
    b1 = tf.get_variable("h1_biases", [50], initializer=tf.constant_initializer(0.0))
    h1 = tf.matmul(x, w1) + b1
    assert h1.shape.as_list() == [200, 50]
    w2 = tf.get_variable("h2_weights", [50, 10], initializer=tf.random_normal_initializer())
    b2 = tf.get_variable("h2_biases", [10], initializer=tf.constant_initializer(0.0))
    logits = tf.matmul(h1, w2) + b2
    return logits

logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)

ValueError: Variable h1_weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

tf.variable_scope()

def two_hidden_layers(x):
    # same tf.get_variable-based definition as above
    ...

Put your variables within a scope and reuse all variables within that scope:

with tf.variable_scope('two_layers') as scope:
    logits1 = two_hidden_layers(x1)
    scope.reuse_variables()
    logits2 = two_hidden_layers(x2)

tf.variable_scope()
Only one set of variables, all within the variable scope "two_layers". They take in two different inputs.

tf.variable_scope()
tf.variable_scope implicitly creates a name scope.

Reusable code?

def two_hidden_layers(x):
    # same tf.get_variable-based definition as above
    ...

with tf.variable_scope('two_layers') as scope:
    logits1 = two_hidden_layers(x1)
    scope.reuse_variables()
    logits2 = two_hidden_layers(x2)

Layer 'em up

def fully_connected(x, output_dim, scope):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE) as scope:
        w = tf.get_variable("weights", [x.shape[1], output_dim], initializer=tf.random_normal_initializer())
        b = tf.get_variable("biases", [output_dim], initializer=tf.constant_initializer(0.0))
        return tf.matmul(x, w) + b

def two_hidden_layers(x):
    h1 = fully_connected(x, 50, 'h1')
    h2 = fully_connected(h1, 10, 'h2')
    return h2

with tf.variable_scope('two_layers') as scope:
    logits1 = two_hidden_layers(x1)
    logits2 = two_hidden_layers(x2)

Fetch variables if they already exist; else, create them.


Manage Experiments

tf.train.Saver
saves the graph's variables in binary files

Saves sessions, not graphs!
tf.train.Saver.save(sess, save_path, global_step=None, ...)
tf.train.Saver.restore(sess, save_path)

Save parameters after 1000 steps

# define model
model = SkipGramModel(params)

# create a saver object
saver = tf.train.Saver()

with tf.Session() as sess:
    for step in range(training_steps):
        sess.run([optimizer])

        # save model every 1000 steps
        if (step + 1) % 1000 == 0:
            saver.save(sess,
                       'checkpoint_directory/model_name',
                       global_step=step)

Specify the step at which the model is saved

saver.save(sess,
           'checkpoint_directory/model_name',
           global_step=step)

Global step

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

Very common in TensorFlow programs

Global step

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

optimizer = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)

You need to tell the optimizer to increment global_step. This can also help your optimizer know when to decay the learning rate.

Your checkpoints are saved in checkpoint_directory

tf.train.Saver
Only saves variables, not the graph.
Checkpoints map variable names to tensors.

tf.train.Saver
Can also choose to save certain variables

v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')

You can save your variables in one of three ways:
saver = tf.train.Saver({'v1': v1, 'v2': v2})
saver = tf.train.Saver([v1, v2])
saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})  # similar to the dict above

Restore variables

saver.restore(sess, 'checkpoints/name_of_the_checkpoint')
e.g. saver.restore(sess, 'checkpoints/skip-gram-99999')

You still need to build the graph first.

Restore the latest checkpoint

# check if there is a checkpoint
ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))

# check if there is a valid checkpoint path
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)

1. The checkpoint file keeps track of the latest checkpoint.
2. Restore checkpoints only when there is a valid checkpoint path.

tf.summary
Why matplotlib when you can summarize?

tf.summary
Visualize our summary statistics during training:
tf.summary.scalar
tf.summary.histogram
tf.summary.image

Step 1: create summaries

with tf.name_scope("summaries"):
    tf.summary.scalar("loss", self.loss)
    tf.summary.scalar("accuracy", self.accuracy)
    tf.summary.histogram("histogram loss", self.loss)
    summary_op = tf.summary.merge_all()

Merge them all into one summary op to make managing them easier.

Step 2: run them

loss_batch, _, summary = sess.run([loss, optimizer, summary_op])

Like everything else in TF, summaries are ops. For the summaries to be built, you have to run them in a session.

Step 3: write summaries to file

writer.add_summary(summary, global_step=step)
