Feedforward Neural Networks

Transcription

Feedforward Neural Networks
Danna Gurari
University of Colorado Boulder
Spring 2022
https://home.cs.colorado.edu/ rse.html

Review

Last week:
- Binary classification applications
- Evaluating classification models
- Biological neurons: inspiration
- Artificial neuron: Perceptron

Assignments (Canvas):
- Problem set 1 due earlier today (should receive grades in 1 week)
- Lab assignment 1 due in 1.5 weeks

Questions?

Today’s Topics
- Motivation for neural networks: need non-linear models
- Neural network architecture: hidden layers
- Neural network architecture: activation functions
- Neural network architecture: output units
- Programming tutorial

Historical Context: Artificial Neurons

- 1943: First mathematical model of a neuron
- 1945: First programmable machine
- 1950: Turing test
- 1956: AI
- 1957: Perceptron
- 1959: Machine learning
- 1969: Minsky & Papert published a book called “Perceptrons” to discuss its limitations
- 1986: Neural networks with effective learning strategy
- 2012: Wave 3: rise of “deep learning”

Recall: Vision for Perceptron

Frank Rosenblatt (Psychologist)

“[The perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. [It] is expected to be finished in about a year at a cost of $100,000.”

1958 New York Times article: -of.html
https://en.wikipedia.org/wiki/Frank_Rosenblatt

Recall: Perceptron

Biological Neuron: [figure]
Artificial Neurons (e.g., Perceptron): [figure]

Python Machine Learning; Raschka & Mirjalili

Perceptron Limitation: XOR Problem

XOR “Exclusive Or”
- Input: two binary values x1 and x2
- Output: 1, when exactly one input equals 1; 0, otherwise

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

Perceptron Limitation: XOR Problem

A perceptron (a linear function) cannot solve the XOR problem, i.e., it cannot separate the 1s from the 0s: no single line through the (x1, x2) plane puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.
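A minimal sketch (assuming scikit-learn is available) that makes this concrete: a perceptron trained on the four XOR rows can never label all of them correctly, because no linear decision boundary exists.

    # Train a linear perceptron on the XOR truth table.
    import numpy as np
    from sklearn.linear_model import Perceptron

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])  # x1 XOR x2

    clf = Perceptron(max_iter=1000, tol=None)
    clf.fit(X, y)
    print(clf.predict(X))   # never reproduces [0, 1, 1, 0]
    print(clf.score(X, y))  # at most 0.75: some row is always misclassified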

Perceptron Limitation: XOR Problem

Frank Rosenblatt (Psychologist)

“[The perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. [It] is expected to be finished in about a year at a cost of $100,000.”

1958 New York Times article: -of.html

How can a machine be “conscious” when it can’t solve the XOR problem?

Today’s Topics
- Motivation for neural networks: need non-linear models
- Neural network architecture: hidden layers
- Neural network architecture: activation functions
- Neural network architecture: output units
- Programming tutorial

Neural Networks: Connected Neurons

Biological Neural Network: [figure]
Artificial Neural Network: [figure]

https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb

Neural Network

[figure: one input per feature, feeding through hidden layers to a prediction]

This is a 3-layer neural network (i.e., count the number of hidden layers plus the output layer).

Each “hidden layer” takes the outputs of units (i.e., neurons) and provides them as inputs to other units (i.e., neurons).

http://cs231n.github.io/neural-networks-1/

Neural Network

How does this relate to a perceptron?

Unit: takes as input a weighted sum and applies an activation function.

Python Machine Learning; Raschka & Mirjalili

Neural Network

Training goal: learn the model parameters.

Layers are called “hidden” because the algorithm decides how to use each layer to produce its output.

http://cs231n.github.io/neural-networks-1/

Neural Network

How many weights are in this model?
- Input to Hidden Layer 1: 3 x 4 = 12
- Hidden Layer 1 to Hidden Layer 2: 4 x 4 = 16
- Hidden Layer 2 to Output Layer: 4 x 1 = 4
- Total: 12 + 16 + 4 = 32

http://cs231n.github.io/neural-networks-1/

Neural Network

How many parameters are there to learn?
- Number of weights: 32
- Number of biases: 4 + 4 + 1 = 9
- Total: 32 + 9 = 41

http://cs231n.github.io/neural-networks-1/
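A minimal sketch (assuming PyTorch; the ReLU activations are an assumption, since the slide’s figure does not name one) that verifies the hand count for the 3-4-4-1 network:

    # Count the learnable parameters of a 3-4-4-1 fully connected network.
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(3, 4),  # 3*4 weights + 4 biases = 16
        nn.ReLU(),
        nn.Linear(4, 4),  # 4*4 weights + 4 biases = 20
        nn.ReLU(),
        nn.Linear(4, 1),  # 4*1 weights + 1 bias  = 5
    )
    print(sum(p.numel() for p in model.parameters()))  # 41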

Fully Connected, Feedforward Neural Networks

What does it mean for a model to be fully connected?
- Each unit provides input to each unit in the next layer.

What does it mean for a model to be feedforward?
- Each layer serves as input to the next layer with no loops.

http://cs231n.github.io/neural-networks-1/

How Many Layers and Units Should Be Used?

To be explored more in this course and in lab assignment set 1.

Hidden Layers Alone Are NOT Enough to Model Non-Linear Functions

Key Observation: feedforward networks are just functions chained together.

e.g., for inputs x1 and x2, hidden units h1 and h2 (weights w1-w4, biases b1 and b2), and output y (weights w5 and w6, bias b3):

- What is the function for h1?  h1 = w1*x1 + w3*x2 + b1
- What is the function for h2?  h2 = w2*x1 + w4*x2 + b2
- What is the function for y?   y = h1*w5 + h2*w6 + b3

Substituting h1 and h2 into y:

y = (w1*x1 + w3*x2 + b1)*w5 + (w2*x1 + w4*x2 + b2)*w6 + b3
y = w1*w5*x1 + w3*w5*x2 + w5*b1 + w2*w6*x1 + w4*w6*x2 + w6*b2 + b3

A constant times a linear function is still a linear function, so a chain of LINEAR functions at any depth is still a LINEAR function!
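A minimal numpy sketch of the same observation: two stacked linear layers with no activation collapse into one linear layer.

    # Composing linear layers yields a single linear layer:
    # W = W2 @ W1 and b = W2 @ b1 + b2.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

    x = rng.normal(size=3)
    two_layers = W2 @ (W1 @ x + b1) + b2
    one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
    print(np.allclose(two_layers, one_layer))  # True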

Today’s Topics
- Motivation for neural networks: need non-linear models
- Neural network architecture: hidden layers
- Neural network architecture: activation functions
- Neural network architecture: output units
- Programming tutorial

Key Idea: Use Connected Neurons to Non-linearly Transform Input into Useful Features for Predictions

Biological Neural Network: [figure]
Artificial Neural Network: [figure]

https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb

Key Idea: Use Connected Neurons to Non-linearly Transform Input into Useful Features for Predictions

Biological Neuron: [figure]
Artificial Neurons (e.g., Perceptron): [figure; the “?” marks the activation function]

Mimic a neuron firing by having each unit apply a non-linear “activation” function to the weighted input.

Python Machine Learning; Raschka & Mirjalili

Non-Linear Activation Functions

Each unit applies a non-linear “activation” function to the weighted input to mimic a neuron firing.

[figure: Sigmoid, Tanh, and ReLU curves; ReLU is computationally faster]

http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/10_nn1.pdf
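A minimal numpy sketch of the three activation functions named above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

    def tanh(z):
        return np.tanh(z)                # squashes to (-1, 1)

    def relu(z):
        return np.maximum(0.0, z)        # 0 for z < 0, identity otherwise

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z))  # [0.119 0.5   0.881]
    print(tanh(z))     # [-0.964  0.     0.964]
    print(relu(z))     # [0. 0. 2.]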

Non-Linear Example: Revisiting XOR Problem

Goal: a non-linear function that separates the 1s from the 0s.

[figure: the points (0, 0), (0, 1), (1, 0), and (1, 1) in the input space and in the transformed output space]

Non-Linear Example: Revisiting XOR Problem

Approach: a network with two hidden units using the ReLU activation function and a linear output unit, with these parameters:
- Hidden unit h1: weights (1, 1), bias 0, so h1 = ReLU(x1 + x2)
- Hidden unit h2: weights (1, 1), bias -1, so h2 = ReLU(x1 + x2 - 1)
- Output unit: weights (1, -2), bias 0, so y = h1 - 2*h2

Working through all four inputs:

x1  x2  h1  h2  y = h1 - 2*h2
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   2   1   0

Neural networks can solve the XOR problem, and so can model non-linear functions!
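A minimal numpy sketch of this exact network, checking all four inputs:

    # XOR with two ReLU hidden units (weights all 1, biases 0 and -1)
    # and a linear output unit (weights 1 and -2, bias 0).
    import numpy as np

    W = np.array([[1.0, 1.0],   # weights of hidden unit h1
                  [1.0, 1.0]])  # weights of hidden unit h2
    c = np.array([0.0, -1.0])   # hidden biases
    w = np.array([1.0, -2.0])   # output weights
    b = 0.0                     # output bias

    def xor_net(x):
        h = np.maximum(0.0, W @ x + c)  # ReLU activation
        return w @ h + b

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, xor_net(np.array(x, dtype=float)))  # 0.0, 1.0, 1.0, 0.0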

Activation Functions and Model Parameters (e.g., Sigmoid)

[figure: sigmoid output curves]

- Biases determine the shifted position.
- Weights determine the slope (steepness).
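A minimal numpy sketch of this effect on a single sigmoid unit, sigmoid(w*x + b):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-4, 4, 5)
    print(sigmoid(1.0 * x + 0.0))  # baseline curve
    print(sigmoid(4.0 * x + 0.0))  # larger weight: steeper transition
    print(sigmoid(1.0 * x + 2.0))  # positive bias: curve shifted left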

Which Activation Function Should Be Used?

To be explored more in this course and in lab assignment set 1.

Today’s Topics
- Motivation for neural networks: need non-linear models
- Neural network architecture: hidden layers
- Neural network architecture: activation functions
- Neural network architecture: output units
- Programming tutorial

Desired Output Driven by Task

- Regression: predict a continuous value
- Classification: predict a discrete value

Hands-on Machine Learning with Scikit-Learn & TensorFlow, Aurelien Geron

Linear (No Activation Function)

[figure: the output equals the weighted input x, i.e., the identity]

Python Machine Learning; Raschka & Mirjalili

Sigmoid (for Binary Classification)

If the sigmoid output is at least 0.5, output 1; else, output 0.
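A minimal numpy sketch of a sigmoid output unit turned into a binary label:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    score = 0.8                     # weighted sum from the final layer
    p = sigmoid(score)              # probability of class 1
    print(p, 1 if p >= 0.5 else 0)  # 0.69, 1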

Softmax (for Multiclass Classification)

Converts a vector of scores into a probability distribution that sums to 1; e.g., the node with the largest probability is the class assigned to the image.

Figure Source: de

Softmax (for Multiclass Classification)

Converts a vector of scores into a probability distribution that sums to 1; e.g., the ideal prediction is a one-hot vector with a single 1 and the rest 0s.

Figure Source: de

Softmax (for Multiclass Classification)

Converts a vector of scores into a probability distribution that sums to 1:

softmax(z)_i = e^(z_i) / (e^(z_1) + ... + e^(z_K)),  for i = 1, ..., K  (K = number of classes)

- Exponentiation gets rid of negative values while preserving the original order of the scores; e causes negative scores to become slightly larger than 0 while positive values grow exponentially (choosing e rather than another exponent base simplifies the math during training).
- Dividing each node’s score by the sum of all entries makes them sum to 1 (normalization).

Useful tutorial: ax-function-578c8b0fb15
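A minimal numpy sketch of this softmax (with the standard max-subtraction trick for numerical stability, which does not change the result):

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))  # exponentiate: all positive, order preserved
        return e / e.sum()         # normalize so the entries sum to 1

    scores = np.array([2.0, 1.0, -1.0])
    probs = softmax(scores)
    print(probs, probs.sum())  # [0.705 0.259 0.035] 1.0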

Desired Output Driven by Task

ctivation-function-for-deep-learning/

Today’s Topics
- Motivation for neural networks: need non-linear models
- Neural network architecture: hidden layers
- Neural network architecture: activation functions
- Neural network architecture: output units
- Programming tutorial: link to code and video in Canvas
