Machine Learning Using Matlab - Uni-konstanz.de

Transcription

Machine Learning using Matlab - Lecture 6: Neural Network (cont.)

Cost function

For a network with $K$ output units, the regularized cost function is

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log (1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$

Forward propagation

Forward propagation from layer $l$ to layer $l+1$ is computed as:

$z^{(l+1)} = \Theta^{(l)} a^{(l)}, \quad a^{(l+1)} = g(z^{(l+1)})$

Note: when $l = 1$, $a^{(1)} = x$.

[Figure: four-layer network, Layer 1 through Layer 4]
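As a minimal sketch, assuming sigmoid activations and parameter matrices Theta1, Theta2, Theta3 that already include the bias columns, one forward pass through the four-layer network could look like this in Matlab:

    % Minimal forward pass (sketch): sigmoid activations assumed,
    % x is a column vector, Theta1..Theta3 include the bias column.
    g  = @(z) 1 ./ (1 + exp(-z));    % sigmoid activation

    a1 = [1; x];                     % layer 1: input plus bias unit
    z2 = Theta1 * a1;  a2 = [1; g(z2)];
    z3 = Theta2 * a2;  a3 = [1; g(z3)];
    z4 = Theta3 * a3;  a4 = g(z4);   % layer 4: a4 = h(x), no bias unit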

Backpropagation

Backpropagation from layer $l+1$ to layer $l$ is computed as:

$\delta^{(l)} = (\Theta^{(l)})^T \delta^{(l+1)} \odot g'(z^{(l)})$

When $l = L$: $\delta^{(L)} = a^{(L)} - y$.

[Figure: four-layer network, Layer 1 through Layer 4]

Example

Given a training example $(x, y)$, the cost function for one example is first simplified as $\mathrm{cost} = -y \log h_\Theta(x) - (1 - y) \log(1 - h_\Theta(x))$. Forward propagation and backpropagation are then computed as:

Forward propagation: $a^{(1)} = x$; $z^{(l+1)} = \Theta^{(l)} a^{(l)}$, $a^{(l+1)} = g(z^{(l+1)})$ for $l = 1, 2, 3$.

Backpropagation: $\delta^{(4)} = a^{(4)} - y$; $\delta^{(l)} = (\Theta^{(l)})^T \delta^{(l+1)} \odot g'(z^{(l)})$ for $l = 3, 2$.

[Figure: four-layer network, Layer 1 through Layer 4]

Gradient computation

1. Given training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$.
2. Set $\Delta^{(l)}_{ij} = 0$ for all $l, i, j$.
3. For $i = 1$ to $m$:
   a. Set $a^{(1)} = x^{(i)}$.
   b. Perform forward propagation to compute $a^{(l)}$ for $l = 2, \ldots, L$.
   c. Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$.
   d. Compute $\delta^{(L-1)}, \ldots, \delta^{(2)}$.
   e. Accumulate $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$.
4. The partial derivatives are $\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = \frac{1}{m} \Delta^{(l)}_{ij}$ (plus $\frac{\lambda}{m} \Theta^{(l)}_{ij}$ for the regularized terms with $j \neq 0$); see the sketch below.
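A minimal Matlab sketch of this loop for the four-layer network, assuming sigmoid activations and training matrices X (m-by-n) and Y (m-by-K); all variable names are illustrative:

    % Gradient computation by backpropagation (sketch).
    g = @(z) 1 ./ (1 + exp(-z));
    m = size(X, 1);
    Delta1 = zeros(size(Theta1));    % gradient accumulators
    Delta2 = zeros(size(Theta2));
    Delta3 = zeros(size(Theta3));
    for i = 1:m
        % forward propagation for example i
        a1 = [1; X(i,:)'];
        z2 = Theta1*a1;  a2 = [1; g(z2)];
        z3 = Theta2*a2;  a3 = [1; g(z3)];
        z4 = Theta3*a3;  a4 = g(z4);
        % backpropagation; note g'(z) = a .* (1 - a) for the sigmoid
        d4 = a4 - Y(i,:)';
        d3 = (Theta3' * d4) .* (a3 .* (1 - a3));  d3 = d3(2:end);
        d2 = (Theta2' * d3) .* (a2 .* (1 - a2));  d2 = d2(2:end);
        % accumulate Delta^(l) := Delta^(l) + delta^(l+1) * (a^(l))'
        Delta3 = Delta3 + d4 * a3';
        Delta2 = Delta2 + d3 * a2';
        Delta1 = Delta1 + d2 * a1';
    end
    Theta1_grad = Delta1 / m;        % unregularized partial derivatives
    Theta2_grad = Delta2 / m;
    Theta3_grad = Delta3 / m;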

Random initialization

Instead of initializing the parameters to all zeros, it is important to initialize them randomly. The random initialization serves the purpose of symmetry breaking.

[Figure: forward propagation and backpropagation in the four-layer network]

Random initialization - Matlab function

Initialize each parameter to a random value in $[-\epsilon_{init}, \epsilon_{init}]$:

    function W = randInitializeWeights(L_in, L_out)
        epsilon_init = 0.1;
        W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
    end
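For instance, for the 4-5-5-4 network used in the example two slides below, the three weight matrices could be initialized like this (a minimal sketch; the layer sizes are taken from that slide):

    % Symmetry breaking for the 4-5-5-4 example network (sketch):
    Theta1 = randInitializeWeights(4, 5);   % 5 x 5, values in [-0.1, 0.1]
    Theta2 = randInitializeWeights(5, 5);   % 5 x 6
    Theta3 = randInitializeWeights(5, 4);   % 4 x 6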

Advanced optimization

We have already seen how to call existing numerical optimization functions to obtain the optimal parameters:

    function [J, grad] = costFunction(theta)
        ...
    end
    optTheta = minFunc(@costFunction, initialTheta, options)

In the following neural network we have three parameter matrices $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$. How do we feed them into the minFunc function? "Unroll" them into vectors.

[Figure: four-layer network, Layer 1 through Layer 4]

Advanced optimization - example

$L = 4$, $s_1 = 4$, $s_2 = 5$, $s_3 = 5$, $s_4 = 4$, so $\Theta^{(1)} \in \mathbb{R}^{5 \times 5}$, $\Theta^{(2)} \in \mathbb{R}^{5 \times 6}$, $\Theta^{(3)} \in \mathbb{R}^{4 \times 6}$.

Matlab implementation:

1. Unroll: thetaVec = [Theta1(:); Theta2(:); Theta3(:)]
2. Feed thetaVec into minFunc
3. Reshape thetaVec inside costFunction:
   a. Theta1 = reshape(thetaVec(1:25), 5, 5);
   b. Theta2 = reshape(thetaVec(26:55), 5, 6);
   c. Theta3 = reshape(thetaVec(56:79), 4, 6);
   d. Compute J and grad

[Figure: four-layer network, Layer 1 through Layer 4]
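The unroll/reshape mechanics can be checked in isolation. A minimal sketch with random placeholder matrices of the sizes above:

    % Unroll three parameter matrices into one vector and recover them.
    Theta1 = rand(5, 5);  Theta2 = rand(5, 6);  Theta3 = rand(4, 6);
    thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % 79 x 1 vector
    T1 = reshape(thetaVec(1:25),  5, 5);            % same as Theta1
    T2 = reshape(thetaVec(26:55), 5, 6);            % same as Theta2
    T3 = reshape(thetaVec(56:79), 4, 6);            % same as Theta3
    isequal(T1, Theta1) && isequal(T2, Theta2) && isequal(T3, Theta3)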

Gradient check

With so many parameters, how can we be sure the computed gradient is correct? Recalling the definition of the numerical estimation of gradients, we can compare the backpropagation gradient with a numerical estimate.

Gradient check

For each parameter $\theta_j$, the two-sided numerical estimate is

$\frac{\partial}{\partial \theta_j} J(\theta) \approx \frac{J(\theta + \epsilon e_j) - J(\theta - \epsilon e_j)}{2\epsilon}$

with a small $\epsilon$ (e.g. $\epsilon = 10^{-4}$) and $e_j$ the $j$-th unit vector.

Gradient check

Implementation notes:
- Implement backpropagation to compute the gradient.
- Implement the numerical gradient check to compute the estimated gradient.
- Make sure they have similar values (difference below a threshold).
- Turn off the gradient check for training.

Note: be sure to disable your gradient check code before training; otherwise learning will be very slow. The gradient check can be generalized to check the gradient of any cost function. A sketch of the check follows below.
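A minimal sketch of the two-sided check, assuming a function costFunc that returns [J, grad] for an unrolled parameter vector theta (both names are illustrative):

    % Two-sided numerical gradient estimate (sketch).
    epsilon = 1e-4;
    numGrad = zeros(size(theta));
    for j = 1:numel(theta)
        e = zeros(size(theta));
        e(j) = epsilon;
        numGrad(j) = (costFunc(theta + e) - costFunc(theta - e)) / (2*epsilon);
    end
    [~, grad] = costFunc(theta);                  % backpropagation gradient
    norm(numGrad - grad) / norm(numGrad + grad)   % should be tiny, e.g. < 1e-9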

Overview: train a neural network

1. Design a network architecture.
2. Randomly initialize the weights.
3. Implement forward propagation to get the hypothesis $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
4. Implement code to compute the cost function $J(\Theta)$.
5. Implement backpropagation to compute the partial derivatives $\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$.
6. Use the gradient check to compare the partial derivatives with the numerical estimate of the gradient of $J(\Theta)$; if they agree, disable the gradient checking code.
7. Use gradient descent or an advanced optimization method to minimize $J(\Theta)$ as a function of the parameters $\Theta$. (A sketch of the whole pipeline follows below.)
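Putting the steps together, a hedged sketch of the pipeline for the 4-5-5-4 example network, assuming a costFunction that takes the unrolled vector and returns [J, grad] as on the earlier slides (minFunc is the third-party optimizer already referenced above; the options are illustrative):

    % End-to-end training sketch for the 4-5-5-4 example network.
    Theta1 = randInitializeWeights(4, 5);
    Theta2 = randInitializeWeights(5, 5);
    Theta3 = randInitializeWeights(5, 4);
    initialTheta = [Theta1(:); Theta2(:); Theta3(:)];    % unroll
    options = struct('MaxIter', 400);                    % illustrative
    optTheta = minFunc(@costFunction, initialTheta, options);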

Deep feedforward Neural Networks

Other architectures
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)

Application 1

Application 2

Discussion

- More parameters means a more powerful model. Which is better: more layers or more neurons?
- Disadvantages?
  - The neural network objective is non-convex, so gradient descent is susceptible to local optima; however, it works fairly well in practice even when the optimum found is not global.
  - It is a black-box model.

From logistic regression to SVM

Logistic regression:
- Label: $y \in \{0, 1\}$
- Hypothesis: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
- Objective: minimize the cross-entropy cost

Support Vector Machine (SVM):
- Label: $y \in \{-1, +1\}$
- Hypothesis: $h(x) = \mathrm{sign}(w^T x + b)$
- Objective: find the maximum-margin hyperplane

From logistic regression to SVM

- Logistic regression cost function: $\mathrm{cost}(h_\theta(x), y) = -y \log h_\theta(x) - (1 - y) \log(1 - h_\theta(x))$
- SVM cost function (hinge loss): $\mathrm{cost}(f(x), y) = \max(0, 1 - y f(x))$ with $f(x) = w^T x + b$

From logistic regression to SVM

With labels $y \in \{-1, +1\}$, both per-example losses can be written as functions of the margin $z = y(w^T x + b)$:

- Logistic regression: $\log(1 + e^{-z})$, a smooth, strictly decreasing function of the margin.
- SVM: $\max(0, 1 - z)$, which is exactly zero once the margin is at least 1.
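To visualize the comparison, a small sketch that plots both per-example losses against the margin $z$:

    % Logistic loss vs. hinge loss as functions of the margin z.
    z       = linspace(-3, 3, 200);
    logLoss = log(1 + exp(-z));    % logistic regression loss
    hinge   = max(0, 1 - z);       % SVM hinge loss
    plot(z, logLoss, 'b', z, hinge, 'r');
    legend('logistic loss', 'hinge loss');
    xlabel('margin z'); ylabel('per-example loss');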

SVM - model representation

Given training examples $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$ with $y^{(i)} \in \{-1, +1\}$, SVM aims to find an optimal hyperplane $w^T x + b = 0$ so that:

$y^{(i)}(w^T x^{(i)} + b) \geq 1 \quad \text{for all } i$

which is equivalent to minimizing the following cost function:

$J(w, b) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \max\left(0,\; 1 - y^{(i)}(w^T x^{(i)} + b)\right)$

Here $\max(0, 1 - y f(x))$ is called the hinge loss.

SVM - gradient computing

Because the hinge loss is not differentiable (it has a kink at margin 1), a sub-gradient is computed instead:

$\frac{\partial J}{\partial w} = w - C \sum_{i:\, y^{(i)}(w^T x^{(i)} + b) < 1} y^{(i)} x^{(i)}, \qquad \frac{\partial J}{\partial b} = -C \sum_{i:\, y^{(i)}(w^T x^{(i)} + b) < 1} y^{(i)}$
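A minimal sketch of sub-gradient descent on this objective (X and y are assumed training data with labels in {-1, +1}; the learning rate, iteration count, and C are illustrative):

    % Sub-gradient descent for the linear SVM objective (sketch).
    [m, n] = size(X);
    w = zeros(n, 1);  b = 0;
    C = 1;  alpha = 0.01;                      % illustrative hyperparameters
    for iter = 1:1000
        viol = (y .* (X*w + b)) < 1;           % examples violating the margin
        gw = w - C * (X(viol,:)' * y(viol));   % sub-gradient w.r.t. w
        gb =   - C * sum(y(viol));             % sub-gradient w.r.t. b
        w = w - alpha * gw;
        b = b - alpha * gb;
    end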

SVM - intuition

SVM - intuition

Which of the linear classifiers is optimal?

SVM - intuition

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only the support vectors are important; the other training examples can be ignored.

[Figure: support vectors lying on the margin boundaries]

SVM - intuition Which linear classifier has better performance?
