ML with TensorFlow

Transcription

Lecture 7-1 Application & Tips: Learning rate, data preprocessing, overfitting. Sung Kim <hunkim+mr@gmail.com>

Gradient descent (Udacity course: d730/l-6370362152/m-6379811827)


Large learning rate: overshooting. http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html

Small learning rate: takes too long, stops at a local minimum. http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html

Try several learning rates. Observe the cost function. Check that it goes down at a reasonable rate.
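A minimal sketch of this advice (not from the slides): sweep a few candidate rates on a toy one-dimensional regression and watch the final cost. The data, the starting weight 5.0, and the candidate rates are illustrative assumptions.

    import numpy as np

    # Toy 1-D linear regression: cost(w) = mean((w*x - y)^2), true w = 1
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])

    for lr in [1.5, 0.1, 0.01, 0.0001]:          # candidate learning rates (illustrative)
        w = 5.0                                   # arbitrary starting weight
        for step in range(100):
            grad = np.mean(2.0 * (w * x - y) * x) # d(cost)/dw
            w -= lr * grad
        cost = np.mean((w * x - y) ** 2)
        print("lr=%g  final cost=%.6g" % (lr, cost))

    # Too large (1.5): the cost explodes (overshooting).
    # Too small (0.0001): after 100 steps the cost has barely moved.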

Data (X) preprocessing for gradient descent

Data (X) preprocessing for gradient descent

    x1    x2      y
    1     9000    A
    2     -5000   A
    4     -2000   B
    6     8000    B
    9     9000    C
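In the table above, x1 and x2 live on wildly different scales, which is exactly what hurts gradient descent. As a minimal NumPy sketch (not from the slides), two common rescalings of those columns, zero-centering and min-max normalization:

    import numpy as np

    # Feature columns x1 and x2 from the table above (y labels omitted)
    X = np.array([[1.0,  9000.0],
                  [2.0, -5000.0],
                  [4.0, -2000.0],
                  [6.0,  8000.0],
                  [9.0,  9000.0]])

    X_centered = X - X.mean(axis=0)    # zero-centered: each column now has mean 0
    X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # squashed into [0, 1]
    print(X_minmax)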

Gradient descent is one of the many algorithms that benefit from feature scaling. Here, we will use a feature scaling method called standardization, which gives our data the property of a standard normal distribution. The mean of each feature column is centered at value 0 and the feature column has a standard deviation of 1. For example, to standardize the j-th feature, we simply need to subtract the sample mean μ_j from every training sample and divide it by its standard deviation σ_j:

    x'_j = (x_j - μ_j) / σ_j

Here x'_j is a vector consisting of the j-th feature values of all training samples n.

Standardization can easily be achieved using the NumPy methods mean and std:

    X_std = np.copy(X)
    X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
    X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()

After standardization, we will train the Adaline again and see that it now converges using a learning rate η = 0.01:

    ada = AdalineGD(n_iter=15, eta=0.01)
    ada.fit(X_std, y)

http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html

Overfitting: our model is very good on the training data set (because it memorized it), but not good on the test dataset or in real use.

Overfitting

Solutions for overfitting: More training data! Reduce the number of features. Regularization.

Regularization: let's not have too big numbers in the weights.
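A minimal sketch of L2 regularization in TensorFlow 1.x style, the API this course's labs use; the softmax model, the tensor shapes, and the regularization strength 0.001 are illustrative assumptions:

    import tensorflow as tf  # TensorFlow 1.x API

    X = tf.placeholder(tf.float32, [None, 2])   # features (shape is illustrative)
    Y = tf.placeholder(tf.float32, [None, 3])   # one-hot labels
    W = tf.Variable(tf.random_normal([2, 3]), name="weight")
    b = tf.Variable(tf.random_normal([3]), name="bias")

    hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))

    # The extra term penalizes large weights, keeping them from growing too big
    l2reg = 0.001 * tf.reduce_sum(tf.square(W))
    cost = cross_entropy + l2reg
    train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)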

Summary: Learning rate. Data preprocessing. Overfitting (more training data, regularization).

Lecture 7-2 Application & Tips: Learning and test data sets. Sung Kim <hunkim+mr@gmail.com>

Performance evaluation: is this good?

Evaluation using the training set? 100% correct (accuracy), because the model can simply memorize it.

Training and test sets. http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html

Training, validation and test sets
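A hedged NumPy sketch of the three-way split; the 60/20/20 ratio and the 100-example placeholder are illustrative assumptions:

    import numpy as np

    indices = np.arange(100)        # stand-in for the indices of 100 examples
    np.random.shuffle(indices)      # shuffle before splitting

    n = len(indices)
    train_idx = indices[:int(0.6 * n)]              # 60%: fit the model
    val_idx = indices[int(0.6 * n):int(0.8 * n)]    # 20%: tune learning rate, lambda, ...
    test_idx = indices[int(0.8 * n):]               # 20%: use exactly once, for the final score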

Online learning
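In online learning the model keeps its weights and takes gradient steps on each new batch of data as it arrives, rather than retraining from scratch. An illustrative sketch (the streaming loop, batch size 32, and true weight 3.0 are assumptions):

    import numpy as np

    w = 0.0  # model state persists across incoming batches

    def train_on_batch(w, x, y, lr=0.1):
        """One gradient step of the model y ~ w*x on a freshly arrived batch."""
        grad = np.mean(2.0 * (w * x - y) * x)
        return w - lr * grad

    for batch in range(20):          # pretend 20 batches stream in over time
        x = np.random.rand(32)
        y = 3.0 * x                  # hypothetical stream whose true weight is 3
        w = train_on_batch(w, x, y)  # update the existing weights, don't restart
    print("weight after streaming updates:", w)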

MNIST Dataset. http://yann.lecun.com/exdb/mnist/
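Loading MNIST in the TensorFlow 1.x lab style, using the tutorial helper that ships with TF 1.x; the local path "MNIST_data/" is an arbitrary choice:

    from tensorflow.examples.tutorials.mnist import input_data

    # Downloads the four files from the URL above on first run, then caches them
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    print(mnist.train.num_examples, mnist.test.num_examples)   # 55000 10000
    batch_xs, batch_ys = mnist.train.next_batch(100)           # next minibatch of 100 examples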

Accuracy: how many of your predictions are correct? 95%? 99%? Check out the lab video.
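Accuracy is typically computed by comparing the argmax of the predictions with the argmax of the one-hot labels. A TF 1.x sketch, where the `hypothesis` placeholder stands in for the model's predicted probabilities:

    import tensorflow as tf  # TensorFlow 1.x API

    hypothesis = tf.placeholder(tf.float32, [None, 10])  # stand-in for the model's output
    Y = tf.placeholder(tf.float32, [None, 10])           # one-hot ground-truth labels

    is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))  # per-example hit/miss
    accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))        # fraction correct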

ML with TensorFlow