Introduction To Machine Learning - Brown University

Transcription

Introduction to Machine Learning
Brown University CSCI 1950-F, Spring 2012
Prof. Erik Sudderth
Lecture 8: Linear Regression & Least Squares; Bayesian Linear Regression & Prediction
Many figures courtesy of Kevin Murphy's textbook, Machine Learning: A Probabilistic Perspective

Gaussian Distributions
• Simplest joint distribution that can capture an arbitrary mean & covariance
• Justifications from the central limit theorem and the maximum entropy criterion
• The probability density above assumes the covariance is positive definite
• ML parameter estimates are the sample mean & sample covariance

A Change in Direction
(Slide table contrasts supervised vs. unsupervised learning for discrete and continuous variables: classification, regression, dimensionality reduction.)
• GOAL: Predict label/response y from feature x
• Generative classification: apply Bayes' rule to a learned p(x, y)
• Discriminative or conditional regression & classification: directly learn a model of p(y | x), assuming x is always given

!"# %&'(%)")'* #,-.#'/.0 1)'234'5! 67%891 :';.1 #.8"%1' & '*"?#@'Slides adapted from Bishop’s Pattern Recognition and Machine Learning Notation differs from Murphy’s Machine Learning: A Probabilistic Perspective

!"# %&'(%)")'* #,-.#'/.0 1)'2A4'5!;.1 #.8"%1'B%)")'C #,-.#):'5!DE ) '%& '@1.B%1:'%')8%11',E%#@ '"#'x'%F ,G)'%11'B%)")'C #,-.#)H'

!"# %&'(%)")'* #,-.#'/.0 1)'2I4'5!J%0"%1'B%)")'C #,-.#):'5!DE ) '%& '1.,%1:'%')8%11',E%#@ '"#'x'.#1 '%F ,G)'# %&B 'B%)")'C #,-.#)H';%&%8 G &)',.#G&.1'1.,%-.#'%#0'),%1 '2K"0GE4H'

!"# %&'(%)")'* #,-.#'/.0 1)'2L4'Fourier BasisWavelet Basis

Sum-of-Squares Error Function
N = number of examples, M = number of features, t_n = output or response, x_n = input or covariates
y(x_n, w) = \phi(x_n)^T w
E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \phi(x_n)^T w \right)^2 = \frac{1}{2} \| t - \Phi w \|^2
Equivalent to maximum likelihood (ML) estimation under a Gaussian model:
p(t_n \mid w, x_n) = \mathcal{N}\!\left( t_n \mid \phi(x_n)^T w, \beta^{-1} \right) = \sqrt{\frac{\beta}{2\pi}} \exp\left( -\frac{\beta}{2} \left( t_n - \phi(x_n)^T w \right)^2 \right)
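The ML-equivalence claim on this slide is easy to check numerically: the negative Gaussian log-likelihood differs from the sum-of-squares error E(w) only by the factor β and a w-independent constant, so both have the same minimizer. A small sketch (my own, assuming NumPy):

```python
import numpy as np

def sum_squares_error(w, Phi, t):
    """E(w) = 0.5 * ||t - Phi w||^2."""
    r = t - Phi @ w
    return 0.5 * r @ r

def neg_log_lik(w, Phi, t, beta):
    """Negative log-likelihood of the Gaussian model with precision beta:
    beta * E(w) minus a constant that does not depend on w."""
    r = t - Phi @ w
    return 0.5 * beta * r @ r - 0.5 * len(t) * np.log(beta / (2.0 * np.pi))

Phi = np.array([[1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0])
E0 = sum_squares_error(np.zeros(2), Phi, t)      # residual is t itself
nll0 = neg_log_lik(np.zeros(2), Phi, t, 1.0)
```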

Geometry of Least Squares
• Consider the N-dimensional space containing t, and the M-dimensional subspace S spanned by the columns of \Phi.
• w_{ML} minimizes the distance between t and its orthogonal projection onto S, i.e. y.
E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \phi(x_n)^T w \right)^2 = \frac{1}{2} \| t - \Phi w \|^2

Finding the Least Squares Solution
E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \phi(x_n)^T w \right)^2 = \frac{1}{2} \| t - \Phi w \|^2
Gradient vectors: for f : \mathbb{R}^M \to \mathbb{R}, the gradient \nabla_w f : \mathbb{R}^M \to \mathbb{R}^M has components \left( \nabla_w f(w) \right)_k = \frac{\partial f(w)}{\partial w_k}
Gradient identity: if f(w) = \frac{1}{2} w^T A w - b^T w, then \nabla_w f(w) = \frac{1}{2} (A + A^T) w - b
Normal equations: \Phi^T \Phi \hat{w} = \Phi^T t
There is a unique solution if and only if \mathrm{rank}(\Phi) = M (requires N \geq M).
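The normal equations can be solved directly; a sketch (my own, with a synthetic noiseless problem) — solving the linear system is preferred to forming an explicit inverse, and `np.linalg.lstsq` is more numerically stable still:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 4
Phi = rng.normal(size=(N, M))            # design matrix; rank M with prob. 1
w_true = np.array([1.0, -2.0, 0.5, 3.0])
t = Phi @ w_true                         # noiseless targets for illustration

# Normal equations: Phi^T Phi w_hat = Phi^T t
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
```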

Polynomial Curve Fitting

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-Fitting
Root-Mean-Square (RMS) Error: E_{RMS} = \sqrt{2 E(w^*) / N}
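Since 2E(w)/N is just the mean squared residual, the RMS error reduces to a one-liner; a sketch (my own helper name):

```python
import numpy as np

def rms_error(w, Phi, t):
    """E_RMS = sqrt(2 E(w) / N) = sqrt(||t - Phi w||^2 / N).
    Dividing by N makes the error comparable across data sets of
    different size, unlike E(w) itself."""
    r = t - Phi @ w
    return np.sqrt((r @ r) / len(t))

e = rms_error(np.zeros(2), np.eye(2), np.array([3.0, 4.0]))
```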

Polynomial Coefficients

Data Set Size: 9th Order Polynomial

Data Set Size: 9th Order Polynomial

Regularized Least Squares
• Consider the error function: data term + regularization term
• With the sum-of-squares error function and a quadratic regularizer, we get
  E(w) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \phi(x_n)^T w \right)^2 + \frac{\lambda}{2} w^T w
• which is minimized by
  \hat{w} = \left( \lambda I + \Phi^T \Phi \right)^{-1} \Phi^T t
• The matrix above is always invertible. Why?
• What is the probabilistic interpretation of this regularizer?
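The regularized (ridge) solution in closed form, as a minimal sketch (my own helper name `ridge_fit`):

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Minimize 0.5*||t - Phi w||^2 + 0.5*lam*||w||^2 via the closed form
    w = (lam*I + Phi^T Phi)^{-1} Phi^T t.  For lam > 0 the matrix is
    positive definite, hence always invertible (answering the slide's "Why?")."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# With Phi = I and lam = 1 the solution is simply t/2: the regularizer
# shrinks the unregularized answer toward zero.
w_ridge = ridge_fit(np.eye(2), np.array([2.0, 4.0]), 1.0)
```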

Regularization:

Regularization:

Regularization: … vs. …

Polynomial Coefficients

Bayesian Linear Regression
• Define a conjugate prior over w
• Combining this with the likelihood function, and using results for marginal and conditional Gaussian distributions, gives the posterior
• A common choice for the prior is: p(w) = \mathcal{N}\!\left( w \mid 0, \alpha^{-1} I \right)
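Under the zero-mean isotropic Gaussian prior with precision α and Gaussian likelihood with precision β, the posterior over w is Gaussian with S_N^{-1} = αI + βΦ^TΦ and m_N = βS_NΦ^T t (Bishop-style notation). A sketch of that computation (my own helper name):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Gaussian posterior over w given prior N(w | 0, alpha^{-1} I)
    and likelihood precision beta:
        S_N^{-1} = alpha*I + beta*Phi^T Phi
        m_N      = beta * S_N @ Phi^T t
    Returns (mean, covariance)."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

m_N, S_N = posterior(np.eye(2), np.array([2.0, 4.0]), alpha=1.0, beta=1.0)
```

Note that the posterior mean coincides with the regularized least-squares solution for λ = α/β, which is the probabilistic interpretation the previous slide asked about.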

Gaussian Conditionals & Marginals

Linear Gaussian Systems
Marginal Likelihood:
Posterior Distribution:
Board: specialization to the linear regression model

Bayesian Regression Example
0 data points observed — Prior; Data Space

Bayesian Regression Example
1 data point observed — Likelihood; Posterior; Data Space

Bayesian Regression Example
2 data points observed — Likelihood; Posterior; Data Space

Bayesian Regression Example
20 data points observed — Likelihood; Posterior; Data Space

Predictive Distribution (1)
• Predict t for new values of x by integrating over w:
  p(t \mid x, \mathbf{t}, \alpha, \beta) = \int p(t \mid x, w, \beta)\, p(w \mid \mathbf{t}, \alpha, \beta)\, dw = \mathcal{N}\!\left( t \mid m_N^T \phi(x), \sigma_N^2(x) \right)
• where \sigma_N^2(x) = \beta^{-1} + \phi(x)^T S_N \phi(x)
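Given the posterior mean m_N and covariance S_N, the predictive mean and variance above are two small matrix products; a sketch (my own helper name, with illustrative inputs):

```python
import numpy as np

def predictive(phi_x, m_N, S_N, beta):
    """Posterior predictive at features phi_x = phi(x):
        mean = m_N^T phi(x)
        var  = 1/beta + phi(x)^T S_N phi(x)
    The variance adds observation noise (1/beta) to the posterior
    uncertainty about w projected onto phi(x)."""
    mean = phi_x @ m_N
    var = 1.0 / beta + phi_x @ S_N @ phi_x
    return mean, var

mean, var = predictive(np.array([1.0, 1.0]),
                       m_N=np.array([1.0, 2.0]),
                       S_N=0.5 * np.eye(2),
                       beta=4.0)
```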

Predictive Distribution (2)
• Example: sinusoidal data, 9 Gaussian basis functions, 1 data point

Predictive Distribution (3)
• Example: sinusoidal data, 9 Gaussian basis functions, 2 data points

Predictive Distribution (4)
• Example: sinusoidal data, 9 Gaussian basis functions, 4 data points

Predictive Distribution (5)
• Example: sinusoidal data, 9 Gaussian basis functions, 25 data points

Estimation vs. Predictive Distributions
(Figure panels: "plugin approximation (MLE)" and "Posterior predictive (known variance)", each showing the prediction and the training data.)

23.02.2012