Probabilistic Machine Learning - Marek.petrik.us

Transcription

Probabilistic Machine Learning
Bayesian Nets, MCMC, and more
Marek Petrik
4/18/2017
Based on: Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10.

Conditional Independence
- Independent random variables: P[X, Y] = P[X] P[Y]
- Convenient, but not true often enough
- Conditional independence: X ⊥ Y | Z  ⟺  P[X, Y | Z] = P[X | Z] P[Y | Z]
- Use conditional independence in machine learning

Dependent but Conditionally Independent
Events with a possibly biased coin:
1. X: Your first coin flip is heads
2. Y: Your second flip is heads
3. Z: Coin is biased
- X and Y are not independent
- X and Y are independent given Z (see the simulation sketch below)
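As a quick illustration, here is a minimal Monte Carlo sketch of this setup; the 50/50 chance of a biased coin and the head probabilities 0.9 and 0.5 are assumptions chosen for the demo, not part of the slide. Marginally P[X, Y] ≠ P[X] P[Y], but conditioning on Z restores the product factorization.

```python
# Minimal sketch (assumed probabilities): two flips of a possibly biased coin.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Z: coin is biased (heads prob 0.9) with probability 1/2, else fair (0.5)
z = rng.random(n) < 0.5
p_heads = np.where(z, 0.9, 0.5)
x = rng.random(n) < p_heads   # X: first flip is heads
y = rng.random(n) < p_heads   # Y: second flip is heads

# Marginally, X and Y are dependent: P[X, Y] != P[X] P[Y]
print((x & y).mean(), x.mean() * y.mean())       # ~0.53 vs ~0.49
# Conditioned on Z, they factor: P[X, Y | Z] ~= P[X | Z] P[Y | Z]
print((x & y)[z].mean(), x[z].mean() * y[z].mean())  # both ~0.81
```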

Independent but Conditionally Dependent
Is this possible? Yes! Events with an unbiased coin:
1. X: Your first coin flip is heads
2. Y: Your second flip is heads
3. Z: The coin flips are the same
- X and Y are independent
- X and Y are not independent given Z
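A quick check of the numbers: conditioned on Z (the flips match), P[X | Z] = P[Y | Z] = 1/2, but P[X, Y | Z] = P[both heads | flips match] = (1/4) / (1/2) = 1/2 ≠ 1/4 = P[X | Z] P[Y | Z]. Learning Z couples the two otherwise independent flips.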

Conditional Independence in Machine Learning
- Linear regression
- LDA
- Naive Bayes

Directed Graphical Models
- Represent complex structure of conditional independence
- A node is independent of all of its predecessors, conditional on the values of its parents:
  x_s ⊥ x_{pred(s) \ pa(s)} | x_{pa(s)}
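To make the factorization concrete, here is a minimal sketch of a tiny directed model; the rain/sprinkler/wet-grass structure and all the probabilities are made up for illustration. The joint distribution factors into one term per node, conditioned on that node's parents.

```python
# Minimal sketch (hypothetical model): Rain -> WetGrass <- Sprinkler.
# The joint factors as P[r, s, w] = P[r] P[s] P[w | r, s].
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.3, False: 0.7}
P_wet = {  # P[w = True | rain, sprinkler], made-up numbers
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def joint(r, s, w):
    pw = P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[s] * (pw if w else 1 - pw)

# Sanity check: the factored joint sums to 1 over all assignments
total = sum(joint(r, s, w) for r in (True, False)
            for s in (True, False) for w in (True, False))
print(total)  # -> 1.0
```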

Undirected Graphical Models
- Another (different) representation of conditional independence
- Markov Random Fields

Naive Bayes Model
Closely related to QDA and LDA

Naive Bayes Model
- Chain rule: P[x_1, x_2, x_3] = P[x_1] P[x_2 | x_1] P[x_3 | x_1, x_2]
- Naive Bayes probability: P[x, y] = P[y] ∏_{j=1}^{D} P[x_j | y]
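The following is a minimal sketch of this factorization for binary features and two classes; the add-one Laplace smoothing is an assumed choice, not part of the slide. Fitting reduces to counting, and prediction compares log P[y] + Σ_j log P[x_j | y] across classes.

```python
# Minimal sketch (assumptions: binary features, two classes, add-one smoothing)
# of the naive Bayes factorization P[x, y] = P[y] * prod_j P[x_j | y].
import numpy as np

def fit_naive_bayes(X, y):
    """X: (n, D) binary matrix, y: (n,) labels in {0, 1}."""
    priors, likelihoods = [], []
    for c in (0, 1):
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))
        # P[x_j = 1 | y = c], smoothed so no probability is exactly 0 or 1
        likelihoods.append((Xc.sum(axis=0) + 1) / (len(Xc) + 2))
    return np.array(priors), np.array(likelihoods)

def predict(x, priors, likelihoods):
    # log P[y] + sum_j log P[x_j | y], evaluated for each class
    log_post = np.log(priors) + (
        x * np.log(likelihoods) + (1 - x) * np.log(1 - likelihoods)
    ).sum(axis=1)
    return np.argmax(log_post)

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
priors, lik = fit_naive_bayes(X, y)
print(predict(np.array([1, 0, 1]), priors, lik))  # -> 1
```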

Why Bother with Conditional Independence?
- Reduces number of parameters
- Reduces bias or variance?
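A concrete count, as an illustration of the first point: a full joint over D binary features and a binary label has 2^{D+1} − 1 free parameters, while the naive Bayes factorization needs only 2D + 1 (a class prior plus D Bernoulli parameters per class). For D = 30 that is roughly 2 × 10^9 versus 61. Fewer parameters generally lowers variance, at the price of added bias whenever the independence assumption is wrong.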

Markov Chain
- 1st order Markov chain: P[x_{1:T}] = P[x_1] ∏_{t=2}^{T} P[x_t | x_{t-1}]
- 2nd order Markov chain: P[x_{1:T}] = P[x_1, x_2] ∏_{t=3}^{T} P[x_t | x_{t-1}, x_{t-2}]
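A first-order chain is easy to simulate, since the next state depends only on the current one. Below is a minimal sketch with an assumed two-state weather chain and a made-up transition matrix.

```python
# Minimal sketch (hypothetical two-state weather chain): sampling a path
# from a first-order Markov chain.
import numpy as np

states = ["sunny", "rainy"]
T = np.array([[0.8, 0.2],   # P[next state | sunny]
              [0.4, 0.6]])  # P[next state | rainy]

rng = np.random.default_rng(0)
s = 0  # start sunny
path = [s]
for _ in range(9):
    s = rng.choice(2, p=T[s])  # next state depends only on the current one
    path.append(s)
print([states[i] for i in path])
```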

Uses of Markov Chains
- Time series prediction
- Simulation of stochastic systems
- Inference in Bayesian nets and models
- Many others . . .

Hidden Markov Models
Used for:
- Speech and language recognition
- Time series prediction
- Kalman filter: the version with normal distributions, used in GPS receivers
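As an illustration of HMM inference, here is a minimal sketch of the forward (filtering) recursion, which computes P[hidden state | observations so far] for a discrete HMM; the two-state transition, emission, and initial distributions are made up for the example.

```python
# Minimal sketch (made-up discrete HMM) of the forward algorithm.
import numpy as np

A = np.array([[0.9, 0.1],    # hidden-state transition probabilities
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],    # P[observation | hidden state]
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward_filter(obs):
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict, then weight by likelihood
        alpha /= alpha.sum()           # normalize: P[state | obs so far]
    return alpha

print(forward_filter([0, 0, 1, 1, 1]))
```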

Inference
- Inference of hidden variables (y):
  P[y | x_v, θ] = P[y, x_v | θ] / P[x_v | θ]
- Eliminating nuisance variables (e.g., x_1 is not observed):
  P[y | x_2, θ] = ∑_{x_1} P[y, x_1 | x_2, θ]
- What is inference in linear regression?
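The two operations above, conditioning on observed variables and summing out unobserved ones, can be carried out directly on a small tabulated joint. A minimal sketch, where the joint probability table is made up for illustration:

```python
# Minimal sketch (hypothetical joint table): infer P[y | x2] by summing out
# the unobserved nuisance variable x1 and normalizing.
from itertools import product

# Made-up joint P[y, x1, x2] over binary variables (entries sum to 1)
P = {key: p for key, p in zip(
    product((0, 1), repeat=3),
    [0.10, 0.05, 0.15, 0.10, 0.05, 0.20, 0.05, 0.30])}

def posterior_y(x2_obs):
    # P[y | x2] = sum_{x1} P[y, x1, x2] / P[x2]
    num = {y: sum(P[(y, x1, x2_obs)] for x1 in (0, 1)) for y in (0, 1)}
    z = sum(num.values())
    return {y: v / z for y, v in num.items()}

print(posterior_y(1))  # -> {0: ~0.23, 1: ~0.77}
```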

Learning
- Computing conditional probabilities θ
- Approaches:
  1. Maximum A Posteriori (MAP):
     argmax_θ log P[θ | x] = argmax_θ (log P[x | θ] + log P[θ])
  2. Inference!
     - Infer the distribution of θ given x
     - Return the mode, median, mean, or anything appropriate
- Fixed effects vs. random effects (mixed effects models)
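As a sketch of the MAP approach: with an assumed Gaussian likelihood y ~ N(Xw, σ²I) and Gaussian prior w ~ N(0, τ²I), the MAP estimate of the weights has the closed form of ridge regression with λ = σ²/τ².

```python
# Minimal sketch (assumed Gaussian likelihood and prior): MAP estimation
# of linear-regression weights, i.e. ridge regression (X'X + lam I)^{-1} X'y.
import numpy as np

def map_linear_regression(X, y, sigma2=1.0, tau2=10.0):
    lam = sigma2 / tau2  # noise variance relative to prior variance
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=1.0, size=100)
print(map_linear_regression(X, y))  # close to w_true
```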

Inference in Practice
- Precise inference is often impossible
- Variational inference: approximate models
- Markov Chain Monte Carlo (MCMC):
  1. Gibbs sampling
  2. Metropolis-Hastings
  3. Others
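Here is a minimal sketch of Metropolis-Hastings with a symmetric random-walk proposal, targeting a standard normal density known only up to a constant; the proposal scale and the target are assumptions chosen for the demo.

```python
# Minimal sketch: random-walk Metropolis-Hastings on an unnormalized target.
import numpy as np

def log_target(x):
    return -0.5 * x**2  # log density up to an additive constant

rng = np.random.default_rng(0)
x, samples = 0.0, []
for _ in range(10_000):
    x_new = x + rng.normal(scale=1.0)        # propose a local move
    log_accept = log_target(x_new) - log_target(x)
    if np.log(rng.random()) < log_accept:    # accept with prob min(1, ratio)
        x = x_new
    samples.append(x)                        # keep current state either way

print(np.mean(samples), np.std(samples))  # roughly 0 and 1
```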

Probabilistic Modeling Languages
- Simple framework to describe a Bayesian model
- Inference with MCMC and parameter search
- Popular frameworks:
  1. JAGS
  2. BUGS, WinBUGS, OpenBUGS
  3. Stan
- Examples:
  1. Linear regression
  2. Ridge regression
  3. Lasso
