Transcription
Probabilistic Machine Learning: Bayesian Nets, MCMC, and more
Marek Petrik, 4/18/2017
Based on: Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10.
Conditional Independence
- Independent random variables: P[X, Y] = P[X] P[Y]
- Convenient, but not true often enough
- Conditional independence: X ⊥ Y | Z  ⟺  P[X, Y | Z] = P[X | Z] P[Y | Z]
- Use conditional independence in machine learning
Dependent but Conditionally Independent
Events with a possibly biased coin:
1. X: Your first coin flip is heads
2. Y: Your second flip is heads
3. Z: Coin is biased
- X and Y are not independent
- X and Y are independent given Z
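A quick simulation makes this concrete. This is a hypothetical sketch in Python; the bias level (0.9) and the 50/50 chance of the coin being biased are made-up illustration numbers:

```python
import random

random.seed(0)

def sample():
    # Z: with probability 1/2 the coin is biased toward heads (illustration numbers)
    z = random.random() < 0.5
    p = 0.9 if z else 0.5
    x = random.random() < p  # X: first flip is heads
    y = random.random() < p  # Y: second flip is heads
    return x, y, z

draws = [sample() for _ in range(200_000)]

def prob(event, given=lambda t: True):
    sel = [t for t in draws if given(t)]
    return sum(1 for t in sel if event(t)) / len(sel)

# Unconditionally, seeing X = heads raises the chance of Y = heads:
p_y = prob(lambda t: t[1])
p_y_given_x = prob(lambda t: t[1], given=lambda t: t[0])

# Conditioned on Z (coin is biased), the flips decouple:
p_y_z = prob(lambda t: t[1], given=lambda t: t[2])
p_y_given_xz = prob(lambda t: t[1], given=lambda t: t[0] and t[2])

print(p_y, p_y_given_x)     # about 0.70 vs 0.76: dependent
print(p_y_z, p_y_given_xz)  # both about 0.90: independent given Z
```

Intuitively, the first flip carries information about the hidden bias, and the bias carries information about the second flip; once the bias is known, the flips have nothing left to say about each other.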
Independent but Conditionally Dependent
Is this possible? Yes! Events with an unbiased coin:
1. X: Your first coin flip is heads
2. Y: Your second flip is heads
3. Z: The coin flips are the same
- X and Y are independent
- X and Y are not independent given Z
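This example can be verified exactly by enumerating the four equally likely outcomes of two fair flips; a minimal sketch:

```python
from itertools import product
from fractions import Fraction

# Two fair coin flips: four equally likely outcomes.
p = Fraction(1, 4)

def prob(event):
    return sum(p for x, y in product([0, 1], repeat=2) if event(x, y))

# X and Y are independent:
assert prob(lambda x, y: x and y) == prob(lambda x, y: x) * prob(lambda x, y: y)

# Condition on Z: the two flips are the same.
p_z = prob(lambda x, y: x == y)                             # 1/2
p_xy_given_z = prob(lambda x, y: x and y and x == y) / p_z  # 1/2
p_x_given_z = prob(lambda x, y: x and x == y) / p_z         # 1/2
p_y_given_z = prob(lambda x, y: y and x == y) / p_z         # 1/2

# P[X, Y | Z] != P[X | Z] P[Y | Z], so X and Y are dependent given Z.
print(p_xy_given_z, p_x_given_z * p_y_given_z)  # 1/2 vs 1/4
```

Knowing Z and one flip pins down the other flip completely, which is as dependent as two variables can get.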
Conditional Independence in Machine Learning
- Linear regression
- LDA
- Naive Bayes
Directed Graphical Models
- Represent complex structure of conditional independence
- Each node is independent of all its predecessors conditional on the values of its parents: x_s ⊥ x_{pred(s) \ pa(s)} | x_{pa(s)}
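The local Markov property can be checked by brute-force enumeration on a tiny chain x1 → x2 → x3. The conditional probability tables below are made-up illustration numbers:

```python
from itertools import product

# Hypothetical CPTs for binary variables in the chain x1 -> x2 -> x3.
p_x1 = {0: 0.3, 1: 0.7}
p_x2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}  # p_x2[x1][x2]
p_x3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_x3[x2][x3]

def joint(x1, x2, x3):
    # The DAG factorizes the joint as P[x1] P[x2 | x1] P[x3 | x2].
    return p_x1[x1] * p_x2[x1][x2] * p_x3[x2][x3]

def prob(event):
    return sum(joint(*a) for a in product([0, 1], repeat=3) if event(*a))

# x3's parent is x2; its non-parent predecessor is x1.
# Given the parent, conditioning on the extra predecessor changes nothing:
p_x3_given_x2 = prob(lambda a, b, c: c == 1 and b == 1) / prob(lambda a, b, c: b == 1)
p_x3_given_x1_x2 = (prob(lambda a, b, c: c == 1 and b == 1 and a == 1)
                    / prob(lambda a, b, c: b == 1 and a == 1))
print(p_x3_given_x2, p_x3_given_x1_x2)  # equal: x3 is independent of x1 given x2
```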
Undirected Graphical Models
- Another (different) representation of conditional independence
- Markov Random Fields
Naive Bayes Model
Closely related to QDA and LDA
Naive Bayes Model
- Chain rule: P[x_1, x_2, x_3] = P[x_1] P[x_2 | x_1] P[x_3 | x_1, x_2]
- Probability: P[x, y] = P[y] ∏_{j=1}^{D} P[x_j | y]
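The factorization P[x, y] = P[y] ∏_j P[x_j | y] is easy to turn into a tiny classifier. The class prior and per-feature probabilities below are made-up illustration numbers:

```python
# Hypothetical naive Bayes model over three binary features.
p_y = {0: 0.6, 1: 0.4}          # class prior P[y]
p_xj_given_y = {                # per-feature likelihoods P[x_j = 1 | y]
    0: [0.2, 0.7, 0.1],
    1: [0.8, 0.3, 0.5],
}

def joint(x, y):
    """P[x, y] = P[y] * prod_j P[x_j | y] under the naive Bayes factorization."""
    prob = p_y[y]
    for xj, pj in zip(x, p_xj_given_y[y]):
        prob *= pj if xj == 1 else 1 - pj
    return prob

x = (1, 0, 1)
posterior_1 = joint(x, 1) / (joint(x, 0) + joint(x, 1))
print(posterior_1)  # P[y = 1 | x] by Bayes' rule, about 0.97 here
```

Conditional independence of the features given the class is exactly what lets each factor P[x_j | y] be stored and estimated separately.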
Why Bother with Conditional Independence?
- Reduces number of parameters
- Reduces bias or variance?
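The parameter saving is easy to quantify. For D binary features and a binary class label, a full joint table needs 2^(D+1) − 1 free parameters, while the naive Bayes factorization needs only 2D + 1; a small sketch:

```python
def full_joint_params(D):
    # A table over D binary features and a binary label has 2^(D+1) cells,
    # minus 1 because the probabilities must sum to one.
    return 2 ** (D + 1) - 1

def naive_bayes_params(D):
    # One prior parameter P[y = 1] plus P[x_j = 1 | y] for each of the
    # D features and each of the 2 classes.
    return 2 * D + 1

for D in (3, 10, 30):
    print(D, full_joint_params(D), naive_bayes_params(D))
```

The exponential-vs-linear gap is what makes the model estimable from modest data: fewer parameters means lower variance, at the cost of bias when the independence assumption is wrong.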
Markov Chain
- 1st order Markov chain: each state depends only on the previous state, P[x_1, ..., x_T] = P[x_1] ∏_{t=2}^{T} P[x_t | x_{t-1}]
- 2nd order Markov chain: each state depends on the previous two states
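A 1st-order chain is straightforward to simulate. The two-state transition matrix below uses made-up illustration numbers:

```python
import random

random.seed(0)

# Hypothetical 1st-order Markov chain over states {0, 1}.
P = [[0.9, 0.1],   # P[x_t | x_{t-1} = 0]
     [0.5, 0.5]]   # P[x_t | x_{t-1} = 1]

def simulate(T, x0=0):
    xs = [x0]
    for _ in range(T - 1):
        # Next state depends only on the current state (Markov property).
        xs.append(0 if random.random() < P[xs[-1]][0] else 1)
    return xs

path = simulate(10_000)
frac_1 = sum(path) / len(path)
print(frac_1)  # close to the stationary probability 0.1 / (0.1 + 0.5) = 1/6
```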
Uses of Markov Chains
- Time series prediction
- Simulation of stochastic systems
- Inference in Bayesian nets and models
- Many others . . .
Hidden Markov Models
Used for:
- Speech and language recognition
- Time series prediction
- Kalman filter: version with normal distributions, used in GPS's
Inference
- Inference of hidden variables (y): P[y | x_v, θ] = P[y, x_v | θ] / P[x_v | θ]
- Eliminating nuisance variables (e.g. x_1 is not observed): P[y | x_2, θ] = Σ_{x_1} P[y, x_1 | x_2, θ]
- What is inference in linear regression?
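Summing out a nuisance variable can be done by direct enumeration when the state space is small. The joint table below is made up for illustration:

```python
from itertools import product

# Hypothetical joint distribution P[y, x1, x2] over binary variables.
joint = {key: p for key, p in zip(
    product([0, 1], repeat=3),  # keys are (y, x1, x2)
    [0.10, 0.05, 0.15, 0.10, 0.05, 0.20, 0.05, 0.30],
)}

def posterior_y(x2):
    # P[y | x2]: sum out the unobserved nuisance variable x1, then normalize.
    unnorm = {y: sum(joint[(y, x1, x2)] for x1 in (0, 1)) for y in (0, 1)}
    z = sum(unnorm.values())
    return {y: p / z for y, p in unnorm.items()}

post = posterior_y(x2=1)
print(post)  # a distribution over y that no longer mentions x1
```

Exact enumeration like this scales exponentially in the number of variables, which is precisely why approximate methods such as MCMC are needed.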
Learning
- Computing the conditional distribution of the parameters θ
- Approaches:
  1. Maximum A Posteriori (MAP): argmax_θ log P[θ | x] = argmax_θ (log P[x | θ] + log P[θ])
  2. Inference!
     - Infer the distribution of θ given x
     - Return the mode, median, mean, or anything appropriate
- Fixed effects vs random effects (mixed effects models)
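For a coin bias with a Beta prior, the MAP objective log P[x | θ] + log P[θ] has a closed-form maximizer; a minimal sketch (the Beta(2, 2) prior and the 7-heads-of-10 data are illustration choices):

```python
import math

def log_posterior(theta, heads, flips, a=2.0, b=2.0):
    # log P[x | theta] + log P[theta] up to an additive constant,
    # with a binomial likelihood and a Beta(a, b) prior.
    return ((heads + a - 1) * math.log(theta)
            + (flips - heads + b - 1) * math.log(1 - theta))

def map_estimate(heads, flips, a=2.0, b=2.0):
    # Closed-form argmax of log_posterior: (heads + a - 1) / (flips + a + b - 2).
    return (heads + a - 1) / (flips + a + b - 2)

theta_map = map_estimate(heads=7, flips=10)
print(theta_map)  # 8/12, pulled toward 1/2 relative to the MLE 7/10
```

The prior acts like pseudo-counts: Beta(2, 2) adds one imaginary head and one imaginary tail, which regularizes the estimate away from the extremes.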
Inference in Practice
- Precise inference is often impossible
- Variational inference: approximate models
- Markov Chain Monte Carlo (MCMC):
  1. Gibbs sampling
  2. Metropolis-Hastings
  3. Others
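A minimal Metropolis-Hastings sketch, targeting a standard normal with a Gaussian random-walk proposal (the step size 1.0 is an arbitrary illustration choice):

```python
import math
import random

random.seed(0)

def log_target(x):
    # Log of an unnormalized N(0, 1) density; MCMC only needs the
    # target up to a normalizing constant.
    return -0.5 * x * x

def metropolis_hastings(n, step=1.0, x0=0.0):
    samples, x = [], x0
    for _ in range(n):
        proposal = x + random.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x));
        # the symmetric proposal makes the Hastings correction cancel.
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

xs = metropolis_hastings(50_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # approximately 0 and 1
```

Gibbs sampling is the special case where each proposal resamples one variable exactly from its conditional distribution and is always accepted.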
Probabilistic Modeling Languages
- Simple framework to describe a Bayesian model
- Inference with MCMC and parameter search
- Popular frameworks:
  - JAGS
  - BUGS, WinBUGS, OpenBUGS
  - Stan
- Examples:
  - Linear regression
  - Ridge regression
  - Lasso