Metropolis-Hastings Algorithm - Jarad Niemi

Transcription

Metropolis-Hastings algorithm
Dr. Jarad Niemi
STAT 544 - Iowa State University
April 2, 2019

Outline

- Metropolis-Hastings algorithm
- Independence proposal
- Random-walk proposal
- Optimal tuning parameter
- Binomial example
- Normal example
- Binomial hierarchical example

Metropolis-Hastings algorithm

Let p(θ|y) be the target distribution and θ^(t) be the current draw from p(θ|y).

The Metropolis-Hastings algorithm performs the following:
1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

   r = r(θ^(t), θ*) = [p(θ*|y)/g(θ*|θ^(t))] / [p(θ^(t)|y)/g(θ^(t)|θ*)] = [p(θ*|y) g(θ^(t)|θ*)] / [p(θ^(t)|y) g(θ*|θ^(t))]

   otherwise, set θ^(t+1) = θ^(t).
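The two steps above can be sketched as a single update function. A minimal Python illustration (the course code is in R; the N(0, 1) target and N(θ^(t), 1) proposal here are assumed for the demo, not from the slides):

```python
import math
import random

def mh_step(theta, log_p, propose, log_g):
    """One Metropolis-Hastings update.

    log_p(theta): log target density (up to an additive constant)
    propose(theta): draws theta_star ~ g(. | theta)
    log_g(a, b): log g(a | b), the proposal log density
    """
    theta_star = propose(theta)
    log_r = (log_p(theta_star) + log_g(theta, theta_star)
             - log_p(theta) - log_g(theta_star, theta))
    # accept theta_star with probability min(1, r)
    if random.random() < math.exp(min(0.0, log_r)):
        return theta_star
    return theta

# Demo: N(0, 1) target with a N(theta, 1) proposal (illustrative choices)
log_p = lambda t: -0.5 * t * t
propose = lambda t: random.gauss(t, 1.0)
log_g = lambda a, b: -0.5 * (a - b) ** 2

random.seed(1)
theta, draws = 0.0, []
for _ in range(20000):
    theta = mh_step(theta, log_p, propose, log_g)
    draws.append(theta)
mean = sum(draws) / len(draws)  # should be near the target mean 0
```

Working on the log scale, as here, avoids overflow when the densities are tiny; the same trick appears in the R code later in the slides.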

Metropolis-Hastings algorithm

Suppose we only know the target up to a normalizing constant, i.e. p(θ|y) = q(θ|y)/q(y) where we only know q(θ|y).

The Metropolis-Hastings algorithm performs the following:
1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

   r = r(θ^(t), θ*) = [p(θ*|y) g(θ^(t)|θ*)] / [p(θ^(t)|y) g(θ*|θ^(t))] = [q(θ*|y)/q(y) · g(θ^(t)|θ*)] / [q(θ^(t)|y)/q(y) · g(θ*|θ^(t))] = [q(θ*|y) g(θ^(t)|θ*)] / [q(θ^(t)|y) g(θ*|θ^(t))]

   otherwise, set θ^(t+1) = θ^(t).

Two standard Metropolis-Hastings algorithms

- Independent Metropolis-Hastings: independent proposal, i.e. g(θ*|θ^(t)) = g(θ*)
- Random-walk Metropolis: symmetric proposal, i.e. g(θ*|θ^(t)) = g(θ^(t)|θ*) for all θ*, θ^(t).

Independence Metropolis-Hastings

Let
- p(θ|y) ∝ q(θ|y) be the target distribution,
- θ^(t) be the current draw from p(θ|y), and
- g(θ|θ^(t)) = g(θ), i.e. the proposal is independent of the current value.

The independence Metropolis-Hastings algorithm performs the following:
1. propose θ* ~ g(θ)
2. accept θ^(t+1) = θ* with probability min{1, r} where

   r = [q(θ*|y)/g(θ*)] / [q(θ^(t)|y)/g(θ^(t))] = [q(θ*|y) g(θ^(t))] / [q(θ^(t)|y) g(θ*)]

   otherwise, set θ^(t+1) = θ^(t).

Intuition through examples

[Figure: panels illustrating the independence acceptance ratio r for combinations of current (−1, 0, 1) and proposed (−1, 0, 1) values of θ, with outcomes colored TRUE/FALSE for acceptance.]

Example: Normal-Cauchy model

Let Y ~ N(θ, 1) with θ ~ Ca(0, 1) such that the posterior is

   p(θ|y) ∝ p(y|θ)p(θ) ∝ exp(−(y−θ)²/2) · 1/(1+θ²)

Use N(y, 1) as the proposal; then the Metropolis-Hastings acceptance probability is min{1, r} with

   r = [q(θ*|y) g(θ^(t))] / [q(θ^(t)|y) g(θ*)]
     = [exp(−(y−θ*)²/2)/(1+(θ*)²)] · exp(−(θ^(t)−y)²/2) / { [exp(−(y−θ^(t))²/2)/(1+(θ^(t))²)] · exp(−(θ*−y)²/2) }
     = (1+(θ^(t))²) / (1+(θ*)²)
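Because the normal kernels cancel, the sampler only needs the ratio (1+(θ^(t))²)/(1+(θ*)²). A Python sketch of this independence sampler (illustrative translation; the course code is in R):

```python
import random

# Independence MH for the Normal-Cauchy posterior with a N(y, 1) proposal;
# the acceptance ratio collapses to r = (1 + theta_t^2) / (1 + theta_star^2).
def normal_cauchy_imh(y, n_iter, seed=0):
    rng = random.Random(seed)
    theta = y  # start at the proposal mean
    draws = []
    for _ in range(n_iter):
        theta_star = rng.gauss(y, 1.0)
        r = (1 + theta * theta) / (1 + theta_star * theta_star)
        if rng.random() < r:  # accept with probability min(1, r)
            theta = theta_star
        draws.append(theta)
    return draws

draws = normal_cauchy_imh(y=1.0, n_iter=10000)
post_mean = sum(draws) / len(draws)  # shrunk from y = 1 toward the prior mode 0
```

Note how little the Cauchy prior costs here: the ratio r is close to 1 whenever current and proposed values are comparable in magnitude, so acceptance is high.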

Example: Normal-Cauchy model

[Figure: proposal and target densities as a function of θ.]

Example: Normal-Cauchy model

[Figure: trace plots of θ over 100 iterations for independence Metropolis-Hastings, with a good starting value (top) and a poor starting value (bottom).]

Need heavy tails

Recall that
- rejection sampling requires the proposal to have heavy tails and
- importance sampling is efficient only when the proposal has heavy tails.

Independence Metropolis-Hastings also requires heavy-tailed proposals for efficiency, since if θ^(t) is
- in a region where p(θ^(t)|y) ≫ g(θ^(t)), i.e. the target has heavier tails than the proposal, then
- any proposal θ* such that p(θ*|y) ≈ g(θ*), i.e. in the center of the target and proposal,

will result in

   r = [p(θ*|y) g(θ^(t))] / [p(θ^(t)|y) g(θ*)] ≈ 0

and few samples will be accepted.

Need heavy tails - example

Suppose θ|y ~ Ca(0, 1) and we use a standard normal as a proposal.

[Figure: target and proposal densities (top) and the target/proposal ratio (bottom) as a function of θ; the ratio explodes in the tails.]

Need heavy tails

[Figure: trace plot of θ over 1000 iterations; the chain gets stuck for long stretches after excursions into the tails.]

Random-walk Metropolis

Let
- p(θ|y) ∝ q(θ|y) be the target distribution,
- θ^(t) be the current draw from p(θ|y), and
- g(θ*|θ^(t)) = g(θ^(t)|θ*), i.e. the proposal is symmetric.

The Metropolis algorithm performs the following:
1. propose θ* ~ g(θ|θ^(t))
2. accept θ^(t+1) = θ* with probability min{1, r} where

   r = [q(θ*|y) g(θ^(t)|θ*)] / [q(θ^(t)|y) g(θ*|θ^(t))] = q(θ*|y) / q(θ^(t)|y)

   otherwise, set θ^(t+1) = θ^(t).

This is also referred to as random-walk Metropolis.
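Because the symmetric proposal density cancels, a random-walk Metropolis implementation only ever evaluates log q. A small Python sketch (the Exp(1) target here is an illustrative choice of mine, not from the slides):

```python
import math
import random

# Random-walk Metropolis: with a symmetric proposal the ratio reduces to
# r = q(theta_star | y) / q(theta_t | y), so only the unnormalized target
# is needed. Illustrative target: Exp(1), log q(theta) = -theta for theta > 0.
def log_q(theta):
    return -theta if theta > 0 else -math.inf

rng = random.Random(42)
theta, draws = 1.0, []
for _ in range(50000):
    theta_star = theta + rng.gauss(0.0, 1.0)  # symmetric N(theta, 1) proposal
    log_r = log_q(theta_star) - log_q(theta)
    if rng.random() < math.exp(min(0.0, log_r)):
        theta = theta_star
    draws.append(theta)
mean = sum(draws) / len(draws)  # Exp(1) has mean 1
```

Proposals below zero get log q = −inf and are rejected automatically, which mirrors how the slides' R code handles parameter constraints by returning -Inf.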

Stochastic hill climbing

Notice that r = q(θ*|y)/q(θ^(t)|y), and thus we will always accept whenever the target density is larger when evaluated at the proposed value than when evaluated at the current value.

Suppose θ|y ~ N(0, 1), θ^(t) = 1, and θ* ~ N(θ^(t), 1).

[Figure: the N(0, 1) target density and the N(1, 1) proposal density.]

Example: Normal-Cauchy model

Let Y ~ N(θ, 1) with θ ~ Ca(0, 1) such that the posterior is

   p(θ|y) ∝ p(y|θ)p(θ) ∝ exp(−(y−θ)²/2) · 1/(1+θ²)

Use N(θ^(t), v²) as the proposal; then the acceptance probability is min{1, r} with

   r = q(θ*|y)/q(θ^(t)|y) = [p(y|θ*)p(θ*)] / [p(y|θ^(t))p(θ^(t))].

For this example, let v² = 1.

Example: Normal-Cauchy model

[Figure: trace plots of θ over 100 iterations for random-walk Metropolis, with a good starting value (top) and a poor starting value (bottom).]

Random-walk tuning parameter

Let p(θ|y) be the target distribution, the proposal be symmetric with scale v², and θ^(t) be (approximately) distributed according to p(θ|y).

- If v² → 0, then θ* ≈ θ^(t) and

     r = q(θ*|y)/q(θ^(t)|y) → 1

  and all proposals are accepted, but θ* ≈ θ^(t).
- As v² → ∞, then q(θ*|y) → 0 since θ* will be far from the mass of the target distribution and

     r = q(θ*|y)/q(θ^(t)|y) → 0

  so all proposed values are rejected.

So there is an optimal v² somewhere in between. For normal targets, the optimal random-walk proposal variance is 2.4² Var(θ|y)/d, where d is the dimension of θ, which results in an acceptance rate of about 40% for d = 1, decreasing to about 20% as d → ∞.
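The 2.4² rule of thumb is easy to check empirically. A Python sketch for a N(0, 1) target (assumed for the check), where Var(θ|y) = 1 and d = 1, so the proposal variance is just 2.4²:

```python
import math
import random

# Empirical check: with proposal variance 2.4^2 * Var(theta|y) / d on a
# N(0, 1) target, the acceptance rate should land near 40%.
rng = random.Random(7)
v = 2.4  # proposal standard deviation, sqrt(2.4^2 * 1 / 1)
theta, accepts, n = 0.0, 0, 100000
for _ in range(n):
    theta_star = theta + rng.gauss(0.0, v)
    # log r for a standard normal target
    log_r = 0.5 * (theta * theta - theta_star * theta_star)
    if rng.random() < math.exp(min(0.0, log_r)):
        theta = theta_star
        accepts += 1
acc_rate = accepts / n
```

Rerunning with v much smaller or larger pushes the acceptance rate toward 1 or 0 respectively, matching the two limiting cases above.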

Random-walk with tuning parameter that is too big and too small

Let y|θ ~ N(θ, 1), θ ~ Ca(0, 1), and y = 1.

[Figure: trace plots of θ over 100 iterations for several values of the tuning parameter v, from too small to too large.]

Binomial model

Let Y ~ Bin(n, θ) and θ ~ Be(1/2, 1/2); thus the posterior is

   p(θ|y) ∝ θ^(y−0.5) (1−θ)^(n−y−0.5) I(0 < θ < 1).

To construct a random-walk Metropolis algorithm, we choose the proposal

   θ* ~ N(θ^(t), 0.4²)

and accept, i.e. θ^(t+1) = θ*, with probability min{1, r} where

   r = p(θ*|y)/p(θ^(t)|y) = [(θ*)^(y−0.5) (1−θ*)^(n−y−0.5) I(0 < θ* < 1)] / [(θ^(t))^(y−0.5) (1−θ^(t))^(n−y−0.5) I(0 < θ^(t) < 1)]

otherwise, set θ^(t+1) = θ^(t).

Binomial model

    n <- 10000
    log_q <- function(theta, y = 3, n = 10) {
      if (theta < 0 || theta > 1) return(-Inf)
      (y - 0.5) * log(theta) + (n - y - 0.5) * log(1 - theta)
    }

    current <- 0.5  # initial value
    samps <- rep(NA, n)
    for (i in 1:n) {
      proposed <- rnorm(1, current, 0.4)  # tuning parameter is 0.4
      logr <- log_q(proposed) - log_q(current)
      if (log(runif(1)) < logr) current <- proposed
      samps[i] <- current
    }

    length(unique(samps)) / n  # acceptance rate

    [1] 0.3746

Binomial model

[Figure: trace plot and histogram of samps.]

Normal model

Assume

   Y_i ~ind N(µ, σ²)   and   p(µ, σ) ∝ Ca⁺(σ; 0, 1)

and thus

   p(µ, σ|y) ∝ [∏_{i=1}^n (1/σ) exp(−(y_i−µ)²/(2σ²))] · 1/(1+σ²) · I(σ > 0)
             = σ^(−n) exp(−[∑_{i=1}^n y_i² − 2µnȳ + nµ²]/(2σ²)) · 1/(1+σ²) · I(σ > 0)

Perform a random-walk Metropolis using a normal proposal, i.e. if µ^(t) and σ^(t) are the current values for µ and σ, then

   (µ*, σ*)′ ~ N((µ^(t), σ^(t))′, S)

where S is the tuning parameter.

Adapting the tuning parameter

Recall that the optimal random-walk tuning parameter (if the target is normal) is 2.4² Var(θ|y)/d, where Var(θ|y) is the unknown posterior covariance matrix. We can estimate Var(θ|y) using the sample covariance matrix of draws from the posterior.

Proposed automatic adapting of the Metropolis-Hastings tuning parameter:
1. Start with S_0. Set b = 0.
2. Run M iterations of the MCMC using 2.4² S_b/d.
3. Set S_{b+1} to the sample covariance matrix of all previous draws.
4. If b < B, set b = b + 1 and return to step 2. Otherwise, throw away all previous draws and go to step 5.
5. Run K iterations of the MCMC using 2.4² S_B/d.

R code for Metropolis-Hastings

    library(MASS)  # for mvrnorm

    n <- 20
    y <- rnorm(n)
    sum_y2 <- sum(y^2)
    nybar <- n * mean(y)

    log_q <- function(x) {
      if (x[2] < 0) return(-Inf)
      -n*log(x[2]) - (sum_y2 - 2*nybar*x[1] + n*x[1]^2)/(2*x[2]^2) - log(1 + x[2]^2)
    }

    B <- 10
    M <- 100
    samps <- matrix(NA, nrow = B*M, ncol = 2)
    a_rate <- rep(NA, B)

    # Initialize
    S <- diag(2)  # S_0
    current <- c(0, 1)

R code for Metropolis-Hastings - Adapting

    # Adapt
    for (b in 1:B) {
      for (m in 1:M) {
        i <- (b-1)*M + m
        proposed <- mvrnorm(1, current, 2.4^2 * S / 2)
        logr <- log_q(proposed) - log_q(current)
        if (log(runif(1)) < logr) current <- proposed
        samps[i,] <- current
      }
      a_rate[b] <- length(unique(samps[1:i,1])) / length(samps[1:i,1])
      S <- var(samps[1:i,])
    }

    a_rate

    [1] 0.0300000 0.2700000 0.3566667 0.4000000 0.4240000 0.4333333 0.4200000 0.4175000 0.4166667 0.4270000

    var(samps)  # S_B

               [,1]       [,2]
    [1,] 0.04898222 0.00255292
    [2,] 0.00255292 0.02365873

R code for Metropolis-Hastings - Adapting

    library(ggplot2); library(reshape2)  # for ggplot and melt

    samps <- as.data.frame(samps)
    names(samps) <- c("mu", "sigma")
    samps$iteration <- 1:nrow(samps)
    ggplot(melt(samps, id.var = 'iteration', variable.name = 'parameter'),
           aes(x = iteration, y = value)) +
      geom_line() +
      facet_wrap(~parameter, scales = 'free') +
      theme_bw()

[Figure: trace plots of mu and sigma over the adaptation iterations.]

R code for Metropolis-Hastings - Inference

    # Final run
    K <- 10000
    samps <- matrix(NA, nrow = K, ncol = 2)
    for (k in 1:K) {
      proposed <- mvrnorm(1, current, 2.4^2 * S / 2)
      logr <- log_q(proposed) - log_q(current)
      if (log(runif(1)) < logr) current <- proposed
      samps[k,] <- current
    }

    length(unique(na.omit(samps[,1]))) / K  # acceptance rate

    [1] 0.3947

R code for Metropolis-Hastings - Inference

[Figure: trace plots and posterior density estimates for mu and sigma.]

Hierarchical binomial model

Recall the hierarchical binomial model

   Y_i ~ind Bin(n_i, θ_i),   θ_i ~ind Be(α, β),   p(α, β) ∝ (α+β)^(−5/2)

and after marginalizing out the θ_i,

   Y_i ~ind Beta-binomial(n_i, α, β),   p(α, β) ∝ (α+β)^(−5/2) I(α > 0) I(β > 0).

Thus the posterior is

   p(α, β|y) ∝ (α+β)^(−5/2) I(α > 0) I(β > 0) ∏_{i=1}^n [B(α+y_i, β+n_i−y_i) / B(α, β)]

where B(·) is the beta function.

We can perform exactly the same adapting procedure, but now using this posterior as the target distribution.
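The log posterior above is straightforward to evaluate numerically via log-gamma functions. A Python sketch of the target (the data y_i, n_i below are hypothetical, for illustration only):

```python
import math

# Log posterior for the beta-binomial hyperparameters, following the
# formula above, using log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b).
def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_post(alpha, beta, y, n):
    if alpha <= 0 or beta <= 0:
        return -math.inf  # the indicators I(alpha > 0) I(beta > 0)
    lp = -2.5 * math.log(alpha + beta)
    for yi, ni in zip(y, n):
        lp += log_beta(alpha + yi, beta + ni - yi) - log_beta(alpha, beta)
    return lp

# Hypothetical data for illustration
y = [2, 5, 1, 4]    # successes per group
n = [10, 12, 8, 11]  # trials per group
```

This function can be dropped into any random-walk Metropolis loop like the ones above as the log q target, with (α, β) as the two-dimensional state.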

Beta-binomial hyperparameter posterior

[Figure: scatterplot of posterior draws of the hyperparameters α and β.]

Metropolis-Hastings summary

The Metropolis-Hastings algorithm samples θ* ~ g(·|θ^(t)) and sets θ^(t+1) = θ* with probability min{1, r}, where

   r = [q(θ*|y) g(θ^(t)|θ*)] / [q(θ^(t)|y) g(θ*|θ^(t))]

and otherwise sets θ^(t+1) = θ^(t).

There are two common Metropolis-Hastings proposals:
- independent proposal: g(θ*|θ^(t)) = g(θ*)
- random-walk proposal: g(θ*|θ^(t)) = g(θ^(t)|θ*)

Independent proposals suffer from the same heavy-tail problems as rejection sampling proposals. Random-walk proposals require tuning of the random-walk parameter.
