BAYESIAN MODELING USING SAS/STAT


Copyright 2013, SAS Institute Inc. All rights reserved.

AGENDA

- What is Bayesian Analysis?
- Options in SAS/STAT
- Example using PROC FMM (Zero-Inflated Poisson model)
- Examples using PROC MCMC

WHAT IS BAYESIAN ANALYSIS?

- Bayesian analysis is a field of statistics that is based on the notion of conditional probability.
- It can be viewed as the formalization of the process of incorporating scientific knowledge using probabilistic tools.
- It quantifies the uncertainty about parameters through their conditional distribution in light of the available data.

BAYES’ THEOREM

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

- $P(A)$ is the prior probability of event A. It is called the prior because it does not take into account any information about event B.
- $P(B \mid A)$ is the conditional probability of event B given event A.
- $P(B)$ is the prior or marginal probability of event B.
- $P(A \mid B)$ is the conditional probability of event A given event B. It is called the posterior probability because it is derived from the specified value of event B.
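
As a quick numerical illustration of the theorem, consider a hypothetical screening test. The numbers are assumptions chosen for illustration and are not from the presentation. Let A be "the patient has the condition" with prior $P(A) = 0.01$, and let B be "the test is positive" with $P(B \mid A) = 0.95$ and $P(B \mid \neg A) = 0.05$.

```latex
\begin{align*}
P(B) &= P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
      = 0.95 \times 0.01 + 0.05 \times 0.99 = 0.059, \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
      = \frac{0.0095}{0.059} \approx 0.161.
\end{align*}
```

Under these assumed values, a positive test raises the probability of the condition from 1% to roughly 16%. The posterior stays modest because the prior probability is small.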

BAYESIAN ANALYSIS

- The Bayesian approach to statistical inference treats parameters as random variables.
- It includes the incorporation of prior knowledge and its uncertainty in making inferences on unknown quantities (model parameters, missing data, and so on).
- It expresses the uncertainty concerning the parameter through probability statements and distributions.

FREQUENTIST APPROACH TO STATISTICS

- Classical methods consider the parameters to be fixed but unknown.
- They do not enable you to make probability statements about parameters because they are fixed.
- They are based on probabilities that are only for observations given the unknown parameters.
- They are judged by how they perform in an infinite number of hypothetical repetitions of the experiments.

CONFIDENCE INTERVALS – CLASSICAL APPROACH

- A 95% confidence interval states that you are 95% confident that the random interval contains the true mean.
- In other words, if 100 different samples were drawn from the same population and 100 intervals were calculated, approximately 95 of them would contain the population mean.

BAYESIAN APPROACH TO STATISTICS

- Bayesian methods treat the unknown parameters as random variables.
- They enable you to make probability statements about parameters and observations.
- They interpret probabilities for parameters as “degree of belief” and can be subjective.
- They use the rules of probability to revise the “degree of belief” about the parameters given the observed data.
- They base the inferences about the parameters on the probability distribution for the parameter.

95% CREDIBLE INTERVAL

[Figure: posterior distribution of the parameter of interest, with the 0.025 quantile and the 0.975 quantile marked as the interval bounds.]

There is a 95% chance that the parameter is in the credible interval.
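
The equal-tail interval described above can be sketched from posterior draws by taking the 2.5th and 97.5th percentiles. The example below is only an illustration: the data set WORK.POST and the variable theta are hypothetical, and the draws are simulated from a normal distribution as a stand-in for real posterior samples (for example, an OUTPOST= data set).

```sas
/* Stand-in for posterior samples: 10,000 simulated draws of theta.     */
/* In practice these would come from a procedure's OUTPOST= data set.   */
data work.post;
   call streaminit(2013);
   do Iteration = 1 to 10000;
      theta = rand('normal', 1.5, 0.4);
      output;
   end;
run;

/* Equal-tail 95% credible interval: 0.025 and 0.975 quantiles of the draws */
proc univariate data=work.post noprint;
   var theta;
   output out=work.ci pctlpts=2.5 97.5 pctlpre=theta_;
run;

proc print data=work.ci;
run;
```

The Bayesian procedures discussed later report credible intervals in their posterior summaries, so this manual calculation is mainly useful for seeing what the reported interval represents.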

STEPS INVOLVED IN BAYESIAN INFERENCE

1. The probability distribution of the parameter, known as the prior distribution, is formulated.
2. Given the observed data, you choose a statistical model that describes the distribution of the data given the parameters.
3. You update your beliefs about the parameter by combining information from the prior distribution and the data through the calculation of the posterior distribution. This is carried out by using Bayes’ theorem; hence the term Bayesian analysis.

THE BAYES’ RULE

$$p(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}$$

- $p(\theta \mid x)$: posterior density of $\theta$ given $x$
- $f(x \mid \theta)$: sampling density of $x$ given $\theta$
- $\pi(\theta)$: prior density for $\theta$
- $m(x)$: marginal density of $x$
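
To see the rule at work, here is a standard conjugate example that is not part of the presentation: a binomial sampling density with a beta prior. The hyperparameters $a$ and $b$, the sample size $n$, and the count $x$ are illustrative symbols introduced only for this sketch.

```latex
\begin{align*}
\pi(\theta) &\propto \theta^{a-1}(1-\theta)^{b-1}
  && \text{(prior: } \theta \sim \mathrm{Beta}(a, b)\text{)} \\
f(x \mid \theta) &= \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}
  && \text{(sampling density: } x \sim \mathrm{Binomial}(n, \theta)\text{)} \\
p(\theta \mid x) &= \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}
  \propto \theta^{a+x-1}(1-\theta)^{b+n-x-1}
  && \text{(posterior: } \theta \mid x \sim \mathrm{Beta}(a+x,\; b+n-x)\text{)}
\end{align*}
```

For instance, with a uniform prior ($a = b = 1$) and $x = 7$ successes in $n = 10$ trials, the posterior is $\mathrm{Beta}(8, 4)$, whose mean is $8/12 \approx 0.667$. The marginal density $m(x)$ only normalizes the posterior, which is why it can be dropped when working up to proportionality.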

PRIOR DISTRIBUTIONS

You cannot carry out any Bayesian inference or perform any modeling without using a prior distribution.

- It is not necessarily specified beforehand because prior does not refer to time.
- It is not necessarily unique, as the prior distribution could be a combination of prior distributions expressing a range of reasonable opinions.
- It is not necessarily completely specified, as it might be possible to have unknown parameters in the prior, which are then estimated.
- It is not necessarily important, as it could have a negligible influence on the conclusions, especially when the sample size is large.

ADVANTAGES OF BAYESIAN ANALYSIS

- Bayesian analysis is useful when you have prior information, either expert opinion or historical knowledge, that you want to incorporate into the analysis.
- It is useful if you want to communicate your findings in terms of probability notions that can be more easily understood by non-statisticians.
- It provides inferences that are conditional on the data and are exact, without reliance on asymptotic approximation.
- It provides the full uncertainty of parameters via the posterior distribution in contrast to point estimates and standard errors only.
- The simulations make the computations tractable even for complex hierarchical models.

DISADVANTAGES OF BAYESIAN ANALYSIS

- It does not tell you how to select a prior, and there is no one correct way to choose a prior.
- Bayesian inference requires skill to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.
- It can produce posterior distributions that are heavily influenced by the priors.
- It often comes with a high computational cost, especially in models with a large number of parameters.

HOW DO WE IMPLEMENT IN SAS/STAT?

BAYESIAN ANALYSIS IN SAS

Bayesian methods in SAS 9.4 (& 9.3) are found in the following procedures:

- the FMM procedure, which fits finite mixture models
- the GENMOD procedure, which fits generalized linear models
- the PHREG procedure, which performs regression analysis of survival data based on the Cox proportional hazards model
- the LIFEREG procedure, which fits parametric models to survival data
- the MCMC procedure, which is a general purpose Markov Chain Monte Carlo simulation procedure that is designed to fit Bayesian models.

TWO AVENUES TO BAYESIAN COMPUTING

Support for Bayesian analysis in four existing procedures:
- GENMOD, LIFEREG, PHREG, and FMM
- You use options to specify prior distributions, generate posterior samples, and request convergence diagnostics and posterior summaries.

MCMC procedure:
- General-purpose simulation procedure
- You specify prior distributions and likelihood functions with programming statements.
- Experimental in SAS 9.2, production in SAS 9.22

FMM PROCEDURE

PROC FMM fits statistical models to data where the distribution of the response is a finite mixture of univariate distributions.
- Performs maximum likelihood estimation for all models
- Provides Bayesian analysis for several models

Useful for applications such as
- estimating multimodal or heavy-tailed densities
- modeling overdispersed data
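
The agenda mentions a zero-inflated Poisson example with PROC FMM. A minimal sketch of that kind of model is shown below. It is not the presentation's demonstration: the data set WORK.ZIP_DEMO is simulated here purely for illustration, and the mixing probability (0.3) and Poisson mean (2.5) are arbitrary assumptions.

```sas
/* Simulated counts: about 30% structural zeros, otherwise Poisson(2.5) */
data work.zip_demo;
   call streaminit(12345);
   do id = 1 to 200;
      if rand('bernoulli', 0.3) then count = 0;
      else count = rand('poisson', 2.5);
      output;
   end;
run;

/* Two-component mixture: a Poisson component plus a constant           */
/* (degenerate at zero) component; BAYES requests a Bayesian analysis   */
proc fmm data=work.zip_demo seed=12345;
   model count = / dist=poisson;
   model       + / dist=constant;
   bayes;
run;
```

The second MODEL statement with the plus sign adds a component to the mixture, which is how the extra mass at zero is represented. Removing the BAYES statement would give the maximum likelihood fit of the same model.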

GENMOD PROCEDURE

- PROC GENMOD provides Bayesian analysis for generalized linear models.
- Sampling methods include adaptive rejection Metropolis sampling (ARMS), Gamerman sampling, and independent Metropolis sampling. When there is a normal distribution with a conjugate prior, Gibbs sampling is performed.
- Diagnostic tests include Gelman and Rubin, Geweke, Heidelberger and Welch, and Raftery and Lewis.
- Prior distributions for the regression coefficients include uniform, normal, and Jeffreys’ prior.

PHREG PROCEDURE

- PROC PHREG provides Bayesian analysis for Cox regression models with time-independent and time-dependent predictor variables and accommodates all the methods for handling ties.
- PROC PHREG also provides Bayesian analysis for piecewise exponential models, where you divide the time axis into intervals, each having its own hazard rate.
- In SAS 9.4, Bayesian frailty models are supported and you can specify the gamma or lognormal distributions for the shared frailty.
- Sampling algorithms include ARMS, random walk Metropolis sampling, and Gibbs sampling when there is conjugacy.

LIFEREG PROCEDURE

- PROC LIFEREG provides Bayesian analysis for parametric location-scale survival models.
- Supported prior distributions are normal and uniform.

BAYESIAN ANALYSIS WITH GENMOD, LIFEREG, PHREG AND FMM

- The BAYES statement requests Bayesian analysis.
- A set of standard prior distributions, posterior summary statistics, and convergence diagnostics are provided.
- You can specify adaptive rejection, Gamerman, or Metropolis sampling algorithms.

SYNTAX FOR THE BAYES STATEMENT

BAYES options;

Options available in all BAYES statements:

INITIAL       initial values of the chain
NBI           number of burn-in iterations
NMC           number of iterations after burn-in
OUTPOST       output data set for posterior samples
SEED          random number generator seed
THINNING      thinning of the Markov chain
DIAGNOSTICS   convergence diagnostics
PLOTS         diagnostic plots
SUMMARY       summary statistics
COEFFPRIOR    prior for the regression coefficients
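
As a sketch of how several of these options combine, the following fits a Bayesian Poisson regression with PROC GENMOD. The data set WORK.COUNTS, its variables, and every option value are hypothetical choices made for this illustration; they are not from the presentation.

```sas
/* Simulated Poisson regression data with one covariate */
data work.counts;
   call streaminit(42);
   do i = 1 to 100;
      x = rand('uniform');
      y = rand('poisson', exp(0.5 + 1.2*x));
      output;
   end;
run;

proc genmod data=work.counts;
   model y = x / dist=poisson link=log;
   /* 2,000 burn-in iterations, then 10,000 iterations keeping every 2nd  */
   /* draw; normal prior on the coefficients; two convergence diagnostics */
   bayes seed=27513 nbi=2000 nmc=10000 thinning=2
         coeffprior=normal
         diagnostics=(geweke ess)
         outpost=work.post_genmod;
run;
```

Setting SEED= makes the run reproducible, and OUTPOST= keeps the posterior draws so that they can be summarized or plotted afterward.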

THE MCMC PROCEDURE

- PROC MCMC is a general purpose simulation procedure that uses Markov chain Monte Carlo (MCMC) techniques to fit a wide range of Bayesian models.
- It requires the specification of a likelihood function for the data and a prior distribution for the parameters.
- It enables you to analyze data that have any likelihood or prior distribution as long as they are programmable using SAS DATA step functions.

PROC MCMC STATEMENTS

- You declare the parameters in the model and assign the starting values for the Markov chain with PARMS statements.
- You specify prior distributions for the parameters with PRIOR statements.
- You specify the likelihood function for the data with the MODEL statement.
- The model specification is similar to PROC NLIN and shares much of the same syntax as PROC NLMIXED.

PROC MCMC SYNTAX

PROC MCMC options;
   PARMS parameters and starting values;
   BEGINCNST;
      Programming Statements;
   ENDCNST;
   BEGINNODATA;
      Programming Statements;
   ENDNODATA;
   PRIOR parameter ~ distribution;
   MODEL variable ~ distribution;
   RANDOM random effects specification;
   PREDDIST 'label' OUTPRED=SAS-data-set options;
RUN;

PARMS STATEMENT

- The PARMS statement lists the names of the parameters and specifies optional initial values.
- PROC MCMC generates values for uninitialized parameters from the corresponding prior distributions.
- If the initial values lead to an invalid prior or likelihood calculation, PROC MCMC prints an error message and stops.
- Every parameter in the PARMS statement must have a corresponding prior distribution in the PRIOR statement.

MULTIPLE PARMS STATEMENTS

- When multiple PARMS statements are used, each statement defines a block of parameters.
- PROC MCMC updates parameters in each block sequentially, conditional on the current values of other parameters in other blocks.
- Forming blocks of parameters has its advantages with regard to achieving good mixing of the chains.
- One recommendation is to form small groups of correlated parameters that belong to the same context in the formulation of the model. For example, regression coefficients are in one block and a scale parameter is in a separate block.

PRIOR STATEMENT

- The PRIOR statement is used to specify the prior distribution of the model parameters.
- You must specify a single parameter or a list of parameters, a tilde, and then a distribution with its parameters.
- Multiple PRIOR statements are allowed and you can have as many hierarchical levels as desired.
- A HYPERPRIOR statement is also available to fit a multilevel hierarchical model.

STANDARD DISTRIBUTIONS

[Table of standard distributions; only part of it survives in this transcription. Legible entries: inverse chi-square, Pareto, Poisson, general, dgeneral, Dirichlet, inverse Wishart, multivariate normal, multinomial.]

MODEL STATEMENT

- The MODEL statement is used to specify the conditional distribution of the data given the parameters (the likelihood function).
- You must specify a single dependent variable or a list of dependent variables, a tilde, and a distribution with its arguments.
- The dependent variables can be either variables from the data set or functions of variables in the program.
- Multiple MODEL statements are allowed for defining models with multiple independent components.
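
The PARMS, PRIOR, and MODEL statements described above fit together as in the following sketch of a simple linear regression. The data set WORK.REG is simulated for illustration only, and the prior choices are assumptions made for this example, not recommendations from the presentation.

```sas
/* Simulated data: y = 2 + 0.5*x + noise */
data work.reg;
   call streaminit(7);
   do i = 1 to 50;
      x = rand('uniform') * 10;
      y = 2 + 0.5*x + rand('normal', 0, 1);
      output;
   end;
run;

proc mcmc data=work.reg seed=7 nbi=1000 nmc=10000 outpost=work.post_reg;
   /* Two PARMS statements form two blocks: the regression coefficients */
   /* in one block and the variance parameter in another                */
   parms beta0 0 beta1 0;
   parms sigma2 1;

   /* Every parameter in a PARMS statement needs a prior */
   prior beta0 beta1 ~ normal(mean=0, var=1e6);
   prior sigma2 ~ igamma(shape=0.01, scale=0.01);

   /* Programming statement for the mean, then the likelihood */
   mu = beta0 + beta1*x;
   model y ~ normal(mu, var=sigma2);
run;
```

Placing the coefficients and the variance in separate blocks follows the blocking recommendation given for multiple PARMS statements.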

SPECIFYING A NEW DISTRIBUTION

- The GENERAL and DGENERAL functions enable you to analyze data that have any distribution function, as long as these functions are programmable with SAS statements.
- The new distributions have to be specified on the logarithm scale (logarithm of the density must be specified).
- PROC MCMC does not verify that the GENERAL function that you specify is a valid distribution, and you can easily construct prior and log-likelihood functions that lead to improper posterior distributions.
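
A minimal sketch of the GENERAL function follows. It hand-codes the log density of an exponential distribution with rate lambda, which is chosen only because its log density is short. The data set WORK.EXPO, the variable t, and the symbol llike are hypothetical names for this illustration.

```sas
/* Simulated exponential data with rate 2 (mean 0.5) */
data work.expo;
   call streaminit(11);
   do i = 1 to 100;
      t = rand('exponential') / 2;
      output;
   end;
run;

proc mcmc data=work.expo seed=11 nbi=1000 nmc=5000 outpost=work.post_expo;
   parms lambda 1;
   prior lambda ~ gamma(shape=0.01, iscale=0.01);

   /* Log density of one observation: log(lambda) - lambda*t */
   llike = log(lambda) - lambda*t;

   /* The log-likelihood contribution is supplied through GENERAL */
   model general(llike);
run;
```

Because PROC MCMC does not check that the expression is a valid log density, it is worth comparing a hand-coded likelihood against a built-in distribution on a test case where both are available.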

BEGINCNST/ENDCNST STATEMENTS

- These statements define a block within which PROC MCMC processes the programming statements only during the setup stage of the simulation.
- You can use them to define constants or import data set variables into arrays, and to assign initial values to the parameters.
- Using these statements can reduce redundant processing.

BEGINNODATA/ENDNODATA STATEMENTS

- These statements define a block within which PROC MCMC executes the programming statements only twice: at the first and last observation of the data set.
- These statements are best used to reduce unnecessary observation-level computations.
- Any computations that are identical for every observation, such as transformation of parameters, should be enclosed in these statements.
- These statements should not contain data set variables.
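
The two kinds of blocks can be sketched together in a small normal model that is parameterized by a precision tau. The data set WORK.NORM and the symbols prior_var and sigma are hypothetical names chosen for this illustration.

```sas
/* Simulated normal data with mean 10 and standard deviation 2 */
data work.norm;
   call streaminit(3);
   do i = 1 to 60;
      y = rand('normal', 10, 2);
      output;
   end;
run;

proc mcmc data=work.norm seed=3 nbi=1000 nmc=5000
          monitor=(mu sigma) outpost=work.post_norm;
   parms mu 0;
   parms tau 1;

   begincnst;
      prior_var = 1e4;        /* constant: set once during setup */
   endcnst;

   beginnodata;
      sigma = sqrt(1/tau);    /* parameter transformation: no data set variables */
   endnodata;

   prior mu  ~ normal(0, var=prior_var);
   prior tau ~ gamma(shape=0.01, iscale=0.01);
   model y ~ normal(mu, sd=sigma);
run;
```

The transformation from tau to sigma does not depend on any observation, so placing it in the BEGINNODATA block avoids recomputing it for every row of the data set.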

RANDOM EFFECTS MODELS

The RANDOM statement is similar to the one in the NLMIXED procedure.

RANDOM random-effect ~ distribution SUBJECT=variable options;

random-effect   is either a univariate random effect or an array of random effects
distribution    can be beta, normal, binary, inverse gamma, gamma, Laplace, Poisson, multivariate normal with autoregressive structure, or general distribution
SUBJECT=        identifies the subjects in the model. The variable can be numeric or character, and does not need to be sorted.
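
A random-intercept model gives a minimal sketch of the RANDOM statement. The data set WORK.GRP, the grouping variable, and the priors are assumptions made for this illustration.

```sas
/* Simulated data: 10 groups, 8 observations each, group-level intercept shifts */
data work.grp;
   call streaminit(5);
   do group = 1 to 10;
      u_true = rand('normal', 0, 1);
      do rep = 1 to 8;
         y = 3 + u_true + rand('normal', 0, 0.5);
         output;
      end;
   end;
   drop u_true;
run;

proc mcmc data=work.grp seed=5 nbi=2000 nmc=10000 outpost=work.post_re;
   parms mu 0 s2u 1 s2e 1;
   prior mu ~ normal(0, var=1e4);
   prior s2u s2e ~ igamma(shape=0.01, scale=0.01);

   /* One random intercept u per level of GROUP */
   random u ~ normal(0, var=s2u) subject=group;

   eta = mu + u;
   model y ~ normal(eta, var=s2e);
run;
```

The random effect u is not listed in a PARMS statement; the RANDOM statement declares it and gives its distribution, while the variance s2u is an ordinary parameter with its own prior.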

PREDDIST STATEMENT

- The PREDDIST statement creates a new SAS data set that contains random samples from the posterior predictive distribution of the response variable.
- The posterior predictive distribution can often be used to check whether the model is consistent with the data.
- The PREDDIST statement works only on response variables that have standard distributions, and it does not support either the GENERAL or DGENERAL functions.
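
A short sketch of the PREDDIST statement with a Poisson response, which is a standard distribution. The data set names and the NSIM= value are hypothetical choices for this example.

```sas
/* Simulated Poisson counts */
data work.pois;
   call streaminit(9);
   do i = 1 to 40;
      y = rand('poisson', 4);
      output;
   end;
run;

proc mcmc data=work.pois seed=9 nbi=1000 nmc=5000 outpost=work.post_pois;
   parms lambda 1;
   prior lambda ~ gamma(shape=0.01, iscale=0.01);
   model y ~ poisson(lambda);

   /* Save 500 posterior predictive draws of the response */
   preddist outpred=work.pred_pois nsim=500;
run;
```

Comparing summaries of the predictive draws in WORK.PRED_POIS with the observed counts is one simple way to check whether the model is consistent with the data.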

DEMONSTRATION

RESOURCES

Classroom and Live Web Training: Bayesian Analysis Using SAS

FOR FURTHER INFORMATION

- Introduction to Bayesian Analysis: [...]l/en/statug/66859/HTML/default/viewer.htm#statug introbayes toc.htm
- The PROC FMM example is documented [...]tatug/66859/HTML/default/viewer.htm#statug fmm gettingstarted02.htm
- The PROC MCMC examples all come from this [...]eedings09/257-2009.pdf
- Our Bayesian Analysis Using SAS/STAT landing page (and links within) is really [...]/index.html
- Bayesian Analysis Using SAS: https://support.sas.com/edu/schedules.html?ctry=us&id=2047

Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @Melodie Rush

QUESTIONS?

Thank you for your time and attention!

sas.com
