Step Up Your Statistical Practice with Today’s SAS/STAT Software
Phil Gibbs
SAS Institute Technical Support
Copyright SAS Institute Inc. All rights reserved.
Are you over-relying on familiar procedures, and unaware of newer procedures that could benefit your work?
Should you always use
- PROC REG for building predictive models?
- PROC GENMOD for handling dropouts in longitudinal studies?
- PROC LIFETEST for analyzing interval-censored data?
- PROC MIXED for fitting linear mixed models?
This presentation explains the advantages of newer tools in four of the many areas where SAS/STAT is expanding
1. Regression model building
2. Inferential analysis of generalized linear models
3. Survival analysis
4. Analysis of mixed models
This is a high-level overview, which gives you the big picture without descending into details
[Photo: SAS users on balloon safari at Magaliesburg, South Africa, November 2015]
Regression Model Building
Tech Support is often asked, “Can you add a CLASS statement to PROC REG?”
Kathleen Kiernan, Analytical Technical Support
PROC GLMSELECT is now the flagship procedure for building standard regression models
Designed for
- Selecting the “best” model when you are choosing from hundreds of variables, or even thousands
- Continuous or categorical predictors
- Explanatory models or predictive models
PROC GLMSELECT provides many advantages for building regression models with large data
- Effect selection methods for general linear models
  Predictors can be main effects of continuous or classification variables, and interaction effects
- Lasso methods for sparse, more interpretable models
- Data partitioning to avoid overfitting
Use PROC REG for fitting regression models when you need inferential methods, influence statistics, and diagnostics
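As a hedged illustration of the features listed above, the following sketch combines a CLASS statement, lasso selection, and a validation partition in one PROC GLMSELECT step. The data set work.sales and every variable name are hypothetical, and the seed and the 30% validation fraction are arbitrary choices.

```sas
/* Sketch only: work.sales and all variable names are hypothetical */
proc glmselect data=work.sales seed=12345;
   class region channel;                       /* classification predictors */
   model revenue = region channel price adspend region*channel
         / selection=lasso(choose=validate stop=none);
   partition fraction(validate=0.3);           /* hold out 30% for validation */
run;
```

With CHOOSE=VALIDATE, the step on the lasso path that has the smallest validation average squared error is selected, which is one way to use the partition to guard against overfitting.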
Model building procedures are available for a variety of goals and methods
PROC HPREG is a high-performance regression modeling procedure
PROC HPSPLIT builds classification and regression trees
Models for means are not always adequate
Regression models for quantiles (percentiles) are useful when the conditional distribution of the response varies with covariates
[Figure: fitted curves for the 90th, 50th, and 10th percentiles]
PROC QUANTSELECT builds quantile regression models
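A minimal sketch of building models for several quantiles at once, assuming a hypothetical data set work.claims with the variables shown. The three quantile levels echo the 10th, 50th, and 90th percentiles mentioned above. Lasso is just one of the available selection methods.

```sas
/* Sketch only: work.claims and all variable names are hypothetical */
proc quantselect data=work.claims;
   class region;
   model cost = region age income tenure
         / quantile=0.1 0.5 0.9 selection=lasso;   /* one model per quantile level */
run;
```

Each quantile level gets its own selected model, so a predictor can matter for the upper tail of the response without being selected for the median.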
PROC HPLOGISTIC builds logistic regression models
PROC HPGENSELECT builds generalized linear models
How does the HPGENSELECT procedure compare with the GENMOD procedure?
| PROC HPGENSELECT | PROC GENMOD |
| Fits and builds models | Fits models |
| Large to massive data | Moderate to large data |
| Designed for predictive modeling | Designed for inferential analysis |
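To make the "fits and builds" distinction concrete, here is a hedged sketch of a Poisson model in which PROC HPGENSELECT performs the effect selection. The data set work.policies and its variables are hypothetical. Stepwise selection with the SBC criterion is an arbitrary choice among the available methods.

```sas
/* Sketch only: work.policies and all variable names are hypothetical */
proc hpgenselect data=work.policies;
   class territory vehicle_type;
   model claims = territory vehicle_type driver_age annual_miles
         / distribution=poisson link=log;
   selection method=stepwise(choose=sbc);      /* build the model, not just fit it */
run;
```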
The GAMPL procedure fits generalized additive models
Generalized additive models provide greater flexibility for describing complex, unknown dependency relationships
Applications
- Analyzing claim rates for insured mortgages
- Environmental models with spatial effects
- Insurance ratemaking for geographic areas
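A sketch of what such a model might look like in PROC GAMPL, assuming a hypothetical data set work.mortgages with a binary claim indicator. Parametric terms go in PARAM( ), smooth terms go in SPLINE( ), and a two-variable spline is one way to represent a spatial surface.

```sas
/* Sketch only: work.mortgages and all variable names are hypothetical */
proc gampl data=work.mortgages;
   class occupancy;
   model claim(event='1') = param(occupancy)              /* parametric term */
                            spline(loan_age)              /* smooth univariate term */
                            spline(longitude latitude)    /* smooth spatial surface */
         / dist=binary;
run;
```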
The ADAPTIVEREG procedure fits multivariate adaptive regression splines
Inferential Analysis of Generalized Linear Models
Tech Support is often asked, “I have longitudinal data with dropouts. Can PROC GENMOD do the right GEE analysis?”
Rob Agnelli and David Schlotzhauer, Analytical Technical Support
The new GEE procedure implements a weighted GEE method that accounts for dropouts that are missing at random (MAR)
| | Standard GEE | Weighted GEE |
| Procedures | GENMOD and GEE | GEE |
| Specifications | Response model, correlation | Response model, correlation, missingness model |
| Inference assuming MCAR | Valid even if correlation is misspecified | Valid even if correlation is misspecified |
| Inference assuming MAR | Not generally valid | Valid even if correlation is misspecified |
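The three specifications in the weighted GEE column map onto three statements in PROC GEE. The sketch below is illustrative only: work.trial and its variables are hypothetical, prev_response is assumed to hold the response from the preceding visit, and visit_c is assumed to be a classification copy of the visit number.

```sas
/* Sketch only: work.trial and all variable names are hypothetical */
proc gee data=work.trial desc;
   class patient visit_c treatment;
   /* Missingness model: drives the observation-level weights */
   missmodel prev_response treatment visit_c / type=obslevel;
   /* Response (marginal mean) model */
   model response = treatment visit treatment*visit / dist=bin;
   /* Working correlation within patient */
   repeated subject=patient / within=visit_c corr=cs;
run;
```

Omitting the MISSMODEL statement gives the standard GEE analysis, which is the column that relies on the MCAR assumption.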
PROC GEE is just one new feature for analysis of generalized linear models
PROC GENMOD has been enhanced, and PROC FMM has been added
Survival Analysis
Tech Support is often asked, “Can I use PROC LIFETEST with time-to-event data that are interval-censored?”
Paul Savarese, Analytical Technical Support
Specialized methods of handling interval-censored data are available in the new ICLIFETEST and ICPHREG procedures
- PROC ICLIFETEST provides nonparametric methods of estimating survival functions and statistical testing
- PROC ICPHREG fits proportional hazards regression models
Imputing midpoints and using the LIFETEST and PHREG procedures is less efficient than applying specialized methods
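In both procedures the interval-censored response is given as a pair of boundary variables rather than a single imputed time. A minimal sketch, assuming a hypothetical data set work.exams in which left_time and right_time bracket each event:

```sas
/* Sketch only: work.exams and all variable names are hypothetical */
proc iclifetest data=work.exams plots=survival;
   time (left_time, right_time);        /* interval that contains the event */
   test treatment;                      /* compare survival across groups */
run;

proc icphreg data=work.exams;
   class treatment;
   model (left_time, right_time) = treatment age;
run;
```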
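There are now six procedures for analyzing time-to-event data, each with a different focus
| Procedure | Focus | Approach | Modeling | Censoring |
| LIFETEST | Survival function | Nonparametric | No | Right |
| ICLIFETEST | Survival … | … metric | Yes | Right, left, interval |
| PHREG | Hazard function | Semiparametric | Yes | Right |
| ICPHREG | Hazard … | … parametric | Yes | Right |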
Survival analysis capability for estimation and testing is growing
Specialized methods of analyzing competing risks are available in the LIFETEST and PHREG procedures
- The cumulative incidence function (CIF) replaces the survival function
  PROC LIFETEST estimates the CIF and provides Gray’s test
- The cause-specific hazard function (CSH) replaces the hazard function
  PROC PHREG implements the Fine and Gray model, which extends the Cox model to the CSH setting
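A hedged sketch of both analyses, assuming a hypothetical data set work.transplant in which status is coded 0 for censored, 1 for the event of interest, and 2 for a competing event. The coding and all variable names are assumptions for illustration.

```sas
/* Sketch only: work.transplant and all variable names are hypothetical.
   status: 0 = censored, 1 = event of interest, 2 = competing event */
proc lifetest data=work.transplant plots=cif;
   time months*status(0) / failcode=1;   /* estimate the CIF for event 1 */
   strata disease_group;                 /* groups compared by Gray's test */
run;

proc phreg data=work.transplant;
   class disease_group;
   model months*status(0) = disease_group age
         / eventcode=1;                  /* requests the Fine and Gray model */
run;
```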
Survival analysis capability for modeling is also growing
Mixed Models
Tech Support is often asked, “How do I decide which mixed model procedure to use?”
Jill Tao, Analytical Technical Support
PROC MIXED is the flagship procedure for linear mixed models, providing generality for model estimation and postfit inference
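As an illustration of estimation plus postfit inference, the sketch below fits a random intercept and slope model and then compares groups at a chosen covariate value. The data set work.growth, its variables, and the value age=10 are hypothetical.

```sas
/* Sketch only: work.growth and all variable names are hypothetical */
proc mixed data=work.growth method=reml;
   class child gender;
   model height = gender age gender*age / solution ddfm=kr;
   random intercept age / subject=child type=un;   /* random intercept and slope */
   lsmeans gender / at age=10 diff cl;             /* postfit inference */
run;
```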
Use PROC HPMIXED when you need specialized computational methods for large, sparse mixed models
Use PROC GLIMMIX if your response has a nonnormal distribution that belongs to the exponential family
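For example, a binomial response with a clinic-level random intercept could be specified as follows. This is a sketch under assumptions: work.clinics and its variables are hypothetical, and Laplace estimation is only one of the available methods.

```sas
/* Sketch only: work.clinics and all variable names are hypothetical */
proc glimmix data=work.clinics method=laplace;
   class clinic treatment;
   model events/trials = treatment / dist=binomial link=logit solution;
   random intercept / subject=clinic;   /* clinic-level random effect */
run;
```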
Use PROC NLMIXED to fit a random coefficients model in which the coefficients enter nonlinearly, or to fit PK/PD models (the list goes on)
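A sketch of a one-compartment PK model with subject-specific clearance and absorption rate shows how random coefficients can enter nonlinearly. The data set work.pk, its variables (subject, dose, time, conc), and all starting values are hypothetical placeholders.

```sas
/* Sketch only: work.pk and its variables are hypothetical, and the starting
   values are arbitrary. Input data are assumed to be sorted by subject. */
proc nlmixed data=work.pk;
   parms beta1=-3 beta2=0.5 beta3=-2.5 s2b1=0.05 s2b2=0.5 s2=0.5;
   cl   = exp(beta1 + b1);          /* clearance, subject-specific */
   ka   = exp(beta2 + b2);          /* absorption rate, subject-specific */
   ke   = exp(beta3);               /* elimination rate */
   pred = dose*ke*ka*(exp(-ke*time) - exp(-ka*time)) / cl / (ka - ke);
   model conc ~ normal(pred, s2);
   random b1 b2 ~ normal([0,0], [s2b1, 0, s2b2]) subject=subject;
run;
```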
Use PROC MCMC for a wide range of Bayesian models and for models that the other procedures cannot handle
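As one small instance, a Bayesian random-intercept model can be written directly in terms of priors and a likelihood. The sketch assumes a hypothetical data set work.growth. The priors and chain settings are arbitrary illustrative choices, not recommendations.

```sas
/* Sketch only: work.growth and all variable names are hypothetical;
   priors and chain settings are arbitrary illustrative choices */
proc mcmc data=work.growth nmc=20000 nbi=2000 seed=27513 outpost=work.post;
   parms beta0 0 beta1 0;
   parms s2 1;
   parms s2u 1;
   prior beta0 beta1 ~ normal(0, var=1e6);
   prior s2 s2u ~ igamma(shape=0.01, scale=0.01);
   random u ~ normal(0, var=s2u) subject=child;    /* child-level random intercept */
   mu = beta0 + beta1*age + u;
   model height ~ normal(mu, var=s2);
run;
```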
Summary
Our flyover has pointed out many new features. Now it’s time to land and wrap up
Newer tools give you greater flexibility for regression modeling
| Benefit | Method | Procedures |
| Improved predictive ability and interpretability of regression models | Data partitioning | GLMSELECT, HPREG, HPSPLIT, QUANTSELECT, ADAPTIVEREG, HPLOGISTIC, HPGENSELECT |
| | Lasso methods and information criteria | GLMSELECT, QUANTSELECT, HPGENSELECT |
| Regression model building for a variety of response types and for complex dependence structures | Categorical responses | HPLOGISTIC, HPGENSELECT, GAMPL, ADAPTIVEREG |
| | Quantile regression | QUANTSELECT |
| | Regression trees | HPSPLIT |
| | Spline effects | GLMSELECT, GAMPL, ADAPTIVEREG |
specialized inference for complex data
| Benefit | Method | Procedures |
| Inference for special generalized linear models | Models for overdispersion | GENMOD, FMM |
| | Exact methods for small samples | GENMOD |
| | Weighted GEE methods for dropouts in longitudinal studies | GEE |
| Inference for special types of time-to-event data | Methods for interval-censored data | ICLIFETEST, ICPHREG |
| | Analysis of competing risks | LIFETEST, PHREG |
| | Analysis of heterogeneous data | QUANTLIFE |
versatile Bayesian methods, and high-performance computing
| Benefit | Method | Procedures |
| Advantages of Bayesian methods, including model versatility and highly interpretable results | Generalized linear models | GENMOD |
| | Survival analysis models | LIFEREG, PHREG, MCMC |
| | Finite mixture models | FMM |
| | Mixed models | MCMC |
| | General Bayesian models | MCMC |
| High-performance computing for large data | Regression model building | HPREG, HPLOGISTIC, HPQUANTSELECT, HPGENSELECT, HPSPLIT |
| | Generalized additive models | GAMPL |
| | Regression trees | HPSPLIT |
| | Large, sparse mixed models | HPMIXED |
Learn more at http://support.sas.com/statistics
- Sign up for e-newsletter
- Watch short videos
- Download overview papers