Step Up Your Statistical Practice With Today's SAS/STAT Software

Step Up Your Statistical Practice with Today’s SAS/STAT Software
Phil Gibbs, SAS Institute Technical Support
Copyright SAS Institute Inc. All rights reserved.

Are you over-relying on familiar procedures, and unaware of newer procedures that could benefit your work?
Should you always use
  PROC REG for building predictive models?
  PROC GENMOD for handling dropouts in longitudinal studies?
  PROC LIFETEST for analyzing interval-censored data?
  PROC MIXED for fitting linear mixed models?

This presentation explains the advantages of newer tools in four of the many areas where SAS/STAT is expanding
1. Regression model building
2. Inferential analysis of generalized linear models
3. Survival analysis
4. Analysis of mixed models

This is a high-level overview, which gives you the big picture without descending into details
[Photo: SAS users on balloon safari at Magaliesburg, South Africa, November 2015]

Regression Model Building

Tech Support is often asked, “Can you add a CLASS statement to PROC REG?”
Kathleen Kiernan, Analytical Technical Support

PROC GLMSELECT is now the flagship procedure for building standard regression models
Designed for
  Selecting the “best” model when you are choosing from hundreds of variables, or even thousands
  Continuous or categorical predictors
  Explanatory models or predictive models

PROC GLMSELECT provides many advantages for building regression models with large data
  Effect selection methods for general linear models
    Predictors can be main effects of continuous or classification variables, and interaction effects
  Lasso methods for sparse, more interpretable models
  Data partitioning to avoid overfitting
Use PROC REG for fitting regression models when you need inferential methods, influence statistics, and diagnostics
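As a rough illustration of the GLMSELECT features listed above (a CLASS predictor, lasso selection, and data partitioning), the following sketch simulates a small data set and lets a validation partition choose the lasso step. The data set name Sim, the variables y, x1-x20, and grp, the seeds, and the 30% validation fraction are all assumptions made for this example, not part of the presentation.

```sas
/* Hypothetical simulated data: only x1, x2, and grp truly affect y */
data Sim;
   call streaminit(2016);
   array x{20} x1-x20;
   do i = 1 to 1000;
      do j = 1 to 20;
         x{j} = rand('normal');
      end;
      grp = ceil(3 * rand('uniform'));      /* categorical predictor, levels 1-3 */
      y = 2*x1 - x2 + 0.5*(grp = 2) + rand('normal');
      output;
   end;
   drop i j;
run;

proc glmselect data=Sim seed=1;
   class grp;                               /* classification predictor */
   partition fraction(validate=0.3);        /* hold out about 30% for validation */
   model y = x1-x20 grp
         / selection=lasso(choose=validate stop=none);
run;
```

With CHOOSE=VALIDATE, the selected model is the step on the lasso path that has the smallest validation error, which is one way of using the partition to guard against overfitting.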

Model building procedures are available for a variety of goals and methods

PROC HPREG is a high-performance regression modeling procedure

PROC HPSPLIT builds classification and regression trees

Models for means are not always adequate

Regression models for quantiles (percentiles) are useful when the conditional distribution of the response varies with covariates
[Figure labels: 90th percentile, 50th percentile, 10th percentile]

PROC QUANTSELECT builds quantile regression models

PROC HPLOGISTIC builds logistic regression models

PROC HPGENSELECT builds generalized linear models

How does the HPGENSELECT procedure compare with the GENMOD procedure?
PROC HPGENSELECT                  PROC GENMOD
Fits and builds models            Fits models
Large to massive data             Moderate to large data
Designed for predictive modeling  Designed for inferential analysis
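The contrast between building and fitting can be sketched with a small Poisson example. The data set Claims, the variables y and x1-x3, and the choice of stepwise selection are hypothetical and serve only to show where the SELECTION statement fits; PROC GENMOD then fits a single model specified by the analyst.

```sas
/* Hypothetical simulated counts: only x1 affects the mean */
data Claims;
   call streaminit(7);
   do i = 1 to 5000;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      y  = rand('poisson', exp(0.2 + 0.5*x1));
      output;
   end;
   drop i;
run;

/* Model building: HPGENSELECT chooses effects */
proc hpgenselect data=Claims;
   model y = x1 x2 x3 / distribution=poisson;
   selection method=stepwise;
run;

/* Model fitting: GENMOD fits the model you specify */
proc genmod data=Claims;
   model y = x1 / dist=poisson link=log;
run;
```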

The GAMPL procedure fits generalized additive models

Generalized additive models provide greater flexibility for describing complex, unknown dependency relationships
Applications
  Analyzing claim rates for insured mortgages
  Environmental models with spatial effects
  Insurance ratemaking for geographic areas

The ADAPTIVEREG procedure fits multivariate adaptive regression splines

Inferential Analysis of Generalized Linear Models

Tech Support is often asked, “I have longitudinal data with dropouts. Can PROC GENMOD do the right GEE analysis?”
Rob Agnelli and David Schlotzhauer, Analytical Technical Support

The new GEE procedure implements a weighted GEE method that accounts for dropouts that are missing at random (MAR)
                         Standard GEE                               Weighted GEE
Procedures               GENMOD and GEE                             GEE
Specifications           Response model, correlation                Response model, correlation, missingness model
Inference assuming MCAR  Valid even if correlation is misspecified  Valid even if correlation is misspecified
Inference assuming MAR   Not generally valid                        Valid even if correlation is misspecified
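A weighted GEE analysis adds a missingness model to the usual response model and working correlation. The sketch below is only an illustration under stated assumptions: the data set Long, its variables (id, visit, trt, y, prevy), and the simulated dropout mechanism are hypothetical. The dropout probability depends on the previous observed response, which is the kind of MAR mechanism the MISSMODEL statement is meant to model.

```sas
/* Hypothetical data: 200 subjects, 4 visits, monotone dropout */
data Long;
   call streaminit(11);
   do id = 1 to 200;
      trt = rand('bernoulli', 0.5);
      u = rand('normal');                   /* subject-level effect */
      dropped = 0;
      prevy = 0;                            /* set to 0 at the first visit */
      do visit = 1 to 4;
         p = 1 / (1 + exp(-(-0.5 + 0.5*trt + u)));
         ytrue = rand('bernoulli', p);
         /* after visit 1, dropout depends on the last observed response */
         if visit > 1 and not dropped then
            dropped = rand('bernoulli', 1 / (1 + exp(-(-2 + prevy))));
         if dropped then y = .;
         else y = ytrue;
         output;
         if not dropped then prevy = y;
      end;
   end;
   keep id visit trt y prevy;
run;

proc gee data=Long;
   class id visit;
   missmodel prevy trt / type=obslevel;     /* missingness model gives the weights */
   model y = trt visit / dist=bin;          /* response model */
   repeated subject=id / within=visit corr=exch;
run;
```

Removing the MISSMODEL statement gives a standard GEE analysis of the same data, which makes it easy to compare the two sets of estimates.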

PROC GEE is just one new feature for analysis of generalized linear models

PROC GENMOD has been enhanced, and PROC FMM has been added

Survival Analysis

Tech Support is often asked, “Can I use PROC LIFETEST with time-to-event data that are interval-censored?”
Paul Savarese, Analytical Technical Support

Specialized methods of handling interval-censored data are available in the new ICLIFETEST and ICPHREG procedures
  PROC ICLIFETEST provides nonparametric methods of estimating survival functions and statistical testing
  PROC ICPHREG fits proportional hazards regression models
Imputing midpoints and using the LIFETEST and PHREG procedures is less efficient than applying specialized methods
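Both procedures take the interval endpoints directly, so no midpoint imputation is needed. In this sketch the data set IC, the variables left, right, and trt, and the inspection schedule (every 3 time units up to 24) are hypothetical. A missing right endpoint marks a right-censored observation, and a left endpoint of 0 marks a left-censored one.

```sas
/* Hypothetical data: the event time is known only to lie between inspections */
data IC;
   call streaminit(3);
   do id = 1 to 300;
      trt = rand('bernoulli', 0.5);
      t = 10 * rand('exponential') * exp(-0.7*trt);   /* unobserved true time */
      left  = 3 * floor(t / 3);
      right = left + 3;
      if t > 24 then do;                    /* no event seen by the last inspection */
         left  = 24;
         right = .;
      end;
      output;
   end;
   keep id trt left right;
run;

/* Nonparametric survival estimate and a test of the treatment groups */
proc iclifetest data=IC;
   time (left, right);
   test trt;
run;

/* Proportional hazards regression for the same intervals */
proc icphreg data=IC;
   model (left, right) = trt;
run;
```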

There are now six procedures for analyzing time-to-event data, each with a different [...]
([...] marks text lost in extraction)
Procedure   Focus              Approach        Modeling  Censoring
LIFETEST    Survival function  Nonparametric   No        Right
ICLIFETEST  Survival [...]metric  Yes  Right, left, interval
PHREG       Hazard function    Semiparametric  Yes       Right
ICPHREG     Hazard [...]parametric  Yes  Right

Survival analysis capability for estimation and testing is growing

Specialized methods of analyzing competing risks are available in the LIFETEST and PHREG procedures
  The cumulative incidence function (CIF) replaces the survival function
    PROC LIFETEST estimates the CIF and provides Gray’s test
  The cause-specific hazard function (CSH) replaces the hazard function
    PROC PHREG implements the Fine and Gray model, which extends the Cox model to the CSH setting
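In both procedures, the competing-risks analysis is requested with the EVENTCODE= option, which names the status value of the event of interest. The sketch below assumes a hypothetical data set CR in which status is 0 for censored observations, 1 for the event of interest, and 2 for a competing event; the simulated rates are arbitrary.

```sas
/* Hypothetical data: status 0 = censored, 1 = event of interest, 2 = competing event */
data CR;
   call streaminit(5);
   do id = 1 to 400;
      trt = rand('bernoulli', 0.5);
      t1 = 10 * rand('exponential') * exp(-0.5*trt);  /* time to event of interest */
      t2 = 15 * rand('exponential');                  /* time to competing event */
      c  = 20 * rand('uniform');                      /* censoring time */
      t  = min(t1, t2, c);
      if t = c then status = 0;
      else if t = t1 then status = 1;
      else status = 2;
      output;
   end;
   keep id trt t status;
run;

/* CIF estimates by group, with Gray's test from the STRATA statement */
proc lifetest data=CR;
   time t*status(0) / eventcode=1;
   strata trt;
run;

/* Fine and Gray model for the event of interest */
proc phreg data=CR;
   model t*status(0) = trt / eventcode=1;
run;
```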

Survival analysis capability for modeling is also growing

Mixed Models

Tech Support is often asked, “How do I decide which mixed model procedure to use?”
Jill Tao, Analytical Technical Support

PROC MIXED is the flagship procedure for linear mixed models, providing generality for model estimation and postfit inference

Use PROC HPMIXED when you need specialized computational methods for large, sparse mixed models

Use PROC GLIMMIX if your response has a nonnormal distribution that belongs to the exponential family
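For example, a binary response clustered within clinics could be modeled with a random intercept. This is a minimal sketch: the data set Clinics, its variables (clinic, trt, y), and the choice of Laplace estimation are assumptions made for illustration.

```sas
/* Hypothetical data: binary outcomes for 20 patients in each of 30 clinics */
data Clinics;
   call streaminit(9);
   do clinic = 1 to 30;
      u = 0.8 * rand('normal');             /* clinic-level random effect */
      do patient = 1 to 20;
         trt = rand('bernoulli', 0.5);
         p = 1 / (1 + exp(-(-0.5 + 0.6*trt + u)));
         y = rand('bernoulli', p);
         output;
      end;
   end;
   keep clinic trt y;
run;

proc glimmix data=Clinics method=laplace;
   class clinic;
   model y(event='1') = trt / dist=binary link=logit solution;
   random intercept / subject=clinic;       /* one random intercept per clinic */
run;
```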

Use PROC NLMIXED to fit a random coefficients model in which the coefficients enter nonlinearly, or to fit PK/PD models; the list goes on

Use PROC MCMC for a wide range of Bayesian models and for models that the other procedures cannot handle

Summary

Our flyover has pointed out many new features. Now it’s time to land and wrap up

Newer tools give you greater flexibility for regression modeling
Benefit: Improved predictive ability and interpretability of regression models
  Method                                  Procedures
  Data partitioning                       GLMSELECT, HPREG, HPSPLIT, QUANTSELECT, ADAPTIVEREG, HPLOGISTIC, HPGENSELECT
  Lasso methods and information criteria  GLMSELECT, QUANTSELECT, HPGENSELECT
Benefit: Regression model building for a variety of response types and for complex dependence structures
  Method                                  Procedures
  Categorical responses                   HPLOGISTIC, HPGENSELECT, GAMPL, ADAPTIVEREG
  Quantile regression                     QUANTSELECT
  Regression trees                        HPSPLIT
  Spline effects                          GLMSELECT, GAMPL, ADAPTIVEREG

specialized inference for complex data
Benefit: Inference for special generalized linear models
  Method                                                    Procedures
  Models for overdispersion                                 GENMOD, FMM
  Exact methods for small samples                           GENMOD
  Weighted GEE methods for dropouts in longitudinal studies GEE
Benefit: Inference for special types of time-to-event data
  Method                                                    Procedures
  Methods for interval-censored data                        ICLIFETEST, ICPHREG
  Analysis of competing risks                               LIFETEST, PHREG
  Analysis of heterogeneous data                            QUANTLIFE

versatile Bayesian methods, and high-performance computing
Benefit: Advantages of Bayesian methods, including model versatility and highly interpretable results
  Method                        Procedures
  Generalized linear models     GENMOD
  Survival analysis models      LIFEREG, PHREG, MCMC
  Finite mixture models         FMM
  Mixed models                  MCMC
  General Bayesian models       MCMC
Benefit: High-performance computing for large data
  Method                        Procedures
  Regression model building     HPREG, HPLOGISTIC, HPQUANTSELECT, HPGENSELECT, HPSPLIT
  Generalized additive models   GAMPL
  Regression trees              HPSPLIT
  Large, sparse mixed models    HPMIXED

Learn more at http://support.sas.com/statistics
  Sign up for e-newsletter
  Watch short videos
  Download overview papers
