Lecture 7 Time-dependent Covariates In Cox Regression


So far, we've been considering the following Cox PH model:

$$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta' Z) = \lambda_0(t) \exp\Big(\sum_j \beta_j Z_j\Big)$$

where $\beta_j$ is the parameter for the $j$-th covariate ($Z_j$).

Important features of this model:
(1) the baseline hazard depends on $t$, but not on the covariates $Z_1, \ldots, Z_p$;
(2) the hazard ratio $\exp(\beta' Z)$ depends on the covariates $Z_1, \ldots, Z_p$, but not on time $t$.

But there are cases where some of the $Z_j$'s may vary if we measure them over time, e.g. a patient's performance status, certain biomarkers, or ...

Example to motivate time-dependent covariates

Stanford Heart transplant example:

Variables:
- survival: time from program enrollment until death or censoring
- dead: indicator of death (1) or censoring (0)
- transpl: whether patient ever had transplant (1 if yes, 2 if no)
- surgery: previous heart surgery prior to program
- age: age at time of acceptance into program
- wait: time from acceptance into program until transplant surgery (. for those without transplant)

Initially, a Cox PH model was fit for predicting survival time:

$$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta_1\,\text{transpl} + \beta_2\,\text{surgery} + \beta_3\,\text{age})$$

However, this model does not take into consideration that some patients had shorter waiting times to get transplants than others. A model with a time-dependent indicator of whether a patient had a transplant at each point in time might be more appropriate.

Cox model with time-dependent covariates

$$\lambda(t \mid Z(t)) = \lambda_0(t) \exp\{\beta' Z(t)\}$$

The hazard at time $t$ depends (only) on the value of the covariates at that time, i.e. $Z(t)$. The regression effect of $Z(\cdot)$ is a constant $\beta$ over time.

Some people do not call this model 'proportional hazards' any more, because the hazard ratio $\exp\{\beta' Z(t)\}$ varies over time. But many of us still use the term 'PH' loosely here.

Comparison with a single binary predictor (like heart transplant):
- A Cox PH model with a time-independent covariate would compare the survival distributions between those without a transplant (ever) and those with a transplant. A subject's transplant status at the end of the study would determine which category they were put into for the entire study follow-up.
- A Cox model with a time-dependent covariate would compare the risk of an event between transplant and non-transplant at each event time, but would re-evaluate which risk group each person belonged in based on whether they'd had a transplant by that time.

Inference:

We still use the partial likelihood to estimate $\beta$:

$$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_i(X_i)\}}{\sum_{j \in R(X_i)} \exp\{\beta' Z_j(X_i)\}} \right]^{\delta_i}$$

Note that each term in the partial likelihood is still the conditional probability of choosing individual $i$ to fail from the risk set, given the risk set at time $X_i$ and given that one failure is to occur.

Inference then proceeds similarly to the Cox model with time-independent covariates. The only difference is that the values of $Z$ now change from one risk set to the next.

Example:

Suppose $Z(t)$ is a time-varying covariate:

                          Z(t) at t =
ID  time  event  group    3  4  5  6  7  8  9
 1     3      1      1    0
 6     4      0      0    1  1
 3     5      1      1    1  1  1
 2     5      0      1    0  0  0
 4     6      1      1    0  0  0  0
 7     7      1      0    0  0  0  1  1
 8     8      0      0    0  0  0  0  0  0
 5     8      0      1    0  0  0  0  1  1
 9     9      1      0    0  0  0  1  1  1  1
10    10      0      0    0  1  1  1  1  1  1

Exercise: fill in the table below.

ordered failure    Individuals    failure ID    Partial likelihood
time (τ_j)         at risk                      contribution
3
5
6
7
9

(Be sure to do this exercise in order to be convinced that the procedure that follows is valid for fitting the $Z(t)$ model.)
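The exercise can be checked numerically. The sketch below lists the risk set at each ordered failure time and evaluates the log partial likelihood for a model with `group` and $Z(t)$. The data dictionary is a reading of the toy table above and should be checked against it; the function names are illustrative only.

```python
import math

# Toy data as read from the example table (an assumption to check against the slide):
# id -> (time, event, group, [Z(3), Z(4), ...] up to the subject's own follow-up time).
DATA = {
    1: (3, 1, 1, [0]),
    2: (5, 0, 1, [0, 0, 0]),
    3: (5, 1, 1, [1, 1, 1]),
    4: (6, 1, 1, [0, 0, 0, 0]),
    5: (8, 0, 1, [0, 0, 0, 0, 1, 1]),
    6: (4, 0, 0, [1, 1]),
    7: (7, 1, 0, [0, 0, 0, 1, 1]),
    8: (8, 0, 0, [0, 0, 0, 0, 0, 0]),
    9: (9, 1, 0, [0, 0, 0, 1, 1, 1, 1]),
    10: (10, 0, 0, [0, 1, 1, 1, 1, 1, 1]),
}

def z_at(subject, t):
    """Value of the time-varying covariate Z for `subject` at time t (t = 3, 4, ...)."""
    return DATA[subject][3][t - 3]

def risk_set(t):
    """Subjects still under observation at time t (follow-up time >= t)."""
    return sorted(i for i in DATA if DATA[i][0] >= t)

def failure_times():
    """Distinct observed failure times, in increasing order."""
    return sorted({rec[0] for rec in DATA.values() if rec[1] == 1})

def log_partial_likelihood(beta_group, beta_z):
    """Sum over failures of log( exp(eta_i) / sum_{j in R(X_i)} exp(eta_j) ),
    with every covariate evaluated at the failure time X_i."""
    total = 0.0
    for i, (time, event, group, _) in DATA.items():
        if event != 1:
            continue
        eta_i = beta_group * group + beta_z * z_at(i, time)
        denom = sum(
            math.exp(beta_group * DATA[j][2] + beta_z * z_at(j, time))
            for j in risk_set(time)
        )
        total += eta_i - math.log(denom)
    return total

for t in failure_times():
    print(t, risk_set(t))
print(log_partial_likelihood(0.0, 0.0))
```

At $\beta = 0$ each contribution is one over the size of the risk set, so the log partial likelihood is $-\log(10 \cdot 8 \cdot 6 \cdot 5 \cdot 2)$. Plugging in the coefficients reported for the time-dependent fit (1.827 for group, 0.706 for $Z$) gives $-2 \log L$ of about 14.23, which is a useful sanity check on the reading of the table.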

Results from fitting two models

Only the columns of the output that survive in this transcription are shown.

Model with time-independent Z(3):

Testing Global Null Hypothesis: BETA=0

Criterion    Model Chi-Square
-2 LOG L     3.254 with 2 DF (p=0.1965)
Score        3.669 with 2 DF (p=0.1597)
Wald         2.927 with 2 DF (p=0.2315)

Analysis of Maximum Likelihood Estimates

Variable    Wald Chi-Square    Pr > Chi-Square    Risk Ratio
GROUP       1.75644            0.1851             5.005
Z2          0.91788            0.3380             3.898

Model with time-dependent Z(t):

Testing Global Null Hypothesis: BETA=0

Criterion    Model Chi-Square
-2 LOG L     2.727 with 2 DF (p=0.2558)
Score        2.725 with 2 DF (p=0.2560)
Wald         2.271 with 2 DF (p=0.3212)

Analysis of Maximum Likelihood Estimates

Variable    Wald Chi-Square    Pr > Chi-Square    Risk Ratio
GROUP       2.21066            0.1371             6.214
Z(t)        0.34249            0.5584             2.026

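Time-varying covariates in R (and most software)

The original data from the example above may be stored as ('wide' format):

Table 1: A Toy Data Example

Subject  Group  Z1  Time1  Z2  Time2  Status
      1      1   0      3   .      .       1
      2      1   0      5   .      .       0
      3      1   1      5   .      .       1
      4      1   0      6   .      .       1
      5      1   0      6   1      8       0
      6      0   1      4   .      .       0
      7      0   0      5   1      7       1
      8      0   0      8   .      .       0
      9      0   0      5   1      9       1
     10      0   0      3   1     10       0

We first need to create a data set with start (or 'time') and stop ('time2') values of time ('long' format):

id  start  stop  status  group  z
 1      0     3       1      1  0
 2      0     5       0      1  0
 3      0     5       1      1  1
 4      0     6       1      1  0
 5      0     6       0      1  0
 5      6     8       0      1  1
 6      0     4       0      0  1
 7      0     5       0      0  0
 7      5     7       1      0  1
 8      0     8       0      0  0
 9      0     5       0      0  0
 9      5     9       1      0  1
10      0     3       0      0  0
10      3    10       0      0  1

Note that each different value of $Z(t)$ for a subject generates a row of pseudo data.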
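The wide-to-long step can be sketched in a few lines. This is a minimal illustration, assuming the wide records carry a first covariate value with the time until which it holds, and an optional second value with its end time; the tuple layout and the name `to_long` are choices made here, not part of the original slides.

```python
# Wide-format records: (id, group, z1, time1, z2, time2, status).
# z1 holds on (0, time1]; if the covariate changes, z2 holds on (time1, time2].
# None marks "no change" (shown as "." in the table). Values are a reading of the toy data.
WIDE = [
    (1, 1, 0, 3, None, None, 1),
    (2, 1, 0, 5, None, None, 0),
    (3, 1, 1, 5, None, None, 1),
    (4, 1, 0, 6, None, None, 1),
    (5, 1, 0, 6, 1, 8, 0),
    (6, 0, 1, 4, None, None, 0),
    (7, 0, 0, 5, 1, 7, 1),
    (8, 0, 0, 8, None, None, 0),
    (9, 0, 0, 5, 1, 9, 1),
    (10, 0, 0, 3, 1, 10, 0),
]

def to_long(wide):
    """Return rows (id, start, stop, status, group, z), one per constant stretch of Z."""
    rows = []
    for sid, group, z1, time1, z2, time2, status in wide:
        if z2 is None:
            # Covariate never changes: a single row carrying the final status.
            rows.append((sid, 0, time1, status, group, z1))
        else:
            # First stretch always ends without an event; the event (if any)
            # belongs to the last stretch.
            rows.append((sid, 0, time1, 0, group, z1))
            rows.append((sid, time1, time2, status, group, z2))
    return rows

for row in to_long(WIDE):
    print(row)
```

The status of every row except a subject's last is set to 0, because the subject is known to have survived past the end of that interval.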

The R command to fit the Cox model would then be:

    coxph(Surv(time = start, time2 = stop, status) ~ group + z, data)

Results:

    Alive  Dead  Deleted
        9     5        0

          coef  exp(coef)  se(coef)      z      p
    [1,] 1.827       6.21      1.23  1.487  0.137
    [2,] 0.706       2.03      1.21  0.585  0.558

    Likelihood ratio test = 2.73 on 2 df, p = 0.256
    Efficient score test  = 2.73 on 2 df, p = 0.256

(The confidence-interval block of the output, with columns exp(coef), exp(-coef), lower .95 and upper .95, did not survive in this transcription.)

Q: why is this approach valid? (hint: write down the likelihood)

Note: this form of Surv() is also used to handle left-truncated data, where 'time' is the truncation (entry) time Q, and 'time2' is the event time.

Most other software handles time-dependent covariates similarly (e.g. Stata). SAS has multiple programming options (see Allison's book).
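One way to approach the question, sketched under the assumption that the long data set consists of rows $r$ with interval $(s_r, e_r]$, status $d_r$ and covariate value $z_r$ (notation introduced here for illustration):

```latex
% Partial likelihood computed from the long-format rows:
L(\beta) \;=\; \prod_{r:\, d_r = 1}
  \frac{\exp(\beta' z_r)}
       {\sum_{r':\; s_{r'} < e_r \le e_{r'}} \exp(\beta' z_{r'})}

% For a failure at X_i, every subject j with X_j >= X_i has exactly one row whose
% interval (s, e] contains X_i, and that row carries z = Z_j(X_i). Hence
\sum_{r':\; s_{r'} < X_i \le e_{r'}} \exp(\beta' z_{r'})
  \;=\; \sum_{j \in R(X_i)} \exp\{\beta' Z_j(X_i)\}
```

So each factor coincides with the corresponding factor of the partial likelihood for the $Z(t)$ model, and the pseudo rows never enter as if they were independent subjects: a subject appears at most once in any risk set.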

Applications

Cox models with time-dependent covariates are used:

I. When important covariates change during a study

- Framingham Heart study
  5209 subjects followed since 1948 to examine the relationship between risk factors and cardiovascular disease. A particular example:
  Outcome: time to congestive heart failure
  Predictors: age, systolic blood pressure, # cigarettes per day

- Liver Cirrhosis (Andersen and Gill, p.528)
  Clinical trial comparing treatment to placebo for cirrhosis. The outcome of interest is time to death. Patients were seen at the clinic after 3, 6 and 12 months, then yearly.
  Fixed covariates: treatment, gender, age (at diagnosis)
  Time-varying covariates: alcohol consumption, nutritional status, bleeding, albumin, bilirubin, alkaline phosphatase and prothrombin.

- Recidivism study (Allison, 'Survival Analysis Using SAS', p.42)
  432 male inmates were followed for one year after release from prison, to evaluate risk of re-arrest as a function of financial aid (fin), age at release (age), race (race), full-time work experience prior to first arrest (wexp), marital status (mar), parole status (paro = 1 if released with parole, 0 otherwise), and number of prior convictions (prio). Data were also collected on employment status over time during the year.

  Time-independent model:
  A time-independent model might include the employment status of the individual at the beginning of the study (1 if employed, 0 if unemployed), or perhaps at any point during the year.

  Time-dependent model:
  However, employment status changes over time, and it may be the more recent employment status that would affect the hazard for re-arrest. For example, we might want to define a time-dependent covariate for each month of the study that indicates whether the individual was employed during the past month.

Recidivism Example:

Hazard for arrest within one year of release from prison:

Model without employment status

Testing Global Null Hypothesis: BETA=0

-2 LOG L: 1317.496

Criterion    Model Chi-Square
-2 LOG L     33.266 with 7 DF (p=0.0001)
Score        33.529 with 7 DF (p=0.0001)
Wald         32.113 with 7 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates (only the columns that survive in this transcription; the entries for fin are lost)

Variable    Standard Error    Risk Ratio
fin         .                 .
age         0.0220            0.944
race        0.3080            1.369
wexp        0.2122            0.861
mar         0.3819            0.648
paro        0.1958            0.919
prio        0.0287            1.096

What are the important predictors of recidivism?

Recidivism Example: Output

Model WITH employment as time-dependent covariate

Analysis of Maximum Likelihood Estimates
[Output table not recoverable from this transcription; the last surviving entry is a risk ratio of 0.265.]

Is current employment important?
Do the other covariates change much?
Can you think of any problem with using current employment as a predictor?

Another option for assessing impact of employment

Allison suggests using the employment status of the past week rather than the current week.

The coefficient for employed changes from -1.33 to -0.79, so the risk ratio is about 0.45 instead of 0.27. It is still highly significant with $\chi^2 = 13.1$.

Does this model improve the causal interpretation?

Other options for time-dependent covariates:
- multiple lags of employment status (week-1, week-2, etc.)
- cumulative employment experience (proportion of weeks worked)
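The lagged and cumulative versions of a weekly covariate can be built mechanically from the observed series. The sketch below is illustrative only: the weekly series is hypothetical, and the convention that week 0 has no lagged or cumulative value is an assumption made here.

```python
def lagged(series, lag=1):
    """Covariate for week w is the value observed `lag` weeks earlier (None if unavailable)."""
    return [series[w - lag] if w - lag >= 0 else None for w in range(len(series))]

def cumulative_proportion(series):
    """Proportion of weeks worked strictly before week w (None for week 0)."""
    out, worked = [], 0
    for w, value in enumerate(series):
        out.append(worked / w if w > 0 else None)
        worked += value
    return out

employed = [0, 0, 1, 1, 0, 1]  # hypothetical weekly employment indicator for one person
print(lagged(employed))               # previous week's status
print(lagged(employed, lag=2))        # status two weeks earlier
print(cumulative_proportion(employed))
```

Both constructions use only information from before week w, which is the point of lagging: the covariate at week w cannot be affected by an arrest occurring in week w.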

II. For cross-over studies, to indicate change in treatment

- Stanford heart study (Cox and Oakes p.129)
  Between 1967 and 1980, 249 patients entered a program at Stanford University where they were registered to receive a heart transplant. Of these, 184 received transplants, 57 died while waiting, and 8 dropped out of the program for other reasons. Does getting a heart transplant improve survival?

  Here is a sample of the data:
  [Table with columns: waiting time, transplant?, survival. The sample rows are not recoverable from this transcription.]
  (Survival is not indicated above for those without transplants, but was available in the dataset.)

Naive approach: Compare the total survival of transplanted and non-transplanted patients.

Problem: Selection (length) bias (why?). In causal-inference terms, the naive analysis assigns the wrong treatment status at time $t$ to those who received a transplant after $t$. See also Xu et al. (2012).

RESULTS for Stanford Heart Transplant data:

Only the parts of the output that survive in this transcription are shown.

Naive model with fixed transplant indicator:

Criterion    Model Chi-Square
-2 LOG L     44.198 with 1 DF (p=0.0001)
Score        68.194 with 1 DF (p=0.0001)
Wald         51.720 with 1 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates: risk ratio for TSTAT = 0.135

Model with time-dependent transplant indicator:

Testing Global Null Hypothesis: BETA=0

-2 LOG L: 1312.710

Criterion    Model Chi-Square
-2 LOG L     17.510 with 1 DF (p=0.0001)
Score        17.740 with 1 DF (p=0.0001)
Wald         17.151 with 1 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates: risk ratio = 0.381

The second model took about twice as long to run as the first model, which is usually the case for models with time-dependent covariates.

III. For testing the PH assumption

For example, we can fit these two models:

(1) Time-independent covariate $Z_1$

$$\lambda(t, Z) = \lambda_0(t) \exp(\beta_1 Z_1)$$

The hazard ratio for $Z_1$ is $\exp(\beta_1)$.

(2) Time-dependent covariate $Z_1 \cdot t$

$$\lambda(t, Z) = \lambda_0(t) \exp(\beta_1 Z_1 + \beta_2 Z_1 t)$$

The hazard ratio for $Z_1$ is $\exp(\beta_1 + \beta_2 t)$.

A test of the parameter $\beta_2 = 0$ is a test of the PH assumption.

Q: what are the pros and cons of such a test?

(We will talk more about testing the PH assumption.)

Illustration: Colon Cancer data

Model without time*stage interaction

Testing Global Null Hypothesis: BETA=0

-2 LOG L: 1939.654

Criterion    Model Chi-Square
-2 LOG L     20.273 with 2 DF (p=0.0001)
Score        18.762 with 2 DF (p=0.0001)
Wald         18.017 with 2 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates (surviving columns; variable names are lost in this transcription)

Variable    Wald Chi-Square    Pr > Chi-Square    Risk Ratio
(first)     0.01492            0.9028             1.017
(second)    17.98448           0.0001             0.496

Model WITH time*stage interaction

Testing Global Null Hypothesis: BETA=0

-2 LOG L: 1902.374

Criterion    Model Chi-Square
-2 LOG L     57.553 with 3 DF (p=0.0001)
Score        35.960 with 3 DF (p=0.0001)
Wald         19.319 with 3 DF (p=0.0002)

Analysis of Maximum Likelihood Estimates: only the risk ratio column survives in this transcription, with values 1.008, 4.064 and 0.000 for the three covariates.

Notice the change in sign of the stage effect alone?

The time-varying effect of stage is $1.4 - 8.32t$, compared to the fixed effect of $\beta = -0.7$ from the first model. (These agree with the reported risk ratios: $\exp(1.4) \approx 4.06$, $\exp(-8.32) \approx 0.000$ and $\exp(-0.7) \approx 0.496$.)

Ex: think about how to fit the above interaction model.

As in Cox and Oakes, we can run a few different models with covariate-by-time interactions other than the linear effect of time.
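One possible route for the exercise, sketched here as an assumption rather than as the method used for the output above: cut every subject's follow-up at each distinct event time and attach the value of $Z \cdot t$ at the end of each piece. The resulting rows can be passed to any start/stop Cox routine with covariates `z` and `z_times_t`. The three-subject data set is hypothetical.

```python
def split_at_event_times(subjects):
    """subjects: list of (id, time, status, z) with time > 0.
    Returns rows (id, start, stop, status, z, z_times_t), with follow-up cut
    at every distinct event time and z_times_t = z * stop."""
    event_times = sorted({time for _, time, status, _ in subjects if status == 1})
    rows = []
    for sid, time, status, z in subjects:
        # Cut points strictly inside this subject's follow-up, then the exit time itself.
        cuts = [t for t in event_times if t < time] + [time]
        start = 0
        for stop in cuts:
            is_last = stop == time
            # Only the final piece can carry the subject's event.
            rows.append((sid, start, stop, status if is_last else 0, z, z * stop))
            start = stop
    return rows

toy = [(1, 2, 1, 1), (2, 3, 0, 0), (3, 5, 1, 1)]  # hypothetical (id, time, status, z)
for row in split_at_event_times(toy):
    print(row)
```

Because the partial likelihood only evaluates covariates at event times, and each piece $(start, stop]$ contains no event time other than possibly `stop`, using $Z \cdot stop$ on each piece reproduces the continuous covariate $Z \cdot t$ exactly where it matters. The price is a much larger data set, which is one reason such fits run more slowly.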

IV. For fitting non-PH models

The second model above is a non-proportional hazards model.

In general, a non-proportional hazards model can be written

$$\lambda(t \mid Z) = \lambda_0(t) \exp\{\beta(t)' Z\}$$

so that the regression effect of $Z$ changes with time.

We can put different assumptions on $\beta(t)$. We can model it as piecewise constant, linear (as in the previous example) or piecewise linear, or piecewise cubic (spline), etc.

Piecewise constant $\beta(t)$:
- Depending on how we divide the intervals, the piecewise constant model can approximate any shape of $\beta(t)$.
- It is relatively easy to fit (see below).
- It has simple interpretations; e.g. the hazard ratio is xxx from $t_1$ to $t_2$, etc.
- Without any other indications, we often take an equal number of events per interval.

When $\beta(t)$ is piecewise constant, the non-PH model can be written as a Cox model with time-dependent covariates, as in the following.

Suppose $0 = t_0 < t_1 < t_2 < \cdots < t_K$, and $\beta(t) = \beta_k$ on $[t_{k-1}, t_k)$, i.e.,

$$\beta(t) = \sum_{k=1}^{K} \beta_k I_{[t_{k-1}, t_k)}(t)$$

where $I_{[t_{k-1}, t_k)}(\cdot)$ is the indicator function for the interval $[t_{k-1}, t_k)$. Then

$$\beta(t)' Z = \Big\{\sum_{k=1}^{K} \beta_k I_{[t_{k-1}, t_k)}(t)\Big\}' Z = \sum_{k=1}^{K} \beta_k' \{I_{[t_{k-1}, t_k)}(t) Z\} = \sum_{k=1}^{K} \beta_k' Z_k(t)$$

where $Z_k(t) = I_{[t_{k-1}, t_k)}(t) Z$.

One can show that fitting the above $Z(t)$ using partial likelihood is in fact equivalent to: estimating $\beta_k$ using the survival data in the interval $[t_{k-1}, t_k)$, by excluding all those data points $i$ such that $X_i < t_{k-1}$, and treating all those $i$ such that $X_i \ge t_k$ as censored (i.e. set $\delta_i = 0$ for estimating $\beta_k$).

Exercise: prove the above for $K = 3$ using the partial likelihood. Can you make a connection here to left truncation? What do you learn?
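The construction of $Z_k(t)$ is simple enough to write down directly. The following is a minimal sketch for a scalar covariate; the function names and the example cut points are illustrative assumptions.

```python
from bisect import bisect_right

def piecewise_covariates(z, t, cuts):
    """Return [Z_1(t), ..., Z_K(t)] with Z_k(t) = I(t_{k-1} <= t < t_k) * z,
    where cuts = [t_0, t_1, ..., t_K] is increasing."""
    k = bisect_right(cuts, t) - 1      # index of the interval [cuts[k], cuts[k+1]) containing t
    n_intervals = len(cuts) - 1
    # If t lies outside [t_0, t_K), k matches no interval and all entries are 0.
    return [z if j == k else 0.0 for j in range(n_intervals)]

def linear_predictor(betas, z, t, cuts):
    """sum_k beta_k * Z_k(t), which equals beta(t) * z for piecewise constant beta(t)."""
    return sum(b * zk for b, zk in zip(betas, piecewise_covariates(z, t, cuts)))

cuts = [0, 1, 2, 3]                    # hypothetical t_0 < t_1 < t_2 < t_3
print(piecewise_covariates(2.0, 1.5, cuts))
print(linear_predictor([0.5, -1.0, 0.25], 2.0, 2.5, cuts))
```

Exactly one of the $Z_k(t)$ is nonzero at any time inside $[t_0, t_K)$, so the sum picks out the coefficient of the interval containing $t$.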

There are ways to search for an optimal change point of $\beta(t)$; see O'Quigley and Pessione (1991).

There are also ways to find multiple change points using a tree-based approach, following which a piecewise constant $\beta(t)$ can be fitted; see Xu and Adak (2002).

Some further notes

In practice, $Z(t)$ may not be measured at each time point $t$. What do we do?
- use the most recent value (assumes step function)
- interpolate
- impute based on some model for the 'missing' mechanism

Types of time-varying covariates:
- internal covariates: variables that relate to the individuals, and can only be measured when an individual is alive, e.g. white blood cell count, CD4 count
- external covariates:
  - a variable which changes in a known way, e.g. age, dose of drug
  - a variable that exists totally independently of all individuals, e.g. air temperature

These concepts are relevant particularly when predicting survival (estimating $S(t \mid Z)$). It is difficult to predict survival based on internal covariates. Often survival prediction is done only based on time-independent covariates.
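The first option (most recent value, i.e. a step function) is the one the start/stop data layout implements implicitly. A minimal sketch, with hypothetical visit times and measurements:

```python
from bisect import bisect_right

def last_value_carried_forward(times, values, t):
    """Most recent measurement taken at or before time t; None if nothing has been
    measured yet. `times` must be sorted in increasing order."""
    idx = bisect_right(times, t) - 1
    return values[idx] if idx >= 0 else None

visit_times = [0, 3, 6, 12]           # hypothetical clinic visits (months)
bilirubin = [1.1, 1.4, 2.0, 3.2]      # hypothetical measurements at those visits
print(last_value_carried_forward(visit_times, bilirubin, 7))
```

At an event time between visits the covariate is taken from the preceding visit, never from a later one, which keeps the covariate a function of the past only.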

Some cautionary notes

- Time-varying covariates must be carefully constructed to ensure interpretability. (What is the interpretation of $\beta$?)
- There is no point adding a time-varying covariate whose value changes at the same rate as study time: you will get the same answer as using a fixed covariate measured at study entry. For example, suppose we want to study the effect of age on time to death. We could
  1. use age at start of the study as a fixed covariate
  2. use age as a time-varying covariate
  However, the results will be the same! Why?

Technical assumption:
$Z(t)$ needs to be predictable (given the history up to $t-$) in order to apply the martingale theory to the Cox model.
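A short worked line for the age question, writing $a_i$ for subject $i$'s age at study entry (notation introduced here), so that current age is $a_i + t$:

```latex
% Contribution of a failure of subject i at time t, with current age as the covariate:
\frac{\exp\{\beta (a_i + t)\}}{\sum_{j \in R(t)} \exp\{\beta (a_j + t)\}}
  \;=\; \frac{e^{\beta t}\, \exp(\beta a_i)}{e^{\beta t} \sum_{j \in R(t)} \exp(\beta a_j)}
  \;=\; \frac{\exp(\beta a_i)}{\sum_{j \in R(t)} \exp(\beta a_j)}
```

The common factor $e^{\beta t}$ cancels within every risk set, so the partial likelihood, and hence the estimate of $\beta$, is the same as with age at entry. Equivalently, the term $e^{\beta t}$ is absorbed into the baseline hazard.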
