Introduction to Statistics in Pharmaceutical Clinical Trials


10 Confirmatory clinical trials: Analysis of categorical efficacy data

10.1 Introduction: Regulatory views of substantial evidence

When thinking about the use of statistics in clinical trials, the first thing that comes to mind for many people is the process of hypothesis testing and the associated use of p values. This is very reasonable, because the role of a chance outcome is of utmost importance in study design and the interpretation of results from a study. A sponsor's objective is to develop an effective therapy that can be marketed to patients with a certain disease or condition. From a public health perspective, the benefits of a new treatment cannot be separated from the risks that are tied to it. Regulatory agencies must protect public health by ensuring that a new treatment has "definitively" been demonstrated to have a beneficial effect. The meaning of the word "definitively" as used here is rather broad, but we discuss what it means in this context – that is, we operationally define the term "definitively" as it applies to study design, data analysis, and interpretation in new drug development.

Most of this chapter is devoted to describing various types of data and the corresponding analytical strategies that can be used to demonstrate that an investigational drug, or test treatment, is efficacious. First, however, it is informative to discuss the international standards for demonstrating efficacy of a new product, and examine how regulatory agencies have interpreted these guidelines. ICH Guidance E9 (1998, p 4) addresses therapeutic confirmatory studies and provides the following definition:

A confirmatory trial is an adequately controlled trial in which the hypotheses are stated in advance and evaluated. As a rule, confirmatory trials are necessary to provide firm evidence of efficacy or safety. In such trials the key hypothesis of interest follows directly from the trial's primary objective, is always pre-defined, and is the hypothesis that is subsequently tested when the trial is complete. In a confirmatory trial it is equally important to estimate with due precision the size of the effects attributable to the treatment of interest and to relate these effects to their clinical significance.

It is common practice to use earlier phase studies such as therapeutic exploratory studies to characterize the size of the treatment effect, while acknowledging that the effect size found in these studies is associated with a certain amount of error. As noted earlier, confidence intervals can be helpful for planning confirmatory studies. The knowledge and experience gained in these earlier studies can lead to hypotheses that we wish to test (and hopefully confirm) in a therapeutic confirmatory trial, for example, the mean reduction in systolic blood pressure (SBP) for the test treatment is 20 mmHg greater than the mean reduction in SBP for placebo. As we have seen, a positive result from a single earlier trial could be a type I error, so a second study is useful in substantiating that result.

The description of a confirmatory study in ICH Guidance E9 (1998) also illustrates the importance of the study design employed. The study should be designed with several important characteristics:

• It should test a specific hypothesis.
• It should be appropriately sized.
• It should be able to differentiate treatment effects from other sources of variation (for example, time trends, regression to the mean, bias).
• The size of the treatment effect that is being confirmed should be clinically relevant.

The clinical relevance, or clinical significance, of a treatment effect is an extremely important consideration. The size of a treatment effect that is deemed clinically relevant is best defined by medical, clinical, and regulatory specialists.

Precise description of the study design and adherence to the study procedures detailed in the study protocol are particularly important for confirmatory studies. Quoting again from ICH Guidance E9 (1998, p 4):

Confirmatory trials are intended to provide firm evidence in support of claims and hence adherence to protocols and standard operating procedures is particularly important; unavoidable changes should be explained and documented, and their effect examined. A justification of the design of each such trial, and of other important statistical aspects such as the principal features of the planned analysis, should be set out in the protocol. Each trial should address only a limited number of questions.

Confirmatory studies should also provide quantitative evidence that substantiates claims in the product label (for example, the package insert) as they relate to an appropriate population of patients. In the following quote from ICH Guidance E9 (1998, p 4), the elements of statistical and clinical inference can be seen:

Firm evidence in support of claims requires that the results of the confirmatory trials demonstrate that the investigational product under test has clinical benefits. The confirmatory trials should therefore be sufficient to answer each key clinical question relevant to the efficacy or safety claim clearly and definitively. In addition, it is important that the basis for generalisation . . . to the intended patient population is understood and explained; this may also influence the number and type (e.g. specialist or general practitioner) of centres and/or trials needed. The results of the confirmatory trial(s) should be robust. In some circumstances the weight of evidence from a single confirmatory trial may be sufficient.

The terms "firm evidence" and "robust" do not have explicit definitions.
However, as clinical trials have been conducted and reported in recent years, some practical (operational) definitions have emerged, and these are discussed shortly.

In its guidance document Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products, the US Food and Drug Administration (US Department of Health and Human Services, FDA, 1998) describes the introduction of an effectiveness requirement according to a standard of "substantial evidence" in the Federal Food, Drug, and Cosmetic Act (the FDC Act) of 1962:

Substantial evidence was defined in section 505(d) of the Act as "evidence consisting of adequate and well-controlled investigations, including clinical investigations, by experts qualified by scientific training and experience to evaluate the effectiveness of the drug involved, on the basis of which it could fairly and responsibly be concluded by such experts that the drug will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the labeling or proposed labeling thereof."
US Department of Health and Human Services, FDA (1998, p 3)

The phrase "adequate and well-controlled investigations" has typically been interpreted as at least two studies that clearly demonstrated that the drug has the effect claimed by the sponsor seeking marketing approval. Furthermore, a type I error of 0.05 has typically been adopted as a reasonable standard upon which data from clinical studies are judged. That is, it was widely believed that the intent of the FDC Act of 1962 was to state that a drug could be concluded to be effective if the treatment effect was clinically relevant and statistically significant at the α = 0.05 level in two independent studies.

The ICH Guidance E8 (1998, p 4) clarified this issue:

The usual requirement for more than one adequate and well-controlled investigation reflects the need for independent substantiation of experimental results. A single clinical experimental finding of efficacy, unsupported by other independent evidence, has not usually been considered adequate scientific support for a conclusion of effectiveness. The reasons for this include the following:

• Any clinical trial may be subject to unanticipated, undetected, systematic biases. These biases may operate despite the best intentions of sponsors and investigators, and may lead to flawed conclusions. In addition, some investigators may bring conscious biases to evaluations.
• The inherent variability in biological systems may produce a positive trial result by chance alone. This possibility is acknowledged, and quantified to some extent, in the statistical evaluation of the result of a single efficacy trial. It should be noted, however, that hundreds of randomized clinical efficacy trials are conducted each year with the intent of submitting favorable results to the FDA. Even if all drugs tested in such trials were ineffective, one would expect one in forty of those trials to "demonstrate" efficacy by chance alone at conventional levels of statistical significance. It is probable, therefore, that false positive findings (that is, the chance appearance of efficacy with an ineffective drug) will occur and be submitted to FDA as evidence of effectiveness. Independent substantiation of a favorable result protects against the possibility that a chance occurrence in a single study will lead to an erroneous conclusion that a treatment is effective.
• Results obtained in a single center may be dependent on site- or investigator-specific factors (for example, disease definition, concomitant treatment, diet). In such cases, the results, although correct, may not be generalizable to the intended population. This possibility is the primary basis for emphasizing the need for independence in substantiating studies.
• Rarely, favorable efficacy results are the product of scientific fraud.

This guidance further clarified that the need for substantiation does not necessarily require two or more identically designed trials:

Although there are statistical, methodologic, and other safeguards to address the identified problems, they are often inadequate to address these problems in a single trial. Independent substantiation of experimental results addresses such problems by providing consistency across more than one study, thus greatly reducing the possibility that a biased, chance, site-specific, or fraudulent result will lead to an erroneous conclusion that a drug is effective. Precise replication of a trial is only one of a number of possible means of obtaining independent substantiation of a clinical finding and, at times, can be less than optimal as it could leave the conclusions vulnerable to any systematic biases inherent to the particular study design. Results that are obtained from studies that are of different design and independent in execution, perhaps evaluating different populations, endpoints, or dosage forms, may provide support for a conclusion of effectiveness that is as convincing as, or more convincing than, a repetition of the same study.
ICH Guidance E8 (1998, p 5)

Regulatory agencies have traditionally accepted only two-sided hypotheses because, theoretically, one could not rule out harm (as opposed to simply no effect) associated with the test treatment.
If the value of a test statistic (for example, the Z-test statistic) is in the critical region at the extreme left or extreme right of the distribution (that is, ≤ −1.96 or ≥ 1.96), the probability of such an outcome by chance alone under the null hypothesis of no difference is ≤ 0.05. However, the probability of such an outcome in the direction indicative of a treatment benefit is half of 0.05, that is, 0.025. This led to a common statistical definition of "firm" or "substantial" evidence: The effect was unlikely to have occurred by chance alone, and it could therefore be attributed to the test treatment. Assuming that two studies of the test treatment had two-sided p values ≤ 0.05 with the direction of the treatment effect in favor of a benefit, the probability of the two results occurring by chance alone would be 0.025 × 0.025, that is, 0.000625 (which can also be expressed as 1/1600).
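As an informal check of this arithmetic, the short Python sketch below (assuming scipy is available; the variable names are ours, not part of any regulatory definition) shows the two-sided critical value of approximately 1.96, the one-sided tail probability of 0.025, and the probability of two independent chance findings in the favorable direction.

    from scipy.stats import norm

    alpha = 0.05                           # conventional two-sided significance level
    z_crit = norm.ppf(1 - alpha / 2)       # two-sided critical value, approximately 1.96
    one_sided_tail = 1 - norm.cdf(z_crit)  # chance result in the favorable direction: 0.025
    two_trials = one_sided_tail ** 2       # two independent favorable chance results

    print(z_crit, one_sided_tail, two_trials)  # approx 1.96, 0.025, 0.000625 (1/1600)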

It is important to note here that this standard is not written into any regulation. Therefore, there may be occasions where this statistical standard is not met. In fact, it is possible to redefine the statistical standard using one large well-designed trial, an approach that has been described by Fisher (1999).

Whether the substantial evidence comes from one or more than one trial, the basis for concluding that the evidence is indeed substantial is statistical in nature. That is, the regulatory agency must agree with the sponsor on several key points in order to approve a drug for marketing:

• The effect claimed cannot be explained by other phenomena such as regression to the mean, time trends, or bias. This highlights the need for appropriate study design and data acquisition.
• The effect claimed is not likely a chance outcome. That is, the results associated with a primary objective have a small p value, indicating a low probability of a type I error.
• The effect claimed is large enough to be important to patients, that is, clinically relevant. The magnitude of the effect must account for sampling variation during the trial(s).

10.2 Objectives of therapeutic confirmatory trials

A clinical development program contains various studies that are designed to provide the quantity and quality of evidence required to satisfy regulatory agencies, which have the considerable responsibility of protecting public health. The requirements for the demonstration of substantial evidence highlight the importance of study design and analytic strategies. Appropriate study design features such as concurrent controls, randomization, standardization of data collection, and treatment blinding help to provide compelling evidence that an observed treatment effect cannot be explained by other phenomena. Selection of the appropriate analytical strategy maximizes the precision and efficiency of the statistical test employed. The employment of appropriate study design and analytical strategies provides the opportunity for an investigational drug to be deemed effective if a certain treatment effect is observed in clinical trials.

Table 10.1 provides a general taxonomy of the objectives of confirmatory trials and specific research questions corresponding to each. Confirmatory trials typically have one primary objective that varies by the type of trial. In the case of a new antihypertensive it may be sufficient to demonstrate simply that the reduction in blood pressure is greater for the test treatment than for the placebo. A superiority trial is appropriate in this instance. However, in other therapeutic areas – for example, oncology – other designs are appropriate. In these therapeutic areas it is not ethical to withhold life-extending therapies from certain individuals by randomizing them to a placebo treatment if there is already an existing treatment for the disease or condition. In such cases, it is appropriate to employ trials with the objective of demonstrating that the clinical response to the test treatment is equivalent (that is, no better or worse) to that of an existing effective therapy.
Table 10.1 Taxonomy of therapeutic confirmatory trial objectives

Objective of trial: Demonstrate superiority
Example indication: Hypercholesterolemia
Example research question: Is the magnitude of LDL reduction for the test treatment greater than for placebo?

Objective of trial: Demonstrate equivalence
Example indication: Oncology
Example research question: Is the test treatment at worst trivially inferior to and at best slightly better than the active control with respect to the rate of partial tumor response?

Objective of trial: Demonstrate noninferiority
Example indication: Anti-infective
Example research question: Is the microbial eradication rate for the test treatment at least not unacceptably worse than for the active control?

These trials are called equivalence trials. A question that arises here is: Why would we want to develop another drug if there is already an existing effective treatment? The answer is that we believe the test treatment offers other advantages (for example, convenience, tolerability, or cost) to justify its development. Another type of trial is the noninferiority trial. These trials are intended only to demonstrate that a test treatment is not unacceptably worse than (that is, noninferior to) an active control. Again, the test treatment may provide advantages other than greater therapeutic response, such as fewer adverse effects or greater convenience.

Equivalence and noninferiority trials are quite different from superiority trials in their design, analysis, and interpretation (although exactly the same methodological considerations apply to collecting optimum-quality data in these trials). Superiority trials continue to be our focus in this book, but it is important that you are aware of other designs too. Therefore, in Chapter 12 we discuss some of the unique features of these other design types.

10.3 Moving from research questions to research objectives: Identification of endpoints

There is an important relationship between research questions and study objectives, and it is relatively straightforward to restate research questions such as those in Table 10.1 in terms of study objectives. As stated in ICH Guidance E9, a confirmatory study should be designed to address at most a few objectives. If a treatment effect can be quantified by an appropriate statistical measure, study objectives can be translated into statistical hypotheses. For example, the extent of low-density lipoprotein (LDL) cholesterol reduction can be measured by the mean change from baseline to end-of-treatment, or by the proportion of study participants who attain a goal level of LDL according to a treatment guideline. The efficacy of a cardiovascular intervention may be measured according to the median survival time after treatment. For many drugs, identification of an appropriate measure of the participant-level response (for example, reported pain severity using a visual analog scale) is not difficult. However, there may be instances when the use of a surrogate endpoint can be justified on the basis of statistical, biological, and practical considerations. Measuring HIV viral load as a surrogate endpoint for the occurrence of AIDS is an example.

Identification of the endpoint of interest is one of the many cases in clinical research that initially seem obvious and simple. We know exactly what disease or condition we are interested in treating, and it should be easy to identify an endpoint that will tell us if we have been successful. In reality, the establishment of an appropriate endpoint, whether it is the most clinically relevant endpoint or a surrogate endpoint, can be difficult. Some of the statistical criteria used to judge the acceptability of surrogate endpoints are described by Fleming and DeMets (1996), who caution against their use in confirmatory trials. One might argue that the most clinically relevant endpoint for an antihypertensive is the time to myocardial infarction, stroke, or death. Fortunately, the incidence of these events is relatively low during the typical observation period of clinical trials. The use of SBP as a surrogate endpoint enables the use of shorter and smaller studies than would be required if the true clinical endpoint had to be evaluated.
For present purposes, we assume the simplest scenario: The characteristic that we are going to measure (blood pressure) is uncontroversial and universally accepted, and a clinically relevant benefit is acknowledged to be associated with a relative change in blood pressure for the test treatment compared with the control.

Common measures of the efficacy of a test treatment compared with a placebo include the differences in means, in proportions, and in survival distributions. How the treatment effect is measured and analyzed in a clinical trial should be a prominent feature of the study protocol and should be agreed upon with regulatory authorities before the trial begins. In this chapter we describe between-group differences in general terms. It is acceptable to calculate the difference in two quantities, A and B, as "A minus B" or "B minus A" as long as the procedure chosen is identified unambiguously.

10.4 A brief review of hypothesis testing

We discussed hypothesis testing in some detail in Chapter 6. For present purposes, the role of hypothesis testing in confirmatory clinical trials can be restated simply as follows:

Hypothesis testing provides an objective way to make a decision to proceed as if the drug is either effective or not effective based on the sample data, while also limiting the probability of making either decision in error.

For a superiority trial the null hypothesis is that the treatment effect is zero. Sponsors of drug trials would like to generate sufficient evidence, in the form of the test statistic, to reject the null hypothesis in favor of the alternate hypothesis, thereby providing compelling evidence that the treatment effect is not zero. The null hypothesis may be rejected if the treatment effect favors the test drug, and also if it favors the placebo (as discussed, we have to acknowledge this possibility).

The decision to reject the null hypothesis depends on the value of the test statistic relative to the distribution of its values under the null hypothesis. Rejection of the null hypothesis means one of two things:

1. There really is a difference between the two treatments, that is, the alternate hypothesis is true.
2. An unusually rare event has occurred, that is, a type I error has been committed, meaning that we reject the null hypothesis given that it is true.

Regulatory authorities have many reasons to be concerned about type I errors. As a review at the end of this chapter, the reader is encouraged to think about the implications for a pharmaceutical company of committing a type I or II error at the conclusion of a confirmatory efficacy study.

The test statistic is dependent on the analysis method, which is dependent on the study design; this, in turn, is dependent on a precisely stated research question. By now, you have seen us state this fundamental point several times, but it really cannot be emphasized enough. In our experience, especially with unplanned data analyses, researchers can be so anxious to know "What's the p value?" that they forget to consider the possibility that the study that generated the data was not adequately designed to answer the specific question of interest. The steps that lead toward optimally informed decision-making in confirmatory trials on the basis of hypothesis testing are as follows:

1. State the research question.
2. Formulate the research question in the form of null and alternate statistical hypotheses.
3. Design the study to minimize bias, maximize precision, and limit the chance of committing a type I or II error. As part of the study design, prespecify the primary analysis method that will be used to test the hypothesis. Depending on the nature of the data and the size of the study, consider whether a parametric or nonparametric approach is appropriate.
4. Collect optimum-quality data using optimum-quality experimental methodology.
5. Carry out the primary statistical analysis using the prespecified method.
6. Report the results of the primary statistical analysis.
7. Make a decision to proceed as if the drug is either effective or ineffective:
(a) If you decide that it is effective based on the results of this study, you may choose to move on to conduct the next study in your clinical development plan, or, if this is the final study in your development plan, to submit a dossier (for example, NDA [new drug application], MAA [marketing authorisation application]) to a regulatory agency.
(b) If you decide that it is ineffective based on the results of this study, you may choose to refine the original research question and conduct a new study, or to abandon the development of this investigational new drug.

10.5 Hypothesis tests for two or more proportions

The research question of interest in some studies can be phrased: Does the test treatment result in a higher probability of attaining a desired state than the control? Examples of such applications include:

• survival after 1 year following a cardiovascular intervention
• avoiding hospitalization associated with asthmatic exacerbations over the course of 6 months
• attaining a specific targeted level of LDL according to one's background risk.

In a confirmatory trial of an antihypertensive, for example, a sponsor might like to know if the test treatment results in a higher proportion of hypertensive individuals (which can be interpreted as a probability) reaching an SBP < 140 mmHg.

10.5.1 Hypothesis test for two proportions: The Z approximation

In the case of a hypothesis test for two proportions the null and alternate statistical hypotheses can be stated as follows:

H0: p1 − p2 = 0
HA: p1 − p2 ≠ 0

where the population proportions for each of two independent groups are represented by p1 and p2.

The sample proportions will be used to estimate the population proportions and, as in Chapter 8, are defined as:

p̂1 = (number of observations in group 1 with the event of interest) / (total number of observations in group 1 at risk of the event)

and

p̂2 = (number of observations in group 2 with the event of interest) / (total number of observations in group 2 at risk of the event).

The estimator for the difference in the two sample proportions is p̂1 − p̂2, and the standard error of the difference p̂1 − p̂2 is:

SE(p̂1 − p̂2) = sqrt( p̂1 q̂1 / n1 + p̂2 q̂2 / n2 ),

where q̂1 = 1 − p̂1 and q̂2 = 1 − p̂2. The test statistic for the test of two proportions is equal to:

Z = (p̂1 − p̂2) / SE(p̂1 − p̂2).

Use of a correction factor may be useful as well, especially with smaller sample sizes. A test statistic that makes use of the correction factor is:

Z = ( |p̂1 − p̂2| − (1/2)(1/n1 + 1/n2) ) / SE(p̂1 − p̂2).

For large samples (that is, when p̂1 n1 ≥ 5 and p̂2 n2 ≥ 5), these test statistics follow a standard normal distribution under the null hypothesis. Values of the test statistic that are far away from zero would contradict the null hypothesis and lead to rejection. In particular, for a two-sided test of size α, the critical region (that is, those values of the test statistic that would lead to rejection of the null hypothesis) is defined by Z ≤ z(α/2) or Z ≥ z(1 − α/2). If the calculated value of the test statistic is in the critical region, the null hypothesis is rejected in favor of the alternate hypothesis. If the calculated value of the test statistic is outside the critical region, the null hypothesis is not rejected.
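To show how these pieces fit together, here is a minimal Python sketch of the two-proportion Z test, with and without the correction factor. The function name two_proportion_z_test and its arguments are our own labels for this sketch, not notation from the text, and the large-sample conditions noted above still apply.

    import math

    def two_proportion_z_test(x1, n1, x2, n2, continuity_correction=False):
        """Z statistic for H0: p1 - p2 = 0 versus HA: p1 - p2 != 0."""
        p1_hat = x1 / n1  # sample proportion in group 1
        p2_hat = x2 / n2  # sample proportion in group 2
        q1_hat = 1 - p1_hat
        q2_hat = 1 - p2_hat
        # standard error of the difference in sample proportions
        se = math.sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
        diff = p1_hat - p2_hat
        if continuity_correction:
            # subtract (1/2)(1/n1 + 1/n2) from the absolute difference
            diff = abs(diff) - 0.5 * (1 / n1 + 1 / n2)
        return diff / se

    # Reject H0 at the two-sided 0.05 level if the statistic falls at or beyond +/- 1.96.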

As an illustration of this hypothesis test, consider the following hypothetical data from a confirmatory study of a new antihypertensive. In a randomized, double-blind, 12-week study, the test treatment was compared with placebo. The primary endpoint of the study was the proportion of participants who attained an SBP goal < 140 mmHg. Of 146 participants assigned to placebo, 34 attained an SBP < 140 mmHg at week 12. Of 154 assigned to test treatment, 82 attained the goal. Let us look at how these results can help us to make a decision based on the information provided. We go through the steps needed to do this.

The research question
Is the test treatment associated with a higher rate of achieving target SBP?

Study design
As noted, the study is a randomized, double-blind, placebo-controlled, 12-week study of an investigational antihypertensive drug.

Data
The data from this study are in the form of counts. We have a count of the number of participants in each treatment group, and, for both of these groups, we have a count of the number of participants who experienced the event of interest. As the research question pertains to a probability, or risk, we use the count data to estimate the probability, or proportion, of participants attaining the goal SBP.

Hypotheses and statistical analysis
The null and alternate statistical hypotheses in this case can be stated as:

H0: pTEST − pPLACEBO = 0
HA: pTEST − pPLACEBO ≠ 0

where the population proportions for each group are represented by pTEST and pPLACEBO. As the response is attaining a lower SBP, the group with the greater proportion of responses will be regarded as the treatment with the more favorable response. The difference in proportions is calculated as "test minus placebo." Positive values of the test statistic will favor the test treatment.

As the samples are large according to the definition given earlier, the test of the two proportions using the Z approximation is appropriate. For a two-sided test of size 0.05 the critical region is defined by Z ≤ −1.96 or Z ≥ 1.96. The value of the test statistic is calculated as:

Z = (p̂TEST − p̂PLACEBO) / SE(p̂TEST − p̂PLACEBO).

The difference in sample proportions is calculated as:

p̂TEST − p̂PLACEBO = 82/154 − 34/146 = 0.5325 − 0.2329 = 0.2996.

The standard error of the difference in sample proportions is calculated as:

SE(p̂TEST − p̂PLACEBO) = sqrt( (0.5325)(0.4675)/154 + (0.2329)(0.7671)/146 ) = 0.0533.

Using these calculated values, the value of the test statistic is:

Z = 0.2996/0.0533 = 5.62.

The test statistic using the correction factor is obtained as:

Z = ( 0.2996 − (1/2)(1/154 + 1/146) ) / 0.0533 = 5.50.

Interpretation and decision-making
As the value of the test statistic, 5.62, is in the critical region (5.62 > 1.96), the null hypothesis is rejected in favor of the alternate hypothesis. Note that the value of the test statistic using the correction factor was also in the critical region. The probability of attaining the SBP goal is greater for those receiving the test treatment than for those receiving placebo.

It is fairly common to report a p value from such an analysis. As we have seen, the p value is the probability (under the null hypothesis) of observing the result obtained or one that is more extreme. In this analytical strategy we refer to a table of Z scores and the tail areas associated with each to find the sum of the two tail areas beyond the observed value of the test statistic, which gives the two-sided p value.
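The hand calculations above can be checked numerically. The short Python sketch below (assuming scipy is installed; the variable names are ours) reproduces the Z statistics of approximately 5.62 and 5.50 and computes the corresponding two-sided p value from the standard normal distribution rather than from a printed table.

    import math
    from scipy.stats import norm

    x_test, n_test = 82, 154  # responders and sample size, test treatment
    x_plac, n_plac = 34, 146  # responders and sample size, placebo

    p_test = x_test / n_test  # 0.5325
    p_plac = x_plac / n_plac  # 0.2329
    diff = p_test - p_plac    # 0.2996

    se = math.sqrt(p_test * (1 - p_test) / n_test + p_plac * (1 - p_plac) / n_plac)  # 0.0533

    z = diff / se                                                      # approx 5.62
    z_corrected = (abs(diff) - 0.5 * (1 / n_test + 1 / n_plac)) / se   # approx 5.50

    p_two_sided = 2 * (1 - norm.cdf(abs(z)))  # sum of the two tail areas

    print(round(z, 2), round(z_corrected, 2), p_two_sided)

A p value this small is consistent with the decision to reject the null hypothesis at the two-sided 0.05 level.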
