
Evidence and Evaluation Guidance Series
Population and Public Health Division

Study Design for Evaluating Population Health and Health Service Interventions: A Guide

CENTRE FOR EPIDEMIOLOGY AND EVIDENCE
NSW Ministry of Health
Locked Mail Bag 961
North Sydney NSW 2059

Copyright NSW Ministry of Health 2019. This work is copyright. It may be reproduced in whole or in part for study and training purposes subject to the inclusion of an acknowledgement of the source. It may not be reproduced for commercial usage or sale. Reproduction for purposes other than those indicated above requires written permission from the NSW Ministry of Health.

SHPN (CEE) 190320
ISBN 978-1-76081-186-0

Contributors to the development of this guide:
NSW Ministry of Health: Alexandra Schiavuzzi, Amanda Jayakody, Ashleigh Armanasco, Aaron Cashmore, Andrew Milat
Prevention Research Collaboration, University of Sydney: Adrian Bauman

Suggested citation: Centre for Epidemiology and Evidence. Study Design for Evaluating Population Health and Health Service Interventions: A Guide. Evidence and Evaluation Guidance Series, Population and Public Health Division. Sydney: NSW Ministry of Health; 2019.

Contents
1. Executive summary
2. Introduction
3. Planning an evaluation
3.1 Program logic
3.2 Good practice principles for evaluation
3.3 Generating evaluation questions
3.4 When to use quantitative methods
3.5 A note on qualitative methods
3.6 Pragmatic considerations
4. Study designs
4.1 Experimental designs
4.1.1 Randomised controlled trials
4.1.2 Cluster randomised controlled trials
4.1.3 Stepped-wedge and multiple baseline designs
4.2 Quasi-experimental designs
4.2.1 Controlled before and after designs
4.2.2 Interrupted time series design
4.3 Non-experimental designs
4.3.1 Retrospective and prospective cohort studies
4.3.2 Repeat cross-sectional studies
4.3.3 Single group pre-post test and post-program only
5. Key resources and further reading
6. Key definitions
7. References

1. Executive summary

Appropriate selection of study designs is vital to the production of high quality evaluations
Choosing an appropriate study design and executing it well can produce credible evidence of intervention or program effectiveness. The purpose of this guide is to assist NSW Health staff in the planning and design of evaluations. This guide includes information to assist with the selection of quantitative study designs. It includes consideration of the quality and credibility of different designs, as well as pragmatic considerations for conducting research in healthcare and population settings.

A well-planned study design is a critical first step in evaluation
Planning should include developing a program logic model to assist in determining the mechanism by which the intervention contributes to change, providing a structure upon which to develop an evaluation. Planning should also involve considering good practice principles (e.g. ethical conduct and stakeholder involvement) and generating evaluation questions which unpack the key issues you hope to address in your evaluation. Pragmatic considerations include the stage of implementation of the intervention, the setting, feasibility, acceptability and integrity of the study design, and availability of resources for the evaluation in relation to costs, time and sample size required.

Experimental designs provide the strongest evidence of causality
There are many different study designs which may be used to answer different questions, depending on the stage of implementation of the intervention and evaluation. Experimental designs offer the most rigorous way of determining whether a cause-effect relationship exists between an intervention and an outcome. They include randomised controlled trials (RCTs), cluster randomised controlled trials, and stepped-wedge and multiple baseline designs. These are the preferred designs for health service interventions but may not be a practical approach for evaluating population-based programs.

Quasi-experimental and non-experimental designs offer a pragmatic alternative
Quasi-experimental designs offer a compromise between research integrity and pragmatic considerations, particularly when it is not possible to randomise individuals to intervention or control groups. These designs include controlled before and after and interrupted time series designs. Quasi-experimental designs attempt to demonstrate causality between an intervention and an outcome when random assignment has not occurred. Non-experimental studies may be used when it is not feasible or ethical to conduct a true experiment, for example when an intervention is already underway or where harm would be caused by withholding an intervention. These studies include cohort, repeat cross-sectional and single group pre-post test and post-program designs. Alone, they cannot be used to demonstrate cause and effect.

2. Introduction

NSW Health is committed to the evaluation of population health and health service interventions in order to develop evidence-based policies and programs. This guide will support NSW Health staff in the planning of evaluations of interventions using appropriate study designs.

Study design (also referred to as research design) refers to the different study types used in research and evaluation. In the context of an impact/outcome evaluation, study design is the approach used to systematically investigate the effects of an intervention or a program.

Study designs may be experimental, quasi-experimental or non-experimental. Study design is distinct from the study methods (e.g. structured interviews) and the study instruments (e.g. structured interview questions). The primary purpose of a good study design is to enable us to be as confident as possible that any observed changes were caused by the intervention, rather than by chance or other unknown factors.1,2 The design should match the scale of the program and the significance of the evaluation, and be as rigorous as possible while meeting pragmatic needs of the real-world context of health interventions.1 This guide focuses on quantitative study designs used in impact/outcome evaluations and is relevant to simple, complicated and complex population health interventions and health service interventions.

Evaluation is the systematic and objective process used to make judgements about the merit or worth of a program, usually in relation to its effectiveness, efficiency and appropriateness.1 Comprehensive program evaluations should integrate process, impact, outcome and economic evaluation, with all components planned at the same time as the development of the intervention. For further information, refer to Commissioning Evaluation Services: A Guide. Process evaluations assess how a program/intervention is progressing in terms of how it is delivered and to whom, whether the program is being implemented as planned and the level of program quality. Impact or outcome evaluation measures the immediate and long term effects, or unintended effects, of a program as defined in the program logic, and is the primary focus of this guide.2,3 Economic evaluations are a comparative analysis of the cost-effectiveness or cost-benefit of a program. Please refer to Commissioning Economic Evaluations: A Guide.

The guide begins with an overview of initial steps needed in choosing an appropriate study design and pragmatic considerations for evaluation of population health and health service interventions. The second section outlines the strengths and weaknesses of quantitative study designs, namely experimental, quasi-experimental and non-experimental designs. It describes when it may be appropriate to use each design in practice, and presents case studies of evaluations of population health and health service interventions.

This guide is primarily intended to be used by NSW Health staff who plan and implement, commission, provide strategic oversight of, or use results of evaluations of population health and health service interventions. It has been developed for a non-specialist audience with key definitions and a resource and further reading list provided.

What makes an intervention simple, complicated or complex?

Evaluations of population health and health service interventions are rarely straightforward. There is increasing recognition that interventions in population health and health services need to address factors operating at the individual, social, and system levels, which require multifactorial approaches to effectively target the complexity of health and health behaviour.4 For example, an intervention promoting healthy eating in a defined region may include a mass media campaign, nutrition education in schools, and a program to introduce healthy eating choices in workplaces. A study design to evaluate such an intervention must be able to accommodate its complexity and scope.2

Even so, some health interventions can still be defined as simple, in the sense that there is a single intervention being tested in a small group of individuals (e.g. an educational booklet to increase health literacy among teenagers with diabetes). There is a simple linear pathway linking the intervention to its outcome. Complicated interventions may have a number of interrelated components targeting individuals or groups, and there is a high degree of certainty that the intervention can be repeated. Complex interventions, on the other hand, are usually defined as interventions that contain several interacting components, targeting multiple problems or designed to influence a range of groups, with a degree of flexibility or tailoring of the intervention permitted.2,5 This guide broadly applies to all these types of interventions but, given the population and health services context, consideration is mostly given to both complicated and complex interventions. For a more in-depth guide to complex program evaluation please refer to Evaluation in a Nutshell by Bauman and Nutbeam.2

3. Planning an evaluation

A well-planned and conducted study design is critical to the overall credibility and utility of the evaluation.1 Before choosing a study design, however, it is important to consider a number of key steps.

3.1 Program logic
An important first stage in planning any evaluation is developing or reviewing a program logic model, even if the program is already being delivered. A program logic model is a schematic representation that describes how a program is intended to work by linking activities with outputs, intermediate impacts and longer-term outcomes. This will assist in determining the mechanism by which the intervention causes change, providing a structure upon which to develop an evaluation, and enabling potential identification and strengthening of causal links. For further information on developing program logic models, please see Developing and Using Program Logic: A Guide and/or view this short animation on program logic.

3.2 Good practice principles for evaluation
Good practice principles should be incorporated into the planning and conduct of high quality evaluations, including:
• Timeliness – Evaluation planning should ideally be conducted during the program planning phase. Particular consideration should be given to the realistic amount of time needed to conduct an evaluation to ensure findings will be available when needed to support decision making.
• Appropriateness – The scope of the evaluation should be realistic and appropriate with respect to the size, stage and characteristics of the program being evaluated, the evaluation budget and availability of data.
• Stakeholder involvement – The participation of stakeholders in the planning, conduct and interpretation of findings of program evaluations will increase the likelihood of the evaluation influencing policy and practice.
• Effective governance – An advisory group with clear roles and responsibilities should be established to guide and inform the evaluation process.
• Methodological rigour – Evaluations should use appropriate study designs and methods, and draw on relevant instruments and data that are valid and reliable, ensuring the design is appropriate to the purpose and scope of the evaluation.
• Consideration of specific populations – Consideration of the health context and the needs of different population groups, such as Aboriginal populations, is essential. Engagement with identified specific populations is important throughout the duration of the evaluation.
• Ethical conduct – Evaluations should be conducted in an ethical manner that considers legislative obligations, particularly the privacy of participants, and costs and benefits to the individuals or population involved.

These principles are described in more detail in Commissioning Evaluation Services: A Guide.

3.3 Generating evaluation questions

The purpose of the evaluation, and what questions it is intended to answer, will help determine the design of an evaluation.6 These questions are not survey or interview questions but high level evaluation questions.1 The evaluation questions need to unpack the key issues that you hope to address in your evaluation. Specifically, the evaluation questions need to include the population in which you wish to observe the change, including a clear control or comparison group if required, and the type of change you expect to see. Indicators which will help answer your evaluation questions must then be selected to ensure they are specific to the evaluation you are conducting, and measurable. This may require an assessment of the data available (e.g. program monitoring data, administrative data) or the generation of the data required to answer your evaluation questions (e.g. direct measurement, interviews and questionnaires). These aspects of the research must be well established before moving on to the selection of a study design. It is important to keep in mind that more complex interventions may require multiple evaluation questions to be formulated, and these questions may need to be changed to suit the practical realities of the situation.2 For further information on generating evaluation questions, refer to Commissioning Evaluation Services: A Guide and the Sax Institute's Translational Research Framework.

Table 1. Example outcome evaluation questions

Example question: Have smoking rates decreased in program participants?
Example indicator: Proportion of daily smoking from January 2018 to July 2018 among program participants.

Example question: To what extent can increased physical activity be attributed to the intervention?
Example indicator: Mean number of sessions of moderate physical activity among intervention participants compared to the control group.

3.4 When to use quantitative methods

In choosing a study design for an evaluation, it is important to understand when you will need quantitative methods. Quantitative methods are used in evaluation for a number of reasons:2
• differences or change in an impact or outcome need to be quantified
• validated and reliable measures are needed to answer an evaluation question
• causal evidence of program effects is needed (keeping in mind that the program is rarely the sole cause of change; there may be other activities or environmental factors which provide partial attribution)
• data are needed on a large number of people or populations.

3.5 A note on qualitative methods

Using qualitative methods in your evaluation will depend on your research questions. Other than being used for the formative or developmental stages of an evaluation or for understanding implementation processes (process evaluation), qualitative methods are commonly combined with quantitative methods in a mixed methods evaluation. A mixed methods evaluation allows for the triangulation of both quantitative and qualitative findings, which can strengthen confidence in the findings and provide a deeper understanding of unique contexts.7,8 Mixed methods are particularly important when programs are more complex and require a multi-pronged evaluation.2 The more consistent the evidence, the more reasonable it is to assume that the program has produced the observed results.2 Qualitative approaches use methods such as focus groups, in-depth interviews or observation to analyse and explore complexity, meaning, relationships and patterns.2 High quality qualitative research involves rigorous methods such as a clear research question, data collection, data analysis and interpretation.2 For further information on how rigour can be established in qualitative methods please refer to Qualitative Research Methods by Liamputtong. Although qualitative methods are important for evaluation, this guide focuses on quantitative study designs. For further information on how to use qualitative and mixed methods in your evaluation refer to the NSW Government Evaluation Toolkit.

3.6 Pragmatic considerations

While scientific rigour, including prudent study design selection, is an important aspect of a high quality evaluation, there are also pragmatic considerations to take into account when designing evaluations. Along with a program logic model, the good practice principles and the generation of relevant evaluation questions outlined above, selecting the appropriate design for an evaluation also requires careful consideration of the following factors:5,9
• implementation stage of the intervention (see the Translational Research Framework)
• setting of the evaluation (e.g. school, hospital, whole population)
• the likelihood of systematic bias (e.g. selection bias)
• availability of data (e.g. program monitoring data)
• budget for and cost of the evaluation
• feasibility of conducting the study design
• acceptability of the study design to stakeholders and participants.

A good rule of thumb when designing an evaluation is to ‘keep it simple’. For example, it is not necessary to use a complex design when a simple one will suffice. However, more complex interventions may require larger evaluation sample sizes in order to account for additional expected variability, and longer study periods, especially if system changes are required. Using a range of outcome measures allows for more efficient use of the data produced by complex interventions. This will enhance the understanding of the practical effectiveness of the program, including the types and causes of variation between individuals, sites and over time, allowing for an assessment of how viable the intervention is in a real-world setting. Finally, more complex interventions are more sensitive to the impacts of local contextual modification. They may present difficulties with adhering to standardised evaluation designs and delivery. Research protocols may therefore need to allow for adaptation to the local setting.5

Overall, your chosen study design needs to be fit for purpose, ensuring it is realistic and appropriate with respect to purpose, rigour and context. Focusing on the most relevant evaluation questions will help ensure your evaluation is manageable, cost efficient and useful.

In practice, some compromises may be needed. For example, a trade-off between rigour and budget constraints may result in choosing the next best alternative design.1 Study designs ‘lower down’ in the hierarchy (see section 4.1) can still produce useful results, but findings need to be interpreted in light of the limitations of the design used.5
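To make the resourcing and sample size trade-offs above concrete, the sketch below shows a basic calculation of the number of participants needed per group to detect a difference between two proportions (for example, daily smoking prevalence in an intervention and a control group). It is a minimal illustration using a standard normal-approximation formula; the function name and the prevalence figures are hypothetical, and a biostatistician should be consulted for any real evaluation.

    from scipy.stats import norm

    def n_per_group(p1, p2, alpha=0.05, power=0.80):
        """Approximate participants needed per group to detect a difference
        between two proportions with a two-sided test (normal approximation)."""
        z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
        z_beta = norm.ppf(power)            # value corresponding to the desired power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

    # Hypothetical example: detecting a fall in daily smoking from 20% to 15%
    print(round(n_per_group(0.20, 0.15)))   # about 900 participants per group

Even a rough calculation like this, done early, helps judge whether the available budget and recruitment setting can realistically support the design being considered.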

4. Study designs

There are different types of study designs used in evaluation. Different study designs may be appropriate at different stages of implementation of the intervention. This guide focuses on quantitative designs that are used in evaluating the impacts and outcomes of population health and health service interventions.

4.1 Experimental designs
The quality of research designs is often framed around a hierarchy of the ‘most scientific’ designs to the ‘least scientific’ designs.10 Experimental designs are considered to be at the top of the hierarchy as they provide the most compelling evidence that an intervention caused a particular effect.7 For further reading on levels of evidence see the National Health and Medical Research Council's How to Use the Evidence. It is also important to note that methodological quality (how a study is implemented) affects the credibility of study findings. For example, a well-designed and delivered quasi-experimental study may provide more credible evidence than a poorly executed experimental study. For guidance on methodological quality criteria please refer to Grading of Recommendations Assessment, Development and Evaluation (GRADE).

Experiments are characterised by the establishment of two or more groups that are identical except for a single factor of interest, for example, exposure to an intervention. Any observed differences between the groups can hence be attributed to that factor.9 True experiments are characterised by randomisation of intervention and experimental control groups.6

4.1.1 Randomised controlled trials (RCTs)

Description
In an RCT, individuals are randomly assigned to either a control group or an intervention group at the start of the study (see Figure 1). This ensures that individual factors that may influence the outcomes (including those factors that are not known) will be evenly distributed across the two groups. Theoretically, randomisation allows the groups to be as similar as possible at the outset of the study, though the likelihood of this increases with increasing sample size.9 Assessment of demographic and other important characteristics of the two groups, and of the outcome measures, should be made at baseline, before the intervention begins, and repeated after the intervention has been delivered.10 Randomisation reduces selection bias so that the differences in the outcomes between the two groups can be attributed to the different treatment of the groups (intervention or control), and not to some other confounding factor, effect modifier or chance. Again, an adequate sample size is required to ensure that a difference between the two groups is able to be detected. In evaluations of health service interventions, the control group typically receives the usual care or no treatment (rather than receiving a placebo, as in many clinical interventions). It is because of the similarity of the two groups at baseline that RCTs are considered to provide the best evidence of cause and effect of all study designs, as differences between the groups at the study end-point can be attributed to the intervention itself.

An important consideration in choosing an appropriate study design for an evaluation is that prospective studies (including experimental studies) are not feasible if the intervention has already started, unless an enhancement of the program is being tested.
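As an illustration of the random assignment step described above, the sketch below allocates a list of participant identifiers to intervention and control groups in a 1:1 ratio. It is a minimal, hypothetical example of simple randomisation only; real trials typically use approaches such as block or stratified randomisation and conceal the allocation sequence from those recruiting participants.

    import random

    def randomise(participant_ids, seed=2019):
        """Allocate individuals to intervention or control in a 1:1 ratio
        using simple randomisation (shuffle the list, then split in half)."""
        rng = random.Random(seed)   # fixed seed so the allocation is reproducible
        ids = list(participant_ids)
        rng.shuffle(ids)
        half = len(ids) // 2
        return {"intervention": ids[:half], "control": ids[half:]}

    # Hypothetical example: 110 participants identified by number
    groups = randomise(range(1, 111))
    print(len(groups["intervention"]), len(groups["control"]))  # 55 55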

FIGURE 1. Randomised controlled trial
Individuals are randomised to an intervention group or a control group.
Note: Green shapes represent individuals with a change in the outcome of interest at follow up.

Strengths and limitations
RCTs are usually used for simple interventions. They are considered the best way to evaluate the effectiveness of a new treatment or prevention strategy, providing the most compelling evidence of all study designs (particularly if several RCTs are combined in a systematic review).9 RCTs are the best study design with regard to internal validity, hence they are most relevant when there is a need to generate ‘causal evidence’.6,10 Internal validity refers to the extent to which differences in observed effects between exposed and unexposed groups can be attributed to the intervention and not to some other possible cause. RCTs sometimes lack external validity; generalisations of findings outside the study population may be invalid.11,12 RCTs are costly to design and implement as they require more control over the program setting than may be achievable.7 They may be subject to practical problems such as loss to follow-up or an inability to blind participants as to which study group they are in. More complex interventions may require flexible delivery and multiple community-driven strategies; hence, adherence to a strict RCT protocol when delivering and evaluating complex public health interventions is often not pragmatic.13 This may hamper the implementation and evaluation of the program to the point where no effect can be observed. One of the biggest challenges to maintaining the fidelity of an RCT is the potential for contamination of the control group. This occurs when those in the control group know about the intervention, and this therefore influences their behaviour, making it difficult to detect the effects of an intervention.10 Randomisation may also involve ethical risks, such as withholding a program intended to improve health outcomes from one group where the intervention is known to be effective in other settings or a different population. This may be overcome through the use of other experimental designs outlined below.

When to use RCTs in practice
You can use RCTs for your evaluation only if it is possible to randomly allocate individuals to intervention and control groups. Given the costs involved in maintaining fidelity to a program protocol required by RCTs, you should only use this design in a well-funded project. You should use RCTs to test a causal hypothesis only after you have used simpler and cheaper designs to determine the feasibility of your intervention.13 Keep in mind that RCTs are not often used in population health and health services research because they are best used for well-defined (discrete) interventions and controllable settings.

Case study: Randomised controlled trial
Effects of a pedometer-based intervention on physical activity levels after cardiac rehabilitation

This case study is an example of an evaluation where it was possible to randomly allocate individuals to receive a pedometer-based intervention or to be in a control group. This type of intervention allowed for a well targeted and controlled setting. The RCT was conducted to evaluate the efficacy of pedometers for increasing physical activity after a cardiac rehabilitation program (CRP). Patients (n = 110) who had attended a CRP were randomised into an intervention or a control group. The six-week intervention included self-monitored physical activity using a pedometer and step calendar and two behavioural counselling and goal-setting sessions. The control group received two generic physical activity information brochures after the baseline questionnaire was administered. Self-reported physical activity and psychosocial status were collected at baseline, six weeks, and six months. At six weeks and six months, improvements in total physical activity sessions (six weeks: change in mean sessions (SD) = 2.9 (6.5), p = 0.002; six months: change in mean sessions (SD) = 0.9 (5.8), p = 0.016), walking minutes (six weeks only: change in mean minutes (SD) = 80.7 (219.8), p = 0.013), and walking sessions (six weeks: change in mean sessions (SD) = 2.3 (5.5), p = 0.001; six months: change in mean sessions (SD) = 0.2 (5.0), p = 0.035) in the intervention group were significantly greater than those in the control group after adjusting for baseline differences.

Butler L, Furber S, Phongsavan P, Mark A, Bauman A. Effects of a pedometer-based intervention on physical activity levels after cardiac rehabilitation. J Cardiopulm Rehabil Prev 2009; 29(2): 105-14.
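The case study reports between-group differences "after adjusting for baseline differences". One common way to make this kind of adjustment is an ANCOVA-style regression of the follow-up outcome on group, controlling for the baseline value of the same outcome. The sketch below is a minimal illustration using made-up data and is not the analysis used in the published study; the variable names and values are purely hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: weekly physical activity sessions at baseline and
    # follow up, with a 0/1 indicator for the intervention group
    df = pd.DataFrame({
        "baseline": [3, 2, 4, 1, 2, 3, 5, 2, 1, 4],
        "followup": [6, 5, 7, 2, 3, 4, 6, 3, 2, 5],
        "group":    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    })

    # Regress follow-up sessions on group, adjusting for baseline sessions
    model = smf.ols("followup ~ baseline + group", data=df).fit()
    print(model.params["group"])    # adjusted between-group difference
    print(model.pvalues["group"])   # p-value for that difference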

4.1.2 Cluster randomised controlled trials

Description
Cluster randomised controlled trials (cluster RCTs) operate on the same principles as RCTs, however randomisation occurs at the group-level rather than the individual-level (see Figure 2).4 These groups (clusters) are randomly allocated to intervention or control conditions. This is important where individuals in a group share features in common which may have an impact on the study outcome (e.g. students attending the same school are likely to share behaviours or beliefs that may influence their health). As in a traditional RCT design, randomisation of the intervention and control groups ensures that these groups are as similar as possible at baseline so that any effect observed may be attributed to the intervention itself.4

FIGURE 2. Cluster randomised controlled trial
Clusters are randomised to an intervention group or a control group.
Note: Green shapes represent individuals with a change in the outcome of interest at follow up.

Strengths and limitations
Cluster RCTs are often the most appropriate experimental study design when there is a risk of contamination if individuals are randomised, if the intervention is best targeted to a group or population, or if it is more logistically feasible to deliver the intervention at a group-level.4 Analysis of groups rather than individuals requires more technical statistical expertise given that individuals within groups or clusters are likely to be correlated (and standard statistical methods assume that observations are independent). If this correlation is not accounted for, it may result in a false positive result (Type I error).4 In cluster RCTs, two sources of variability must be taken into account in analyses: the variation between individuals within groups, and the variation between groups. It is therefore important to ensure that an adequate number of clusters is used, and that there is an appropriate number of individuals within each cluster group. Careful calculation of sample size, taking into account the intracluster correlation coefficient, is also required.
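Because outcomes for people in the same cluster tend to be correlated, the sample size needed for a cluster RCT is usually inflated by a 'design effect' of 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. The sketch below is a simple worked example with hypothetical values (cluster size, ICC and the per-arm target are illustrative only); it is not a substitute for a formal sample size calculation by a statistician.

    import math

    def design_effect(cluster_size, icc):
        """Inflation factor for a cluster design relative to individual
        randomisation: 1 + (m - 1) * ICC."""
        return 1 + (cluster_size - 1) * icc

    def clusters_per_arm(n_individual, cluster_size, icc):
        """Clusters needed per arm to match the power of an individually
        randomised trial requiring n_individual participants per arm."""
        n_adjusted = n_individual * design_effect(cluster_size, icc)
        return math.ceil(n_adjusted / cluster_size)

    # Hypothetical example: 300 per arm under individual randomisation,
    # clusters of 25 people, intracluster correlation of 0.05
    print(design_effect(25, 0.05))          # 2.2
    print(clusters_per_arm(300, 25, 0.05))  # 27 clusters per arm

Even a modest intracluster correlation can more than double the required sample size, which is why the number of clusters, and not just the number of individuals, drives the power of a cluster RCT.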
