Replication And Meta-Analysis In Parapsychology


Replication and Meta-Analysis in Parapsychology
Author(s): Jessica Utts
Source: Statistical Science, Vol. 6, No. 4 (Nov., 1991), pp. 363-378
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2245728

Statistical Science, 1991, Vol. 6, No. 4, 363-403

Replication and Meta-Analysis in Parapsychology
Jessica Utts

Jessica Utts is Associate Professor, Division of Statistics, University of California at Davis, 469 Kerr Hall, Davis, California 95616.

Abstract. Parapsychology, the laboratory study of psychic phenomena, has had its history interwoven with that of statistics. Many of the controversies in parapsychology have focused on statistical issues, and statistical models have played an integral role in the experimental work. Recently, parapsychologists have been using meta-analysis as a tool for synthesizing large bodies of work. This paper presents an overview of the use of statistics in parapsychology and offers a summary of the meta-analyses that have been conducted. It begins with some anecdotal information about the involvement of statistics and statisticians with the early history of parapsychology. Next, it is argued that most nonstatisticians do not appreciate the connection between power and [...]. Returning to parapsychology, a particular experimental regime is examined by summarizing an extended debate over the interpretation of the results. A new set of experiments designed to resolve the debate is then reviewed. Finally, meta-analyses from several areas of parapsychology are summarized. It is concluded that the overall evidence indicates that there is an anomalous effect in need of an explanation.

Key words and phrases: [...], randomness, vote-counting.

1. INTRODUCTION

In a June 1990 Gallup Poll, 49% of the 1236 respondents claimed to believe in extrasensory perception (ESP), and one in four claimed to have had a personal experience involving telepathy (Gallup and Newport, 1991). Other surveys have shown even higher percentages; the University of Chicago's National Opinion Research Center recently surveyed 1473 adults, of which 67% claimed that they had experienced ESP (Greeley, 1987).

Public opinion is a poor arbiter of science, however, and experience is a poor substitute for the scientific method. For more than a century, small numbers of scientists have been conducting laboratory experiments to study phenomena such as telepathy, clairvoyance and precognition, collectively known as "psi" abilities. This paper will examine some of that work, as well as some of the statistical controversies it has generated.

Parapsychology, as this field is called, has been a [...] its [...] beliefs resistant to change even in the face of data, and many people, scientists included, seem to have made up their minds on the question without examining any empirical data at all. [...] "The level of the debate during the past 130 years has been an embarrassment for anyone who would like to believe that scholars and scientists adhere to standards of rationality and fair play" (Hyman, 1985a, page 89). While much of the controversy has focused on poor experimental design and potential fraud, there have been attacks and defenses of the statistical methods as well, sometimes calling into question the very foundations of probability and statistical inference.

Most of the criticisms have been leveled by psychologists. For example, a 1988 report of the U.S. National Academy of Sciences concluded that "The committee finds no scientific justification from research conducted over a period of 130 years for the existence of parapsychological phenomena" (Druckman and Swets, 1988, page 22). The chapter on parapsychology was written by a subcommittee
chaired by a psychologist who had published a similar conclusion prior to his appointment to the committee (Hyman, 1985a, page 7). There were no [...] involved with the [...] accusations of bias (Palmer, Honorton and Utts, 1989) led U.S. Senator Claiborne Pell to request that the Congressional Office of Technology Assessment (OTA) conduct an investigation with a more balanced group. A one-day workshop was held on September 30, 1988, bringing together parapsychologists, critics and experts in some related fields (including the author of this paper). The report concluded that parapsychology needs "a fairer hearing across a broader spectrum of the scientific community, so that emotionality does not impede objective assessment of experimental results" (Office of Technology Assessment, 1989).

It is in the spirit of the OTA report that this article is written. After Section 2, which offers an anecdotal account of the role of statisticians and statistics in parapsychology, the discussion turns to the more general question of replication of experimental results. Section 3 illustrates how replication has been (mis)interpreted by scientists in many fields. Returning to parapsychology in Section 4, a particular experimental regime called the "ganzfeld" is described, and an extended debate about the interpretation of the experimental results is discussed. Section 5 examines a meta-analysis of recent ganzfeld experiments designed to resolve the debate. Finally, Section 6 contains a brief account of meta-analyses that have been conducted in other areas of parapsychology, and conclusions are given in Section 7.

2. STATISTICS AND PARAPSYCHOLOGY

Parapsychology had its beginnings in the investigation of purported mediums and other anecdotal claims in the late 19th century. The Society for Psychical Research was founded in Britain in 1882, and its American counterpart was founded in Boston in 1884. While these organizations and their members were primarily investigating anecdotal material, a few of the early researchers were already conducting "forced-choice" experiments such as card-guessing. (Forced-choice experiments are like multiple choice tests; on each trial the subject must guess from a small, known set of possibilities.) Notable among these was Nobel Laureate Charles Richet, who is generally credited with being the first to recognize that probability theory could be applied to card-guessing experiments (Rhine, 1977, page 26; Richet, 1884).

F. Y. Edgeworth, partly in response to what he considered to be incorrect analyses of these experiments, offered one of the earliest treatises on the statistical evaluation of forced-choice experiments in two articles published in the Proceedings of the Society for Psychical Research (Edgeworth, 1885, 1886). Unfortunately, as noted by Mauskopf and McVaugh (1979) in their historical account of the period, Edgeworth's papers were "perhaps too difficult for their immediate audience" (page 105).

Edgeworth began his analysis by using Bayes' theorem to derive the formula for the posterior probability that chance was operating, given the data. He then continued with an argument "savouring more of Bernoulli than Bayes" in which "it is consonant, I submit, to experience, to put 1/2 both for α and β," that is, for both the prior probability that chance alone was operating, and the prior probability that "there should have been some additional agency." He then reasoned (using a Taylor series expansion of the posterior probability formula) that if there were a large probability of observing the data given that some additional agency was at work, and a small objective probability of the data under chance, then the latter (binomial) probability "may be taken as a rough measure of the sought a posteriori probability in favour of mere chance" (page 195). Edgeworth concluded his article by applying his method to some data published previously in the same journal. He found the probability against chance to be 0.99996, which he said "may fairly be regarded as physical certainty" (page 199).
He concluded:

    Such is the evidence which the calculus of probabilities affords as to the existence of an agency other than mere chance. The calculus is silent as to the nature of that agency - whether it is more likely to be vulgar illusion or extraordinary law. That is a question to be decided, not by formulae and figures, but by general philosophy and common sense [page 199].

Both the statistical arguments and the experimental controls in these early experiments were somewhat loose. For example, Edgeworth treated as binomial an experiment in which one person chose a string of eight letters and another attempted to guess the string. Since it has long been [...] that people are [...], there is no statistical basis for analyzing such an experiment. Nonetheless, Edgeworth and his contemporaries set the stage for the use of controlled experiments with statistical evaluation in [...]. An interesting account of Edgeworth's involvement and the role telepathy experiments played in the early history of randomization and experimental design is provided by Hacking (1988).
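Edgeworth's reasoning about the posterior probability of chance can be sketched numerically. The following is a minimal illustration and not Edgeworth's own calculation: the function name `posterior_chance` is hypothetical, and the inputs are assumptions chosen for the example (a binomial probability of 0.00004 under chance, taken as the complement of the 0.99996 quoted above, and a likelihood of 1.0 under the "additional agency" hypothesis). It applies Bayes' theorem with prior probability 1/2 for each hypothesis.

```python
def posterior_chance(p_data_given_chance, p_data_given_agency, prior_chance=0.5):
    """Posterior probability that chance alone was operating (Bayes' theorem)."""
    prior_agency = 1.0 - prior_chance
    numerator = prior_chance * p_data_given_chance
    return numerator / (numerator + prior_agency * p_data_given_agency)


# Assumed inputs for illustration only (not taken from Edgeworth's data):
# probability of the data under chance, and under an additional agency.
p_chance = 0.00004
posterior = posterior_chance(p_chance, 1.0)

print(f"posterior probability of chance: {posterior:.8f}")
print(f"probability against chance:      {1 - posterior:.5f}")

# With a smaller likelihood under the agency hypothesis, the binomial
# probability is a poorer stand-in for the posterior probability of chance.
print(f"posterior with likelihood 0.5:   {posterior_chance(p_chance, 0.5):.8f}")
```

With equal priors, the posterior stays close to the binomial probability only while the data are highly probable under the alternative, which is the condition Edgeworth's approximation relies on.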

One of the first American researchers to use statistical methods in parapsychology was John Edgar Coover, who was the Thomas Welton Stanford Psychical Research Fellow in the Psychology Department at Stanford University from 1912 to 1937 (Dommeyer, 1975). In 1917, Coover published a large volume summarizing his work (Coover, 1917). Coover believed that his results were consistent with chance, but others have argued that Coover's definition of significance was too strict (Dommeyer, 1975). For example, in one evaluation of his telepathy experiments, Coover found a two-tailed p-value of 0.0062. He concluded, "Since this value, then, lies within the field of chance deviation, although the probability of its occurrence by chance is fairly low, it cannot be accepted as a decisive indication of some cause beyond chance which operated in favor of success in guessing" (Coover, 1917, page 82). On the next page, he made it explicit that he would require a p-value of 0.0000221 to declare that something other than chance was operating.

It was during the summer of 1930, with the card-guessing experiments of J. B. Rhine at Duke University, that parapsychology began to take hold as a laboratory science. Rhine's laboratory still exists under the name of the Foundation for Research on the Nature of Man, housed at the edge of the Duke University campus.

It wasn't long after Rhine published his first book, Extrasensory Perception, in 1934, that the attacks on his methodology began. Since his claims were wholly based on statistical analyses of his experiments, the statistical methods were closely scrutinized by critics anxious to find a [...]. The most persistent critic was a psychologist from McGill University named Chester Kellogg (Mauskopf and McVaugh, 1979).
Kellogg's main argument was that Rhine was using the binomial distribution (and normal approximation) on a series of trials that were not independent. The experiments in question consisted of having a subject guess the order of a deck of 25 cards, with five each of five symbols, so technically Kellogg was correct.

By 1937, several mathematicians and statisticians had come to Rhine's aid. Mauskopf and McVaugh (1979) speculated that since statistics was itself a young discipline, "a number of statisticians were equally outraged by Kellogg, whose arguments they saw as discrediting their profession" (page 258). The major technical work, which acknowledged that Kellogg's criticisms were accurate but did little to change the significance of the results, was conducted by Charles Stuart and Joseph A. Greenwood and published in the first volume of the Journal of Parapsychology (Stuart and Greenwood, 1937). Stuart, who had been an undergraduate in mathematics at Duke, was one of Rhine's early subjects and continued to work with him as a researcher until Stuart's death in 1947. Greenwood was a Duke mathematician, who apparently converted to a statistician at the urging of Rhine.

Another prominent figure who was distressed with Kellogg's attack was E. V. Huntington, a mathematician at Harvard. After corresponding with [...] the public with a technical reply to Kellogg's arguments, a simple statement should be made to the effect that the mathematical issues in Rhine's work had been resolved. Huntington must have successfully convinced his former student, Burton Camp of Wesleyan, that this was a wise approach. Camp was the 1937 President of IMS. When the annual meetings were held in December of 1937 (jointly with AMS and AAAS), Camp released a statement to the press that read:

    Dr. Rhine's investigations have two aspects: experimental and statistical. On the experimental side mathematicians, of course, have nothing to say. On the statistical side, however, recent mathematical work has established the fact that, assuming that the experiments have been properly performed, the statistical analysis is essentially valid. If the Rhine investigation is to be fairly attacked, it must be on other than mathematical grounds [Camp, 1937].

One statistician who did emerge as a critic was William Feller. In a talk at the Duke Mathematical Seminar on April 24, 1940, Feller raised three criticisms to Rhine's work (Feller, 1940). They had been raised before by others (and continue to be raised even today). The first was that inadequate shuffling of the cards resulted in additional information from one series to the next. The second was what is now known as the "file-drawer effect," namely, that if one combines the results of published studies only, there is sure to be a bias in favor of successful studies. The third was that the results were enhanced by the use of optional stopping, that is, by not specifying the number of trials in advance. All three of these criticisms were addressed in a rejoinder by Greenwood and Stuart (1940), but Feller was never convinced. Even in its third edition published in 1968, his book An Introduction to Probability Theory and Its Applications still contains his conclusion about Greenwood and Stuart: "Both their arithmetic and their experiments have a distinct tinge of the supernatural" (Feller, 1968, page 407). In his discussion of Feller's position, Diaconis (1978) remarked, "I believe
Feller was confused ... he seemed to have decided the opposition was wrong and that was that."

Several statisticians have contributed to the literature in parapsychology to greater or lesser degrees. T. N. E. Greville developed applicable statistical methods for many of the experiments in parapsychology and was Statistical Editor of the Journal of Parapsychology (with J. A. Greenwood) from its start in 1937 through Volume 31 in 1967; Fisher (1924, 1929) addressed some specific problems in card-guessing experiments; Wilks (1965a, b) described various statistical methods for parapsychology; Lindley (1957) presented a Bayesian analysis of some parapsychology data; and Diaconis (1978) pointed out some problems with certain experiments and presented a method for analyzing experiments when feedback is given.

Occasionally, attacks on parapsychology have taken the form of attacks on statistical inference in general, at least as it is applied to real data. Spencer-Brown (1957) attempted to show that true randomness is impossible, at least in finite sequences, and that this could be the explanation for the results in parapsychology. That argument re-emerged in a recent debate on the role of randomness in parapsychology, initiated by psychologist J. Barnard Gilmore (Gilmore, 1989, 1990; Utts, 1989; Palmer, 1989, 1990). Gilmore stated that "The agnostic statistician, advising on research in psi, should take account of the possible inappropriateness of classical inferential statistics" (1989, page 338). In his [...] studies showing purportedly [...] systems that do not behave as they should under randomness (e.g., Iversen, Longcor, Mosteller, Gilbert and Youtz, 1971; Spencer-Brown, 1957). Gilmore concluded that "Anomalous data ... should not be found nearly so often if classical statistics offers a valid model of reality" (1990, page 54), thus rejecting the use of classical statistical inference for real-world applications in general.

3. REPLICATION

Implicit and explicit in the literature on parapsychology is the assumption that, in order to truly establish itself, the field needs to find a repeatable experiment. For example, Diaconis (1978) started the summary of his article in Science with the words "In search of repeatable ESP experiments, modern investigators ..." (page 131). On October 28-29, 1983, the 32nd International Conference of the Parapsychology Foundation was held in San Antonio, Texas, to address "The Repeatability Problem in Parapsychology." The Conference Proceedings (Shapin and Coly, 1985) reflect the diverse views among parapsychologists on the nature of the problem. Honorton (1985a) and [...] replication is uncommon in most branches of science and that parapsychology should not be singled out as unique in this regard. Other authors expressed disappointment in the lack of a single repeatable experiment in parapsychology, with titles such as "Unrepeatability: Parapsychology's Only Finding" (Blackmore, 1985), and "Research Strategies for Dealing with Unstable Phenomena" (Beloff, 1985).

It has never been clear, however, just exactly what would constitute acceptable evidence of a repeatable experiment. In the early days of investigation, the major critics "insisted that it would be sufficient for Rhine and Soal to convince them of ESP if a parapsychologist could perform successfully a single 'fraud-proof' experiment" (Hyman, 1985a, page 71). However, as soon as well-designed experiments showing statistical significance emerged, the critics realized that a single experiment could be statistically significant just by chance. British psychologist C. E. M. Hansel quantified the new expectation, that the experiment should be repeated a few times, as follows:

    If a result is significant at the .01 level and this result is not due to chance but to information reaching the subject, it may be expected that by making two further sets of trials the antichance odds of one hundred to one will be increased to around a million to one, thus enabling the effects of ESP - or whatever is responsible for the original result - to manifest itself to such an extent that there will be little doubt that the result is not due to chance [Hansel, 1980, page 298].

In [...] convince Hansel that something other than chance was at work. This argument implies that if a particular experiment produces a [...] replications fail to attain significance, then the original result was probably due to chance, or at least remains unconvincing. The problem with this line of reasoning is that there is no consideration given to sample size or power. Only an experiment with extremely high power should be expected to be "successful" three times in succession.

It is perhaps a failure of the way statistics is taught that many scientists do not understand the importance of power in defining successful replication. To illustrate this point, psychologists Tversky and Kahnemann (1982) distributed a questionnaire
to their colleagues at a professional meeting, with the question:

    An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data. You are reviewing the literature. What is the highest value of t in the second set of data that you would describe as a failure to replicate? [1982, page 28].

In reporting their results, Tversky and Kahnemann stated:

    The majority of our respondents regarded t = 1.70 as a failure to replicate. If the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t for the combined data is about 3.00 (assuming equal variances). Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study [1982, page 28].

At a recent presentation to the History and Philosophy of Science Seminar at the University of California at Davis, I asked the following question. Two scientists, Professors A and B, each have a theory they would like to demonstrate. Each plans to run a fixed number of Bernoulli trials and then test H0: p = 0.25 versus Ha: p > 0.25. Professor A has access to large numbers of students each semester to use as subjects. In his first experiment, he runs 100 subjects, and there are 33 successes (p = 0.04, one-tailed). Knowing the importance of replication, Professor A runs an additional 100 subjects as a second experiment. He finds 36 successes (p = 0.009, one-tailed).

Professor B only teaches small classes.
Each quarter, she runs an experiment on her students to test her theory. She carries out ten studies this way, with the results in Table 1.

I asked the audience by a show of hands to indicate whether or not they felt the [...] successfully [...]. Professor A's theory received overwhelming support, with approximately 20 votes, while Professor B's theory received only one vote.

If you aggregate the results of the experiments for each professor, you will notice that each conducted 200 trials, and Professor B actually demonstrated a higher level of success than Professor A, with 71 as opposed to 69 successful trials. The one-tailed p-values for the combined trials are 0.0017 for Professor A and 0.0006 for Professor B.

To address the question of replication more explicitly, I also posed the following scenario. In December of 1987, it was decided to prematurely terminate a study on the effects of aspirin in reducing heart attacks because the data were so convincing (see, e.g., Greenhouse and Greenhouse, 1988; Rosenthal, 1990a). The physician-subjects had been randomly assigned to take aspirin or a placebo. There were 104 heart attacks among the 11,037 subjects in the aspirin group, and 189 heart attacks among the 11,034 subjects in the placebo group (chi-square = 25.01, p < 0.00001).

After showing the results of that study, I presented the audience with two hypothetical experiments conducted to try to replicate the original result, with outcomes in Table 2.

TABLE 2
Hypothetical replications of the aspirin/heart attack study

                 Replication #1          Replication #2
                 Heart attack            Heart attack
                 Yes       No            Yes       No
Aspirin          11        1156          [...]     [...]
Placebo          19        1090          [...]     [...]
Chi-square       2.596, p = 0.11         13.206, p = 0.0003

I asked the audience to indicate which one they thought was a more successful replication. The audience chose the second one, as would most journal editors, because of the "significant p-value." In fact, the first replication has almost exactly the same proportion of heart attacks in the two groups as the original study and is thus a very close replication of that result. The second replication has
very different relative proportions, and in fact the relative risk from the second study is not even contained in a 95% confidence interval for relative risk from the original study. The magnitude of the effect has been much more closely matched by the [...]. [...] are beginning to notice that replication is not as straightforward as they were originally led to believe. A special issue of the Journal of Social Behavior and Personality was entirely devoted to the question of replication (Neuliep, 1990). In one of the articles, Rosenthal cautioned his colleagues: "Given the levels of statistical power at which we normally operate, we have no right to expect the proportion of significant results that we typically do expect, even if in nature there is a very real and very important effect" (Rosenthal, 1990b, page 16).

Jacob Cohen, in his insightful article titled "Things I Have Learned (So Far)," identified another misconception common among social scientists: "Despite widespread misconceptions to the contrary, the rejection of a given null hypothesis gives us no basis for estimating the probability that a replication of the research will again result in rejecting that null hypothesis" (Cohen, 1990, page 1307).

Cohen and Rosenthal both advocate the use of effect sizes as opposed to significance levels when defining the strength of an experimental effect. In general, effect sizes measure the amount by which the data deviate from the null hypothesis in terms of standardized units. For instance, the effect size for a two-sample t-test is usually defined to be the difference in the two means, divided by the standard deviation for the control group. This measure can be compared across studies without the dependence on sample size inherent in significance levels. (Of course there will still be variability in the sample effect sizes, decreasing as a function of sample size.)
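The contrast between effect size and significance level can be checked with the aspirin figures quoted earlier. The sketch below is illustrative only and rests on several assumptions: it uses the relative risk as the effect-size measure (not the standardized t-test effect size just defined), it reads the first hypothetical replication from Table 2 as 11 heart attacks among 1,167 aspirin subjects and 19 among 1,109 placebo subjects, it uses the Pearson chi-square without continuity correction, and it estimates power with a rough normal approximation. All function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


def relative_risk(a, b, c, d):
    """Risk in the first row (aspirin) divided by risk in the second (placebo)."""
    return (a / (a + b)) / (c / (c + d))


def approx_power(p1, n1, p2, n2, alpha=0.05):
    """Rough normal-approximation power of a two-sided two-proportion z-test."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p2) / se
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


# Rows: aspirin, placebo. Columns: heart attack, no heart attack.
original = (104, 11037 - 104, 189, 11034 - 189)
replication_1 = (11, 1156, 19, 1090)

for name, table in (("original", original), ("replication 1", replication_1)):
    chi2 = chi_square_2x2(*table)
    print(f"{name}: relative risk = {relative_risk(*table):.3f}, "
          f"chi-square = {chi2:.3f}, p = {p_value_1df(chi2):.2g}")

# Power of a study the size of replication 1, assuming the true risks equal
# the risks observed in the original study (an assumption for illustration).
power = approx_power(104 / 11037, 11 + 1156, 189 / 11034, 19 + 1090)
print(f"approximate power of replication 1: {power:.2f}")
```

Under these assumptions the two studies have nearly identical relative risks, yet only the larger one reaches conventional significance, and the approximate power of the smaller study is well below one half.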
Comparison of effect sizes across studies is one of the major components of meta-analysis.

Similar arguments have recently been made in the medical literature. For example, Gardner and Altman (1986) stated that the use of p-values "to define two alternative outcomes - significant and not significant - is not helpful and encourages lazy thinking" (page 746). They advocated the use of confidence intervals instead.

As discussed in the next section, the arguments used to conclude that parapsychology has failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure to examine power. A more appropriate analysis would compare the effect sizes for similar experiments across experimenters and across time to see if there have been consistent effects of the same magnitude. Rosenthal also advocates this view of replication:

    The traditional view of replication focuses on significance level as the relevant summary statistic of a study and evaluates the success of a replication in a dichotomous fashion. The newer, more useful view of replication focuses on effect size as the more important summary statistic of a study and evaluates the success of a replication not in a dichotomous but in a continuous fashion [Rosenthal, 1990b, page 28].

The dichotomous view of replication has been used throughout the history of parapsychology, by both parapsychologists and critics (Utts, 1988). For example, the National Academy of Sciences report critically evaluated [...] but [...] "nonsignificant" [...].

In the next three sections, we will examine some of the results in parapsychology using the broader, more appropriate definition of replication. In doing so, we will show that the results are far more interesting than the critics would have us believe.

4. THE GANZFELD DEBATE IN PARAPSYCHOLOGY

An extensive debate took place in the mid-1980s between a parapsychologist and critic, questioning whether or not a particular body of parapsychological data had demonstrated psi abilities. The experiments in question were all conducted using the ganzfeld setting (described below).
Several authors were invited to write commentaries on the debate. As a result, this data base has been more thoroughly analyzed by both critics and proponents than any other and provides a good source for studying replication in parapsychology.

The debate concluded with a detailed series of recommendations [...] and left open the question of whether or not psi had been demonstrated. A new series of experiments that followed the recommendations were conducted over the next few years. The results of the new experiments will be presented in Section 5.

4.1 Free-Response Experiments

Recent experiments in parapsychology tend to use more complex target material than the cards and dice used in the [...], partially [...] "[...] resemble the conditions of spontaneous psi occurrences" (Burdick and Kelly, 1977, page 109). These experiments fall under the general heading of "free-response" experiments, because the subject is asked to give a verbal or written description of the
target, rather than being forced to make a choice from a small discrete set of possibilities. Various types of target material have been used, including pictures, short segments of movies on video tapes, actual locations and small objects.

Despite the more complex target material, the statistical methods used to analyze these experiments are similar to those for forced-choice experiments. A typical experiment proceeds as follows. Before conducting any trials, a large pool of potential targets is assembled, usually in packets of four. Similarity of targets within a packet is kept to a minimum, for reasons made clear below. At the start of an experimental session, after the subject is sequestered in an isolated room, a target is selected at random from the pool. A sender is placed in another room with the target. The subject is asked to provide a verbal or written description of what he or she thinks is in the target, knowing only that it is a photograph, an object, etc.

After the subject's description has been recorded and secured against the potential for later alteration, a judge (who may or may not be the subject) is given a copy of the subject's description and the four possible targets that were in the packet with the correct target. A properly conducted experiment either uses video tapes or has two identical sets of target material and uses the duplicate set for this part of the process, to ensure that clues such as fingerprints don't give away the answer. Based on the subject's description, and of course on a blind basis, the judge is asked to either rank the four choices from most to lea
