NONPARAMETRIC PERMUTATION TESTING - University Of Arizona


NONPARAMETRIC PERMUTATION TESTING
CHAPTER 33
TONY YE

WHAT IS PERMUTATION TESTING?
A framework for assessing the statistical significance of EEG results.
Advantages:
- Does not rely on distribution assumptions
- Corrections for multiple comparisons are easy to incorporate
- Highly appropriate for correcting multiple comparisons in EEG data

WHAT IS PARAMETRIC STATISTICAL TESTING?
The test statistic is compared against a theoretical distribution of test statistics expected under the H0. Examples of test statistics:
- t-value
- χ2-value
- Correlation coefficient
The p-value is the probability of obtaining, under the H0, a statistic at least as large as the observed statistic.
[INSERT 33.1A]

NONPARAMETRIC PERMUTATION TESTING
No assumptions are made about the theoretical underlying distribution of test statistics under the H0. Instead, the distribution is created from the data that you have!
How is this done?
- Shuffling condition labels over trials (within-subject analyses)
- Shuffling condition labels over subjects (group-level analyses)
- Recomputing the test statistic

NULL-HYPOTHESIS DISTRIBUTION
Example: evaluating your hypothesis using a t-test of alpha power between two conditions.
Two types of tests:
- Discrete tests: compare conditions
- Continuous tests: correlate two continuous variables

DISCRETE TESTS
Compare EEG activity between Condition A & B.
- H0: no difference between conditions
- If the H0 is true, random relabeling of conditions should give a Test Statistic (TS) as large as the TS BEFORE the random relabeling.
Steps:
1. Randomly swap condition labels from many trials
2. Compute t-test across conditions
3. If TS ≠ 0, this reflects sampling error or outliers
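The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the chapter: the alpha-power values are made up, the function names `t_value` and `null_t_values` are hypothetical, and the t-value uses the unequal-variance form.

```python
import random
import statistics

def t_value(a, b):
    """Independent-samples t-value (unequal-variance form) for two lists."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def null_t_values(a, b, n_iter=1000, seed=0):
    """Build an H0 distribution by shuffling condition labels over trials."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    null = []
    for _ in range(n_iter):
        rng.shuffle(pooled)                         # step 1: random relabeling
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        null.append(t_value(fake_a, fake_b))        # step 2: TS under the H0
    return null

# Hypothetical single-trial alpha power for conditions A and B
cond_a = [10.1, 10.3, 9.8, 10.5, 10.0, 10.2]
cond_b = [8.9, 9.1, 9.0, 8.7, 9.2, 8.8]

observed_t = t_value(cond_a, cond_b)
null_t = null_t_values(cond_a, cond_b)
```

The shuffled t-values scatter around zero rather than sitting exactly on it, which is the sampling error mentioned in step 3.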

CONTINUOUS TESTS
The idea: testing the statistical significance of a correlation coefficient.
What's the difference between this and discrete tests?
- The H0 TS is created by swapping data points instead of labels
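A comparable sketch for the continuous case, again only as an illustration: the power and reaction-time values are invented, and `pearson_r` and `null_correlations` are hypothetical helper names. The values of both variables stay untouched; only the pairing between them is shuffled.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)

def null_correlations(x, y, n_iter=1000, seed=0):
    """H0 distribution of r, built by shuffling which y goes with which x."""
    rng = random.Random(seed)
    shuffled = list(y)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)                # break the x-to-y mapping
        null.append(pearson_r(x, shuffled))
    return null

# Hypothetical per-trial power and reaction times (ms)
power = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2, 6.8, 8.1]
rt = [310, 335, 342, 371, 390, 418, 425, 460]

observed_r = pearson_r(power, rt)
null_r = null_correlations(power, rt)
```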

SIMILARITIES
The data are not altered: only the "mapping" of the data is shuffled around.
Analysis steps:
1. Create an H0 TS value
2. Repeat the process many MANY times
3. The MANY iterations of H0 values create a distribution of TS values

SIMILARITIES
Statistical evaluation entails:
- Comparing the original TS value with the new distribution of TS values
The effect is not statistically significant if:
- The original TS does not exceed the boundaries of the new distribution of TS values
The effect IS statistically significant if:
- The original TS is "far away" from the new TS distribution

ITERATIONS
How many do you need?
- 1000 for a high signal-to-noise-ratio H0 distribution
- 250 to 500 for permutation testing at each trial, time point, and frequency
Why?
- Fewer iterations: greater chance of an unusually large or small TS by chance
- More iterations: more stable estimates of the H0 distribution and more reliable significance values
WARNING! More iterations mean longer computation times!

DETERMINING STATISTICAL SIGNIFICANCE
Method 1:
1. Count the number of H0 values that are more extreme than the original TS value
   - "Extreme" means further toward the right or left tail of the distribution
2. Divide that count by the total number of H0 values
   - p_N: p-value based on the number of suprathreshold tests
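Method 1 amounts to a count and a division. A minimal sketch, assuming a hypothetical helper `count_p_value` and toy H0 values chosen so the arithmetic is easy to follow:

```python
def count_p_value(observed, null, tail="right"):
    """Fraction of H0 values more extreme than the observed TS (one tail)."""
    if tail == "right":
        n_extreme = sum(1 for v in null if v > observed)
    else:
        n_extreme = sum(1 for v in null if v < observed)
    return n_extreme / len(null)

null_ts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]            # toy H0 values
p_right = count_p_value(8.5, null_ts)                 # 9 and 10 exceed 8.5
p_left = count_p_value(2.5, null_ts, tail="left")     # 1 and 2 fall below 2.5
```

Both calls give 2 out of 10. With so few H0 values the p-value can only move in steps of 0.1, which is one reason many iterations are needed.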

DETERMINING STATISTICAL SIGNIFICANCE
Method 2:
1. Convert the original TS to standard-deviation units of the H0 distribution (a Z-value)
2. Convert that Z-value into a p-value
Z = (Ve - mean(Vn)) / std(Vn)
- Ve: original TS
- Vn: vector of H0 TS values
- Vn bar: mean of Vn

DETERMINING STATISTICAL SIGNIFICANCE
Method 2 cont'd:
- Convert the Z-value to a p-value
- Matlab function: normcdf
- p_Z: the resulting p-value
Advantages:
- A p-value and a Z-value at each pixel can be incorporated into the H0 distribution.
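Method 2 can be sketched in Python with the standard library, where `statistics.NormalDist().cdf` stands in for the Matlab `normcdf` named on the slide. The helper name `z_and_p`, the toy H0 values, and the choice of a right-tailed p-value are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev

def z_and_p(observed, null):
    """Z-score the observed TS against the H0 values, then get a right-tail p."""
    z = (observed - mean(null)) / stdev(null)   # distance in H0 standard deviations
    p = 1 - NormalDist().cdf(z)                 # area of the normal curve above z
    return z, p

# Toy H0 values with mean 0 and standard deviation 1
z, p = z_and_p(1.96, [-1.0, 0.0, 1.0])
```

Here z is 1.96 and p is close to 0.025. Unlike Method 1, this approach assumes the H0 distribution is roughly Gaussian.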

THE EVER-CHANGING P-VALUE
Warning!
- P-values can change each time you recompute the H0 distribution
Rest assured if you have sufficient iterations!
- The change in p-values should be tiny

THE EVER-CHANGING P-VALUE
What if you don't have sufficient iterations?
- You may get p-values that vary enough to affect your interpretation!

THE EVER-CHANGING P-VALUE
Variability depends on:
- How clean the data are
- How significant the result is

MULTIPLE COMPARISONS
The Bonferroni correction is available to use!
When to use Bonferroni:
- Hypothesis-driven analyses
  - E.g., testing an effect for 3 different electrodes in ONE time-frequency window
- If there are not too many tests
- If you expect robust effects

MULTIPLE COMPARISONS
When NOT to use Bonferroni correction:
- Exploratory, data-driven analyses
- Many tests over:
  - Time points
  - Frequency bands
  - Electrodes
Why not?
1. Bonferroni correction assumes that the tests are independent, which many EEG results are NOT
2. The corrected p-value threshold becomes so small that it can hide actual effects
3. Bonferroni correction is based only on the number of tests, instead of the information that can be found in the tests.
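A short worked example shows how quickly the Bonferroni threshold shrinks. The function name `bonferroni_threshold` and the map size of 64 electrodes, 30 frequencies, and 200 time points are assumptions chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothesis-driven case: 3 electrodes in one time-frequency window
few = bonferroni_threshold(0.05, 3)

# Exploratory case: hypothetical 64 electrodes x 30 frequencies x 200 time points
many = bonferroni_threshold(0.05, 64 * 30 * 200)
```

With 3 tests the threshold is about 0.0167. With 384,000 tests it falls to about 1.3e-7, and it depends only on the number of tests, not on how correlated they are.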

PERMUTATION TESTING TO THE RESCUE!
This framework already incorporates multiple comparison corrections!
Unlike Bonferroni, permutation testing:
1. Corrects based on the information in the tests, instead of the number of tests.
2. Provides stable p-values that can detect effects even when the data are correlated.

NONPARAMETRIC PERMUTATION TESTING
Two methods available:
1. Correct for multiple comparisons by using the pixel to determine the threshold
2. Correct by using the cluster to determine the threshold
Generally, how do these methods work?
- They build the H0 distribution at the time-frequency map level, instead of the pixel level
- The threshold therefore reflects information from the ENTIRE time-frequency-electrode space

PIXEL-BASED STATISTICS
Creating a distribution that contains the pixel from each iteration with the most extreme statistical value.
Steps:
1. Generate TS values under the H0 (as previously outlined)
2. Store the one or two pixels with the most extreme TS values in a matrix
3. For positive AND negative effects: define the statistical threshold at the 2.5th and 97.5th percentiles
4. For positive OR negative effects: define the threshold at the 95th or 5th percentile, respectively
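The four steps can be sketched as follows. This is a simplified illustration with stated assumptions: the data are simulated, the TS is a plain mean difference per pixel rather than a t-value, the map is a flat list of 20 pixels, and the names `diff_map`, `percentile`, and `pixel_thresholds` are hypothetical.

```python
import math
import random
import statistics

def diff_map(a, b):
    """Mean difference (A minus B) at each pixel; a and b are trials x pixels."""
    return [statistics.mean([t[p] for t in a]) - statistics.mean([t[p] for t in b])
            for p in range(len(a[0]))]

def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def pixel_thresholds(a, b, n_iter=500, seed=0):
    """Two-tailed map-level thresholds from the most extreme H0 pixels."""
    rng = random.Random(seed)
    pooled = a + b
    max_vals, min_vals = [], []
    for _ in range(n_iter):
        rng.shuffle(pooled)                               # step 1: shuffle labels
        null_map = diff_map(pooled[:len(a)], pooled[len(a):])
        max_vals.append(max(null_map))                    # step 2: keep extremes only
        min_vals.append(min(null_map))
    return percentile(min_vals, 2.5), percentile(max_vals, 97.5)   # step 3

# Simulated data: 30 trials x 20 pixels per condition, effect planted at pixel 5
rng = random.Random(1)
cond_a = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(30)]
cond_b = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(30)]
for trial in cond_a:
    trial[5] += 3.0

observed = diff_map(cond_a, cond_b)
lower, upper = pixel_thresholds(cond_a, cond_b)
significant = [v < lower or v > upper for v in observed]
```

Because only the single most extreme pixel of each shuffled map is stored, the two thresholds apply to the whole map at once.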

PIXEL-BASED STATISTICS
Things to note:
- A summary of the most extreme H0 TS values is saved across all pixels, at each iteration
  - This gives map-level thresholding
- Single pixels can be statistically significant
  - Even if neighboring pixels are non-significant
- Interpretability depends on the experimental design and the size of the time-frequency pixels

CLUSTER-BASED CORRECTION
What is a cluster?
- A group of contiguously significant points in time-frequency-electrode space
- Can be seen after applying a threshold, with any pixel that has a value below it set to zero
What is cluster-based correction?
- Significance requires enough neighboring pixels with suprathreshold values.
- Isolated pixels that pass the threshold are not treated as truly significant

CLUSTER-BASED CORRECTIONS
"Big enough"? It depends on:
- Number of extracted frequencies
- Resolution of the results
- Whether the data were temporally downsampled
Example:
- 1 time point @ 1 ms: significance is likely false
- 1 time point @ several Hz and several hundreds of ms: significance can be valid

CLUSTER-BASED CORRECTIONS
Non-data-driven method:
1. Predefine a number of target time and frequency points
   - E.g., clusters of 200 ms, 3 Hz
2. Remove clusters that are smaller than that number
Data-driven method:
1. Perform permutation testing (as previously outlined)
2. At each iteration, apply a threshold to the time-frequency map using an uncorrected p-value ("precluster threshold")
3. Threshold the H0 iteration map and keep its largest suprathreshold cluster

THRESHOLDING STRATEGIES
Method 1: Use parametric statistics
- P-value from your t-test or correlation coefficient
- Ideal for normally distributed data
Method 2: Loop through the iterations TWICE
- Once to build the H0 distribution at each pixel
- A second time to threshold them using nonparametric pixel-based significance thresholding.

THRESHOLDING STRATEGIES
Now you have a distribution of the largest suprathreshold clusters under the H0.
- Threshold the map of observed statistical values using the uncorrected p-value
Next steps:
1. Identify clusters in the thresholded map
2. Remove clusters that are smaller than the 95th percentile of the largest-cluster distribution
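A simplified sketch of the data-driven cluster method follows. Everything specific in it is an assumption for illustration: the map is one-dimensional (a row of 30 time points) so that a cluster is just a run of neighboring points, the precluster threshold is a parametric cutoff of |t| > 2.0 (roughly an uncorrected p < 0.05 for this sample size), positive and negative runs are not separated, and the names `t_map`, `find_clusters`, and `cluster_correct` are hypothetical.

```python
import math
import random
import statistics

def t_map(a, b):
    """t-value at each point; a and b are lists of trials (each a list of points)."""
    out = []
    for p in range(len(a[0])):
        xa = [trial[p] for trial in a]
        xb = [trial[p] for trial in b]
        se = (statistics.variance(xa) / len(xa) + statistics.variance(xb) / len(xb)) ** 0.5
        out.append((statistics.mean(xa) - statistics.mean(xb)) / se)
    return out

def find_clusters(mask):
    """Return (start, stop) index pairs for each run of True values."""
    clusters, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            clusters.append((start, i))
            start = None
    if start is not None:
        clusters.append((start, len(mask)))
    return clusters

def cluster_correct(a, b, n_iter=200, precluster_t=2.0, seed=0):
    """Keep observed clusters at least as large as the 95th percentile under H0."""
    rng = random.Random(seed)
    pooled = a + b
    largest = []
    for _ in range(n_iter):
        rng.shuffle(pooled)                                   # shuffle labels
        null_t = t_map(pooled[:len(a)], pooled[len(a):])
        runs = find_clusters([abs(t) > precluster_t for t in null_t])
        largest.append(max((stop - start for start, stop in runs), default=0))
    largest.sort()
    size_threshold = largest[math.ceil(0.95 * n_iter) - 1]
    observed = find_clusters([abs(t) > precluster_t for t in t_map(a, b)])
    kept = [(s, e) for s, e in observed if e - s >= size_threshold]
    return kept, size_threshold

# Simulated data: 20 trials x 30 points per condition, effect at points 10 to 19
rng = random.Random(2)
cond_a = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(20)]
cond_b = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(20)]
for trial in cond_a:
    for p in range(10, 20):
        trial[p] += 2.0

kept, size_threshold = cluster_correct(cond_a, cond_b)
```

Changing `precluster_t` changes both the observed clusters and the H0 cluster sizes, so the choice of precluster threshold directly shapes the final result.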

CLUSTER-BASED METHOD SUMMARY
- Performs map-level thresholding
- Corrections are based on the information within the data, instead of the number of tests performed
- The precluster threshold affects the cluster correction threshold!
  - Small p-value: removes many clusters
  - Larger p-value: leaves many clusters
- Extremely sensitive to large clusters
  - Localized true effects may go unnoticed!

FALSE DISCOVERY RATE (FDR) METHOD
How does it work?
- Controls the proportion of Type I errors among the tests declared significant, based on the distribution of p-values
- function [pID,pN] = fdr(p,q)
  - pID: p-value threshold based on independence or positive dependence
  - pN: nonparametric p-value threshold
  - p: p-value vector
  - q: FDR level
Limitations:
- The critical p-value is based on the number of tests performed AND the distribution of p-values
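As a rough Python sketch of what a function with this signature typically computes, the code below applies the Benjamini-Hochberg step-up rule for pID and the more conservative variant that divides q by the harmonic sum 1 + 1/2 + ... + 1/V for pN. Whether the Matlab `fdr` function used in the slides does exactly this is an assumption. The name `fdr_thresholds` and the p-values are made up for the example.

```python
def fdr_thresholds(p_values, q):
    """Return (p_id, p_n): the largest p-values passing each FDR criterion.

    p_id assumes independence or positive dependence between tests.
    p_n makes no assumption about dependence. Either is None if nothing passes.
    """
    p = sorted(p_values)
    v = len(p)
    c_v = sum(1 / i for i in range(1, v + 1))   # harmonic correction factor
    p_id = p_n = None
    for i, p_i in enumerate(p, start=1):
        if p_i <= i / v * q:
            p_id = p_i                           # later passes overwrite earlier ones
        if p_i <= i / v * q / c_v:
            p_n = p_i
    return p_id, p_n

# Hypothetical p-values from 10 tests, FDR level q = 0.05
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
p_id, p_n = fdr_thresholds(pvals, 0.05)
```

For these values p_id is 0.008 and p_n is 0.001: tests with p-values at or below the returned threshold are declared significant. Both thresholds depend on the number of tests and on how the p-values are distributed, which is the limitation noted above.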

SUMMARY
Shuffling depends on your focus of analysis and hypothesis:
- Comparing two conditions: shuffle condition labels
- Correlations of time-frequency power and reaction times over trials: shuffle the mapping of reaction time to trials
- Connectivity between two electrodes: shuffle the ordering of time segments
If still unsure:
- What effects does your hypothesis concern?
- Sometimes, shuffling can be performed with more than one option

SUMMARY
What about complex statistical designs?
- There is a lack of support for complex analyses in cognitive electrophysiology
- Take a hypothesis-testing approach with SPSS, SAS, or R first
How can you report your analyses in a methods section?
- What variables were shuffled?
- How many iterations?
- How were p-values created?
- Which multiple comparisons correction did you use?
