Learning From Data: The Art Of Statistics

Transcription

Learning from Data: the art of statistics
#LSEStats
David Spiegelhalter, Winton Professor of the Public Understanding of Risk, University of Cambridge
Chair: Fiona Steele, Professor in Statistics and Deputy Head of Department
Hosted by The Department of Statistics, LSE

Learning from data: the art of statistics
David Spiegelhalter
Chair of the Winton Centre for Risk & Evidence Communication, University of Cambridge
ex-President, Royal Statistical Society (2017-2018)
LSE 2019

out March 28th!

Winton Centre for Risk and Evidence Communication
WintonCentre@maths.cam.ac.uk

Numbers are often used to persuade rather than inform

Data does not speak for itself

The traditional statistics course
Describing data with summary statistics: dull
Probability theory for drawing random observations from a population distribution: difficult and mathematical
Probability theory for distributions of summary statistics: mathematical and incomprehensible
Formulae for statistical tests: mathematical, unmotivated, just a bag of tools
(If lucky) Examples of using statistical models in real life

A ‘modern’ statistical course
Motivate by problem solving
Start with visualisation and exploring data
Focus on what can be reasonably learned from data, biases in data, concluding causation, etc.
Models and algorithms
Assessing uncertainty through re-sampling data (‘bootstrap’)
Probability theory as a neat way of turning random variation into uncertainty about what is true
Hypothesis testing and its potential problems
Bayesian methods

All these rather abstract, challenging ideas are there to help answer real questions
The ‘data cycle’, e.g. PPDAC (Problem, Plan, Data, Analysis, Conclusion; promoted in New Zealand)

Looking at data
What was the pattern of Harold Shipman’s murders?

‘I have nothing to hide’
Dr Harold Shipman, general practitioner, on his arrest in September 1998

Shipman Inquiry July 2002: 215 definite victims, 45 probable

[Figure: age of victim (40 to 90) against year (1975 to 1995), with women and men shown separately]

Looking at data
What was the pattern of Harold Shipman’s murders?
Problem: can more detail tell us more about what Shipman did?
Plan: compare the actual times at which his patients died with the times of deaths recorded by other local GPs
Data: a huge exercise requiring examination of death certificates
Analysis: simple plotting

People die at all hours

People die at all hours - but not Shipman’s victims

Inference and bias
How many sexual partners have people in Britain had in their lifetime?
Problem: cannot know this as a fact
Plan: survey in which people are carefully asked about their sexual activity (Natsal)
Data: reports of numbers of partners
Analysis: plotting and summary statistics

How many sexual partners do people report?
Natsal-3, 2010, n 2000

Inference and bias
How many sexual partners have people in Britain really had in their lifetime?
Reported number of sexual partners in lifetime
                       Men aged 35–44   Women aged 35–44
Mean                   14.3             8.5
Median                 8                5
Mode                   1                1
Range                  0 to 500         0 to 550
Inter-quartile range   4 to 18          3 to 10
Standard deviation     24.2             19.7
Conclusions: can we generalise this to the whole population?
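The summary statistics on this slide can be reproduced for any list of reported counts. The sketch below uses a small, made-up list of responses (hypothetical values, not Natsal data) with a long right tail, to show why the mean sits well above the median and mode for this kind of data.

```python
import statistics

# Hypothetical responses invented for illustration; NOT Natsal data.
reported_partners = [0, 1, 1, 1, 2, 3, 4, 5, 6, 8, 10, 15, 40, 120]

mean = statistics.mean(reported_partners)
median = statistics.median(reported_partners)
mode = statistics.mode(reported_partners)
value_range = (min(reported_partners), max(reported_partners))
# quantiles(n=4) returns the three quartile cut points.
lower_quartile, _, upper_quartile = statistics.quantiles(reported_partners, n=4)
standard_deviation = statistics.stdev(reported_partners)

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"range={value_range} IQR=({lower_quartile}, {upper_quartile})")
print(f"sd={standard_deviation:.1f}")
```

A few very large reports pull the mean and standard deviation up, while the median and inter-quartile range barely move, which is the pattern visible in the table.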

Induction: the stages in generalising from data
1 to 2. How reliable are the reports? Poor memory, social acceptability bias, etc.
2 to 3. How representative is the sample of those eligible for the study? Random sampling of families (soup), 66% response
3 to 4. How closely does the study population match the target population? No people in institutions, etc.

Causation (or correlation)
The power of the press release ...

Abstract: ‘We observed consistent associations between higher socio-economic position and higher risk of glioma’
Press release: ‘High levels of education linked to heightened brain tumour risk’
Daily Mirror

Scientists might even have an agenda

‘we performed a search on the online Web of Science database using the keywords [insect*] AND [declin*] AND [survey], which resulted in a total of 653 publications.’

Predictive analytics

Regression, prediction and algorithms
Who was the luckiest person on the Titanic?

Ilfracombe, North Devon Database of

William Somerton’s entry in a public database of 1309 passengers (39% survived)
Can we construct an algorithm to predict who survives?

Copy structure of Kaggle competition (currently over 59,000 entries)
Split the database of 1309 passengers at random into a training set (70%) and a test set (30%)
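A random 70/30 split of this kind can be sketched as follows. The function name, the fixed seed and the use of integer passenger IDs are illustrative assumptions; the slide does not say how the split was actually carried out.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets.

    The seed is fixed only so that the example is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Stand-in for the 1309 passenger records: just their row numbers.
passenger_ids = range(1309)
train_set, test_set = split_train_test(passenger_ids)
print(len(train_set), len(test_set))
```

The algorithm is built on the training set only, and the test set is kept aside to judge how well it predicts passengers it has not seen.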

Unsurprising factors predict survival

A simple classification tree
[Figure: classification tree with yes/no splits on ‘Title Mr?’, ‘3rd Class?’, ‘At least 5 in family?’ and ‘Rare title?’; the leaves give estimated chances of survival of 16%, 3%, 60%, 37% and 93%]

How good is my algorithm?
‘Accuracy’ is a very crude way of judging an algorithmic prediction
Better to use the probabilities provided
If probability $p$ is given to an event $X \in \{0, 1\}$, then the Brier score is $(X - p)^2$
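As a minimal sketch of the definition above, the Brier score can be averaged over a set of cases, with each outcome coded as 1 (the event happened) or 0 (it did not). The function name and the example numbers are made up for illustration.

```python
def brier_score(outcomes, probabilities):
    """Mean of (X - p)^2 over all cases; lower is better."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    squared_errors = [(x - p) ** 2 for x, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical example: four passengers, 1 = survived, 0 = died.
outcomes = [1, 0, 0, 1]
probabilities = [0.9, 0.2, 0.4, 0.6]
print(brier_score(outcomes, probabilities))
```

Unlike accuracy, the score rewards confident probabilities that turn out right and penalises confident ones that turn out wrong.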

Performance of a range of methods on the test set
Method                                   Accuracy (high is good)   Brier score (low is good)
Everyone has a 39% chance of surviving   0.639                     0.232
All females survive, all males do not    0.786                     0.214
Simple classification tree               0.806                     0.139
Classification tree (over-fitted)        0.806                     0.150
Logistic regression
Random forest
Support Vector Machine (SVM)
Neural network
Averaged neural network
[Values for the last five methods are truncated in the transcription; the surviving fragment reads ‘460.142’]

Who was the luckiest person on the Titanic?
Karl Dahl, a 45-year-old Norwegian/Australian joiner travelling on his own in third class, paid the same fare as Francis Somerton
Had the highest average Brier score among survivors – a very surprising survivor
He apparently dived into the freezing water and clambered into Lifeboat 15, in spite of some on the lifeboat trying to push him back.
Hannah Somerton was left just £5, less than Francis spent on his ticket.

Statistical methods are not always used well.

The mysteries of the P-value
P-value: a measure of the conflict between the data and a ‘null hypothesis’ of no effect
Specifically, P = probability of getting such an extreme result, were the null hypothesis true
Not the probability of the null hypothesis
Traditional threshold of 5%, to declare ‘statistically significant’
Not significant does not mean ‘no effect’
If many tests, or a crucial decision, use a more stringent threshold

Rare example of accurate reporting of meaning of P-value

So what did Andromeda find?
‘Two-sided P = 0.06’
i.e. the probability of observing such a big improvement, were there no effect, is 0.03
Could say there is 97% confidence of improvement.
So what is the authors’ conclusion?

But just last week ...
Not against P-values
Just their dichotomisation

When might a split into ‘significant’ / ‘not-significant’ be more reasonable?
Where a decision has to be made, e.g.
Drug regulation
Monitoring the performance of a list of centres/hospitals/doctors – when to intervene?

Hypothesis testing
Could Harold Shipman have been caught earlier?
Using mortality rates from local GPs, calculate how many deaths he would have been expected to observe each year, under the null hypothesis that his mortality rates were normal.
Subtract expected from observed number to get excess mortality
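The observed-minus-expected calculation can be sketched as a running total. The yearly counts below are invented for illustration and are not the Shipman Inquiry figures; the function name is likewise an assumption.

```python
from itertools import accumulate

def cumulative_excess(observed, expected):
    """Running total of (observed - expected) deaths, year by year."""
    yearly_excess = (o - e for o, e in zip(observed, expected))
    return list(accumulate(yearly_excess))

# Hypothetical yearly counts; NOT the real data.
observed_deaths = [5, 7, 9, 12]
expected_deaths = [4.0, 4.5, 5.0, 5.5]  # from comparison GPs' mortality rates
print(cumulative_excess(observed_deaths, expected_deaths))
```

Under the null hypothesis the running total should wander around zero, so a steady climb is what signals a problem.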

[Figure: cumulative observed – expected mortality by year, from 1977 to 1989]
(NB: Shipman Inquiry total of definite or probable victims: 189 female over 65, 55 male over 65)

Hypothesis testing
Could Harold Shipman have been caught earlier?
But when to ‘blow the whistle’?
There are two possible types of error
Type I error: falsely accuse an innocent person (reject the null hypothesis when it is true)
Type II error: miss someone with true increased risk
Generally, we want to
control the probability of a Type I error at a low value (α)
collect enough data to make Type II errors rare (β)

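Shipman: “Sequential probability ratio test” (SPRT)
Older females would have set off ‘alarm’ in 1985, after only 40 deaths
[Figure: SPRT statistic by year from 1977, plotted separately for male and female patients, with alarm thresholds for alpha = beta = 0.000001 and alpha = beta = 0.0001; a third threshold label is illegible in the transcription]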
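A rough sketch of how an SPRT alarm of this kind could be computed is given below. It rests on assumptions the slide does not state: yearly deaths are treated as Poisson counts, the alternative hypothesis is a doubling of the expected mortality, only the upper (alarm) boundary is checked, and the counts are invented. It is not the analysis behind the plotted figure.

```python
import math

def sprt_alarm_index(observed, expected, rate_ratio=2.0, alpha=0.0001, beta=0.0001):
    """Return (index of first period crossing the alarm threshold, final statistic).

    The statistic is the cumulative log likelihood ratio of
    'rate = rate_ratio * expected' against 'rate = expected' for Poisson counts.
    The index is None if the threshold is never crossed.
    """
    threshold = math.log((1 - beta) / alpha)  # Wald's upper boundary
    statistic = 0.0
    for index, (o, e) in enumerate(zip(observed, expected)):
        statistic += o * math.log(rate_ratio) - (rate_ratio - 1) * e
        if statistic >= threshold:
            return index, statistic
    return None, statistic

# Hypothetical yearly counts; NOT the Shipman data.
observed_deaths = [5, 7, 9, 12, 15]
expected_deaths = [4.0, 4.5, 5.0, 5.5, 6.0]

lenient_index, _ = sprt_alarm_index(observed_deaths, expected_deaths, alpha=0.01, beta=0.01)
strict_index, _ = sprt_alarm_index(observed_deaths, expected_deaths)
print(lenient_index, strict_index)
```

Smaller values of alpha and beta raise the threshold, so the alarm sounds later but false accusations become rarer.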

Probability and Bayes

Bayes theorem
the initial odds for a hypothesis × the likelihood ratio = the final odds for a hypothesis

Bayes theorem
Suppose 1,000 possible perpetrators of a crime, plus one suspect
The initial odds that a suspect is guilty = 1 / 1,000
The ‘likelihood ratio’ = $\frac{\Pr(\text{evidence} \mid \text{suspect guilty})}{\Pr(\text{evidence} \mid \text{someone else did it})} = 1{,}000{,}000$
After evidence is considered, final odds that a suspect is guilty = $\frac{1}{1{,}000} \times \frac{1{,}000{,}000}{1} = \frac{1{,}000}{1}$
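The arithmetic on this slide can be checked in a few lines. The function names are illustrative; exact fractions are used so that the odds come out without rounding.

```python
from fractions import Fraction

def final_odds(initial_odds, likelihood_ratio):
    """Odds form of Bayes theorem: final odds = initial odds x likelihood ratio."""
    return initial_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1 + odds)

initial = Fraction(1, 1000)        # one suspect against 1,000 other possible perpetrators
likelihood_ratio = 1_000_000       # evidence is a million times more likely if guilty
posterior = final_odds(initial, likelihood_ratio)
print(posterior, odds_to_probability(posterior))
```

Final odds of 1,000 to 1 correspond to a probability of 1000/1001 of guilt, so even very strong evidence leaves some room for doubt when the initial odds are low.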

Probability and Bayes
What is the probability that the skeleton in a Leicester car park was really Richard III?

A recent case
On Saturday 25 August 2012, archaeologists started digging in a car park in Leicester – the site of Grey Friars friary
In a few hours they found their first skeleton
This was later claimed to be Richard III

$\text{Likelihood ratio} = \frac{\text{probability of evidence, if skeleton were Richard III}}{\text{probability of evidence, if someone else}}$

Suggested ‘verbal equivalents’ for bands of likelihood ratios

Evidence                          Likelihood ratio (conservative estimate)   Verbal equivalent
Radiocarbon dating AD1456–1530    2                                          Weak support
Age and sex of skeleton           5                                          Weak support
Scoliosis                         212                                        Moderately strong support
Post-mortem wounds                42                                         Moderate support
mtDNA match                       478                                        Moderately strong support
Y chromosome not matching         0.2                                        Weak evidence against
Combined evidence                 6.5 million                                More than extremely strong support
Researchers claimed at least 0.999994 probability that they had found Richard III
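The slide gives a combined likelihood ratio and a final probability but not the initial odds. As a sketch, the odds form of Bayes theorem can be run backwards to see roughly what initial probability those two numbers imply. The function names are illustrative, and the result is only approximate because the claimed probability is quoted as a lower bound and the likelihood ratio is rounded.

```python
def probability_to_odds(probability):
    """Convert a probability into odds in favour."""
    return probability / (1 - probability)

def odds_to_probability(odds):
    """Convert odds in favour into a probability."""
    return odds / (1 + odds)

combined_likelihood_ratio = 6.5e6   # from the table
claimed_probability = 0.999994      # from the researchers' claim

# final odds = initial odds x likelihood ratio, so divide to recover the initial odds.
implied_initial_odds = probability_to_odds(claimed_probability) / combined_likelihood_ratio
implied_initial_probability = odds_to_probability(implied_initial_odds)
print(round(implied_initial_probability, 3))
```

On these numbers the implied starting point is a probability of roughly 0.025, about 1 in 40, that the skeleton was Richard III before any of the listed evidence is considered.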

When communication goes wrong.

Book cover

How often do (opposite sex) couples report having sex?
[Figure: results for 1990, 2000 and 2010]

When I said all this in a talk ...

Why do old men have big ears?

Potentially a very misleading graphic!
When comparing, need to acknowledge that tested on same cases
Calculate differences and their standard error
How confident can we be that simple CART is best algorithm?
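One way to respect the fact that the algorithms were tested on the same cases is to work with case-by-case differences in their scores. The sketch below assumes per-case Brier scores are available for two algorithms; the numbers and function name are invented for illustration.

```python
import math
import statistics

def paired_difference(scores_a, scores_b):
    """Mean of the per-case differences (a - b) and its standard error."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both algorithms must be scored on the same cases")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    mean_difference = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_difference, standard_error

# Hypothetical per-case Brier scores for two algorithms on the same six test cases.
tree_scores = [0.04, 0.25, 0.01, 0.36, 0.09, 0.16]
forest_scores = [0.09, 0.16, 0.04, 0.49, 0.04, 0.25]
mean_difference, standard_error = paired_difference(tree_scores, forest_scores)
print(f"difference = {mean_difference:.3f} +/- {standard_error:.3f}")
```

If the mean difference is small compared with its standard error, the test set gives little reason to prefer one algorithm over the other.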
