IMS Presidential Address Teaching Statistics In The Age Of Data Science

Transcription

IMS Presidential Address
Teaching Statistics in the Age of Data Science
Jon A. Wellner
University of Washington, Seattle
IMS Annual Meeting & JSM, Baltimore, July 31, 2017

IMS Annual Meeting and JSM Baltimore

STATISTICS and DATA SCIENCE
- What has happened and is happening?
  - Changes in degree structures: many new MS degree programs in Data Science.
  - Changes in program and department names; 2 programs with the name “Statistics and Data Science”: Yale and Univ Texas at Austin.
  - New pathways in Data Science and Machine Learning at the PhD level: UW, CMU, and . . .
- Changes (needed?) in curricula / teaching?
IMS Presidential Address, Baltimore, 31 July 2017

Full Disclosure:
1. Task from my department chair:
   a. Review the theory course offerings in the Ph.D. program in Statistics at the UW.
   b. Recommend changes in the curriculum, if needed.
2. I will be teaching Statistics 581 and 582, Advanced Statistical Theory, during Fall and Winter quarters 2017-2018.
What should I be teaching?

Exciting times for Statistics and Data Science:
- Increasing demand!
- Challenges of “big data”:
  - challenges for computation
  - challenges for theory
- Changes needed in statistical education?

Exciting times for Statistics and Data Science: Increasing demand!
Projections, Bureau of Labor Statistics, 2014-24:

Job Description                                  Increase %
Statisticians                                    34%
Mathematicians                                   21%
Software Developer                               17%
Computer and Information Research Scientists     11%
Biochemists and Biophysicists                    8%
Physicists and Astronomers                       7%
Chemists and Materials Scientists                3%
Computer Programmer                              -8%

The increasing demand for statisticians raises questions:
Q1 Can we meet the demand?
Q2 How should we be changing to meet the increased demand?
Q3 What changes should we be making in the teaching of statistics to attract the best and brightest students?
Q4 What should we be teaching?

Exciting times for Statistics and Data Science:
- Challenges of “big data”?
  - challenges for computation and analysis?
  - challenges for theory?
  - dismissed by Donoho as a distinction between STAT & DS. nce.pdf

Exciting times for Statistics and Data Science:
Changes needed in statistical education?
- changes in degree structures?
- changes in curricula?
  - high school
  - all college/university
  - undergraduate majors
  - graduate: MS degree
  - graduate: PhD degree
- changes in the modes of teaching?

Differing Views:
from (2013) Report of the London Workshop on the Future of Statistical Sciences
- Marie Davidian, past President of the ASA:
  “Statistical sciences are at a crossroads . . .” and change is needed.
- Terry Speed, past President of the IMS:
  “We have a great tradition . . .” and . . . “we are in this business for the long term” . . . we need to “evolve and adapt”.
- Richard De Veaux:
  “Statistics education remains mired in the 20th (some would say the 19th) century.”

Differing Views:
from (2013) Report of the London Workshop on the Future of Statistical Sciences
- Marie Davidian, past President of the ASA:
  “I believe that the statistical sciences are at a crossroads, and that what we do currently . . . will have profound implications for the future state of our discipline. The advent of big data, data science, analytics, and the like requires that we as a discipline cannot sit idly by . . . but must be proactive in establishing both our role in and our response to the ‘data revolution’ and develop a unified set of principles that all academic units involved in research, training, and collaboration should be following. . . . At this point, these new concepts and names are here to stay, and it is counterproductive to spend precious energy on trying to change this. We should be expending our energy instead to promote statistics as a discipline and to clarify its critical role in any data-related activity.”

Differing Views:
from (2013) Report of the London Workshop on the Future of Statistical Sciences
- Terry Speed, past President of the IMS:
  “Are we doing such a bad job that we need to rename ourselves data scientists to capture the imagination of future students, collaborators, or clients? Are we so lacking in confidence . . . that we shiver in our shoes the moment a potential usurper appears on the scene? Or, has there really been a fundamental shift around us, so that our old clumsy ways of adapting and evolving are no longer adequate? . . . I think we have a great tradition and a great future, both far longer than the concentration span of funding agencies, university faculties, and foundations. . . . We might miss out on the millions being lavished on data science right now, but that’s no reason for us to stop trying to do the best we can at what we do best, something that is far wider and deeper than data science. As with mathematics more generally, we are in this business for the long term. Let’s not lose our nerve.”†
† January 2014 AMSTAT News

Back tracking. History part 1:
Harold Hotelling dates:
- 1895: b. Minnesota; 1904: moved to Washington.
- 1915-19: BA in Journalism, UW Seattle (& Army)
- 1920-21: MS Math, UW Seattle
- 1924: PhD Math, Princeton
- 1924-31: Food Research Institute and Assistant Professor, Mathematics (1927-31), Stanford
- 1929: 6 months with R. A. Fisher at Rothamsted.
- 1931-46: Columbia, Economics (& SRG)
- 1946-1973: UNC, Statistics
- 1973: d. Chapel Hill, NC

Back tracking. History part 1:


Hotelling’s 1940 paper: “Teaching of statistics”
- IMS Invited talk, Hanover, New Hampshire
- H. Hotelling (chair), Walter Bartky, W. E. Deming, M. Friedman, and P. Hoel.
- Hotelling’s talk laid out the difficulties involved in the teaching of statistics as of 1940:
  - Failure to recognize statistics as a science requiring specialists to teach it.
  - Shortage of qualified instructors.
- Strongly influenced Neyman
- Discussion by W. E. Deming: raises issues relevant for applications;
  “Above all, a statistician must be a scientist. A scientist does not neglect any pertinent information. . . .”

As noted by Ingram Olkin:
“His (Hotelling’s) 1940 paper on the teaching of statistics had a phenomenal impact. Jerzy Neyman stated that it was one of the most influential papers in statistics. Faculty attempting to convince university administrators to form a Department of Statistics often used this paper as an argument why the teaching of statistics should be done by statisticians and not by faculty in substantive fields that use statistics.”

Two other works by Hotelling on teaching statistics:
- The teaching of statistics: A report of the IMS Committee on the Teaching of Statistics. H. Hotelling (chair), Walter Bartky, W. E. Deming, M. Friedman, and P. Hoel. Ann. Math. Statist. 19 (1948), 95-115.
- The place of statistics in the university. Proc. Berk. Symp. Math. Statist. Prob. (1949). Part II (by Hotelling) of the 1948 Annals paper.

Part I of the 1948 Committee Report addressed the following questions:
(1) Who are the prospective students of statistics?
    (a) All college (university) students.
    (b) Future consumers of statistics.
    (c) Future users of statistical methods.
    (d) Future producers and teachers of statistical methods.
(2) What should they be taught?
(3) Who should teach statistics?
(4) How should the teaching of statistics be organized?
(5) What should be done about adult education?

Part II of the 1948 report was written by Hotelling and reflected his views:


- The (1940) paper and (1948) committee report, Ann. Math. Statist. 19 (1948), 95-115, were reprinted in Stat. Sci. 3 (1988), 63-108.
- Followed by a discussion by: David S. Moore, J. V. Zidek, Kenneth J. Arrow, Harold Hotelling Jr, Ralph Bradley, W. Edwards Deming, Shanti S. Gupta, and Ingram Olkin.
- Discussion(s) reflected the long-standing (and creative) tensions between theory/mathematics and applications/data analysis.
- S. S. Gupta’s view of Hotelling’s papers:
  “He rightly visualized the academic statistician as a toolmaker who ‘must not put all his time on using the tools he makes’, but must focus his/her attention on the tools themselves.”

Review of the Hotelling committee report (1948) and another paper on teaching of statistics by a committee of the RSS, by Truman L. Kelley, Professor of Education, Harvard:
From Kelley’s review:
“It seems to the reviewer that there is implicit in the British recommendation an induction of the student into statistics via the subject matter of his field of specialization, and in the American an induction via logic, including principles of mathematics and probability. It is needless to say that these approaches are far asunder.”

From Hotelling’s paper (page 466):
“Statistical theory is a big enough thing in itself to absorb the full-time attention of a specialist teaching it, without his going out into applications too freely. Some attention to applications is indeed valuable, and perhaps even indispensable as a stage in the training of a teacher of statistics and as a continuing interest. But particular applications should not dominate the teaching of the fundamental science, any more than particular diseases should dominate the teaching of anatomy and bacteriology to pre-medical students.”
These two quotes are a small sample of the long-running tensions within statistics and statistics education. In my view, these tensions are an inherent part of the process of creating new statistical methods and perspectives.

Kelley continues:
“The American committee, by omission and by inclusion, reveals what it considers to be preparatory background for students of statistics. It at no point cites knowledge of data in some scientific field as essential. . . . The American committee deplores the general lack of mathematical competence of most teachers of statistics in different subject matter fields. This is deplorable as is their lack of knowledge of the genius of data in their fields. However, the progress of recent decades should make one optimistic, and these two committee reports should encourage college presidents to strengthen and broaden the instruction in both mathematical and applied statistics.”

Adaptation of Efron & Hastie (2016), Epilogue, p. 448
[Figure: developments placed relative to Applications, Mathematics, and Computation]
- 19th Century
- 1900: Pearson Chi-square statistic
- 1908: Student's t statistic
- 1925: Fisher's estimation paper
- 1933: Neyman-Pearson; optimal testing
- 1937: Neyman's optimal CI's
- 1950: Wald's decision theory

Overlay 1 for Efron & Hastie (2016), Epilogue, p. 448
[Figure: developments placed relative to Applications, Mathematics, and Computation]
- 19th Century
- 1900: Pearson Chi-square statistic
- 1908: Student's t statistic
- 1925: Fisher's estimation paper
- 1930: Hotelling Max Lik
- 1933: Neyman-Pearson; optimal testing
- 1936: Doob Max Lik
- 1937: Neyman's optimal CI's
- 1946: Cramer's book
- 1948 (plotted; no legend text)
- 1950: Wald's decision theory

History part 2 (shortened)
A second set of important developments:
- John Tukey’s (1962) paper, The future of data analysis. Tukey called for a revamping of academic statistics, and pointed to a new science focused on data analysis.
- Bill Cleveland (1993) and John Chambers (2001) developed Tukey’s ideas further.
- Leo Breiman’s (2001) Two cultures . . . paper clearly delineated the differing approaches to data analysis which developed in the years since Tukey (1962).
  - Predictive modeling; Common Task Framework
  - Generative modeling; inference
- Donoho (2015), 50 years of Data Science, gives a guide to this history, explains the key role of the Common Task Framework, and provides an updated road map to what he calls Greater Data Science.

Adaptation of Efron & Hastie (2016), Epilogue, p. 448
[Figure: developments placed relative to Applications, Mathematics, and Computation; earlier markers 1900, 1908, 1925, 1933, 1937, 1950 retained]
- 2016b: genomics, biol
- 2016a: data science
- 2001: microarrays; large scale inference
- 2000: random forests, ML
- 1995: FDR & LASSO
- 1979: bootstrap/MCMC
- 1972: Cox proportional hazards
- 1962: Tukey, Data Analysis

Fast forward to 2002-2004:
- By the beginning of the 21st century the era of data science, “big data”, and machine learning was well underway. Breiman’s (2001) paper clearly delineated the differences in approaches to data analysis which had developed in the years since Tukey (1962).
- In May 2002, the NSF hosted a workshop on future challenges and opportunities for the statistics community.

The resulting “Report on the Future of Statistics” by Bruce Lindsay, Jon Kettenring, and David O. Siegmund (2004):
(a) addressed features of the statistical enterprise relevant to the NSF;
(b) did not include biostatistics;
(c) did not address the teaching of statistics explicitly, but only indirectly through “manpower” problems;
(d) identified opportunities and needs for the “core of statistics”:
“If there is exponential growth in data collected and in the need for data analysis, why is ‘core’ research relevant? . . . Because unifying ideas can tame this growth, and the core area of statistics is the one place where these ideas can happen and be communicated throughout science.”

Comparisons: then (1940) and now (or 2015)
Of course there have been big changes both in statistics and in the world of science in general since Hotelling’s time and even since the Lindsay-Kettenring-Siegmund report of 2004. Here is an oversimplified summary:

Comparisons

                                        then (1940)               now (2015)
# of departments of statistics          5-10                      60
# of departments of biostatistics       1                         43
# of graduate students, statistics      50?                       4597 (24%)
# of graduate students, biostatistics   10?                       1960 (14%)
IMS membership                          100?                      3500
Computer clock speed                    5-10 Hz                   2.7 GHz
                                        Zuse (1941)               Mac Powerbook
Terminology / Department Names          Mathematical Statistics   Statistics
                                        Applied Statistics        Data Analysis
                                                                  Data Science


Source: AMSTAT News, October 2016; Steve Pierson

MS curricula in Data Science
The MS in Data Science and Machine Learning: What is the curriculum?
Donoho (2015) section 7 reviews a typical such Data Science MS degree curriculum: the core of the MS Data Science curriculum includes:
- Research Design and Application for Data and Analysis
- Exploring and Analyzing Data
- Storing and Retrieving Data
- Applied Machine Learning
- Data Visualization and Communication

while the advanced courses include:
- Experiments and Causal Inference
- Applied Regression and Time Series Analysis
- Legal, Policy, and Ethical Considerations for Data Scientists
- Machine Learning at Scale
- Scaling up! Really big data.
- Capstone course (with data analysis project)
The program at Berkeley is run by the Information School.

At (my home) the University of Washington, the DS MS program is run by the E-Science Institute (with co-operation from Statistics, CS, and Biostatistics):
- Introduction to Statistics and Probability
- Data Visualization & Exploratory Analytics
- Applied Statistics and Experimental Design
- Data Management for Data Science
- Statistical Machine Learning for Data Scientists
- Software Design for Data Science
- Scalable Data Systems and Algorithms
- Human-Centered Data Science
- Data Science Capstone Project
Both lists clearly overlap with the courses offered in a traditional statistics MS program, but with a number of substitutions from Computer Science.
Stat & Biostat Faculty: Adrian Dobra, Zaid Harchaoui, Brian Leroux.

MS Programs in Data Science and Analytics: survey / interviews in AMSTAT News, April and June 2017:
- University of Tennessee
- George Mason University
- Bentley University
- University of Minnesota
- NC State University
- Penn State University
- University of Vermont
- University of Wisconsin-Madison
- South Dakota State University
- Harvard
Query: “Do you have any advice for institutions considering the establishment of such a degree?”
Reply: Mark Craven, Univ of Wisconsin-Madison:
“I would advise any institution considering this area to build on existing partnerships between statistics, biostatistics, computer sciences, and biomedical informatics. No one unit can or should “own” this area, so proceeding in a broad and inclusive way makes the most sense.”

Donoho (2015) gives an analysis of the Berkeley Data Science curriculum in the context of Tukey’s critiques and writings. Donoho writes:
“Although my heroes Tukey, Chambers, Cleveland, and Breiman would recognize positive features in these programs, it’s difficult to say whether they would approve of their long-term direction - or if there is even a long-term direction to comment about. . . . Data Science Masters curricula are compromises: taking some material out of a Statistics masters program to make room for large database training; or equally, as taking some material out of a database masters in CS and inserting some statistics and machine learning. Such a compromise helps administrators to quickly get a degree program going, without providing any guidance about the long-term direction of the program and about the research which its faculty will pursue. What long-term guidance could my heroes have provided?”

Ph.D. curricula in Statistics
At the UW: Ph.D. program has four possible tracks:
- Normal or Basic track. Requirements: 581-582-583 & 570-571
- Statistical genetics
- Statistics for the Social Sciences
- Machine learning and big data:
  - 570, 581-582 (advanced stat theory),
  - ML/BD Core:
    (i) Foundational ML: STAT 535
    (ii) One advanced ML course: STAT 538 or STAT 548
    (iii) One CSE course: CSE 544 (Databases) or CSE 512 (Visualization)
    (iv) One elective:
      * Advanced Statistical Learning (STAT 538)

Ph.D. curricula in Statistics
      * Machine Learning for Big Data (STAT 548)
      * Graphical Models (CSE 515)
      * Visualization (CSE 512)
      * Databases (CSE 544)
      * Convex Optimization (EE 578)

UW PhD student numbers by tracks: 2001-2016

track                    Total
Normal track, Stat       120
Normal track, Biost      152
StatGen, Stat            16
StatGen, Biost           8
Stat in Soc Sci:         6
ML-BD:                   14
total, Stat              156
total, Biost             160

My heroes:
- H. Chernoff
- J. L. Doob
- R. A. Fisher
- Harold Hotelling
- Wassily Hoeffding
- Jaroslav Hajek
- Jack Kiefer
- Lucien Le Cam
- Charles Stein
- Abraham Wald

Future areas needing more math:
- manifold learning
- topological data analysis
- nonstandard data types: functions, trees, images, . . .

What’s in Stat 581 - 582 - 583, Advanced Stat Theory now?
Outline for Stat 581:
- Inequalities; basic asymptotic theory in statistics. Examples:
  - robustness (or lack of robustness) of normal theory tests
  - chi-square statistic and power of chi-square tests under fixed and local alternatives.
  - limit theory for fixed dimension linear regression
  - limit theory for correlation coefficients
  - limit theory for empirical distributions and sample quantiles.
  - examples from survival analysis / censored data
- Lower bounds for estimation
  - Multiparameter Cramér-Rao lower bounds.

  - Superefficiency & introduction to Hajek-LeCam convolution theorem and local asymptotic minimax theorems.
  - Simple lower bound lemma via two point inequalities.
- Classical (and nonparametric) maximum likelihood:
  - Existence; empirical d.f. and empirical measure as MLEs
  - Algorithms, one step approximations, and EM
  - LR, Wald, and Rao tests: fixed and local alternatives.
  - Brief introduction to agnostic viewpoint: what if the model fails?

Outline for Stat 582:
- Elementary Decision Theory: Bayes rules, minimax rules, and connections.
- Bayes theory, inadmissibility, and empirical Bayes.
- Optimal tests and tests optimal in subclasses: eliminating nuisance parameters by conditioning and invariance.

Unifying aspects of the “statistics core”:
(a) notions of optimality
(b) design of experiments
(c) (survey) sampling theory
(d) classical and modern multivariate analysis.

(1) asymptotic theory: LLNs and CLTs.
(2) uniform laws of large numbers and uniform central limit theorems, i.e. empirical process theory.
(3) optimality theory via upper bounds and lower bounds (parametric, semiparametric, and nonparametric)
(4) inequalities (exponential, basic, oracle)
(5) convexity theory
(6) optimization theory.

Stat 581-582 this year?
New in Stat 581 - 582 this coming year?
- New? Large scale hypothesis testing and FDR’s?
- New? More on empirical Bayes?
- New? More on convexity?
- New? More on empirical process theory?
(What will need to be reduced or deleted?)
I don’t know exactly yet, but I’m working on it . . . and on the report to my chair.
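As a concrete illustration of the kind of material the “large scale hypothesis testing and FDR’s” item might cover, here is a minimal sketch of the Benjamini-Hochberg step-up procedure for false discovery rate control. It is an assumption about what such a unit could include, not a description of the actual course content; the function name `benjamini_hochberg` and the p-values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.1) -> List[bool]:
    """Return a reject flag for each p-value using the BH step-up rule at level q."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

The step-up character matters: a hypothesis can be rejected even if its own p-value exceeds its rank threshold, as long as some larger-ranked p-value falls below its threshold.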

IMS Data Science Group!
- Initiated in 2015 by Bin Yu and Richard Davis with assistance from David Dunson.
- New Group Coordinators: Sofia Olhede (s.olhede@ucl.ac.uk) and Patrick Wolfe (p.wolfe@ucl.ac.uk).
- Watch for further developments soon!

From Efron & Hastie (2016), Preface, page xvii:
“Useful disciplines that serve a wide variety of demanding clients run the risk of losing their center. Statistics has managed, for the most part, to maintain its philosophical cohesion despite a rising curve of outside demand. The center of the field has . . . moved in the past sixty years, from its traditional home in mathematics and logic toward a more computational focus.”
From Efron & Hastie (2016), Epilogue, page 447:
“It is the job of statistical inference (theory) to connect ‘dangling algorithms’ to the central core of well-understood methodology. The connection process is already underway.”

My Views:
- Embrace and encourage data science!
- Continue evolving the curriculum to teach the unifying themes of statistical research.
- Keep doing what statisticians do best: question, question, question . . . and then provide the best answers possible based on the available data.
- Attract the best and brightest students to research work in statistics.
- Teach what we know!

