Transcription
Predictive modeling competitions: making data science a sport. Anthony Goldbloom, CEO, Kaggle. E-mail: anthony.goldbloom@kaggle.com. Twitter: @antgoldbloom
Global competitions: Predicting HIV viral load. State of the art: 70%. 1½ weeks: 70.8%. Competition closes: 77%
Diverse experts solving diverse les [map of participants; labels in extraction order: Forecasting; Edmund & Adrian; Dr. Derek; London & USA; Gatherer; Felipe Maia; UK; Uppsala University; Philipp; arGera; Giuseppe Ragusa; Ljubljana; Rome; Ivan; Russian Federation; Robert; Warsaw; Chih-Li Sung & Roy Tseng; Penghu & Taipei; Uri Blass; Tel-Aviv; Jeremy Howard; Australia; Thomas Mahony; Glen Maher; Canberra; Canberra; Emir Delic; Australia; Dr. Christopher Hefele, New York; Cole Harris; Texas; Chris DuBois; Claudio Perlich; Portland; Edmund; Jason Trigg; John Blatz; USA; & Adrian; London & di; Jason; Barrabas; Lee Baker; Baltimore; Pennsylvania; USA; Las Cruces, NM; Nan Zhou; Pittsburgh]
1. Motivation 2. Why host a competition? 3. Why compete? 4. How it works 5. Heritage Health Prize 6. Questions
“I keep saying the sexy job in the next ten years will be statisticians.” Hal Varian, Google Chief Economist, 2009
Crowdsourcing: Mismatch between those with data and those with the skills to analyse it
Not MIT, not SAS UoL? Additional slides
Tourism Forecasting Competition [chart: forecast error (MASE) over time; time axis: Aug 9, 2 weeks later, 1 month later, competition end; reference: existing model]
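The slide reports forecast error as MASE (mean absolute scaled error). A minimal sketch of how that metric is usually computed follows. It assumes the non-seasonal form, where the forecast's mean absolute error is divided by the in-sample error of a naive "repeat the previous value" forecast. The slides do not say which variant the competition used, and the function name and sample numbers are hypothetical.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error, non-seasonal form (an assumed variant)."""
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    # Scale: in-sample mean absolute error of the naive one-step forecast,
    # which predicts each value as the value just before it.
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    if naive_mae == 0:
        raise ValueError("training series is constant; MASE is undefined")
    # Mean absolute error of the submitted forecast on the held-out period.
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Hypothetical visitor counts, for illustration only.
history = [10, 12, 11, 13]
print(mase(history, actual=[14, 15], forecast=[13, 17]))
```

A value below 1 means the forecast beat the naive benchmark on average; lower is better, which is why the chart tracks the error falling over the course of the competition.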
Chess Ratings Competition [chart: error rate (RMSE) over time; time axis: Aug 4, 1 month later, 2 months later, today; reference: existing model (ELO)]
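To make the chart's two ingredients concrete, here is a small sketch of the standard Elo expected-score formula and a plain RMSE. It assumes game results are coded as 1 (win), 0.5 (draw) and 0 (loss) for the first player, and that error is taken game by game. The competition's exact scoring procedure is not given in the slides, and the ratings below are made up.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rmse(predicted, actual):
    """Root mean squared error between predictions and observed outcomes."""
    if len(predicted) == 0 or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical games: (rating of A, rating of B, result for A).
games = [(1600, 1400, 1.0), (1500, 1500, 0.5), (1450, 1550, 1.0)]
predictions = [elo_expected_score(a, b) for a, b, _ in games]
outcomes = [result for _, _, result in games]
print(rmse(predictions, outcomes))
```

A competing rating system would supply its own predictions in place of `elo_expected_score` and be compared on the same RMSE.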
Our User Base
Users apply different techniques: neural networks, logistic regression, support vector machine, decision trees, ensemble methods, AdaBoost, Bayesian networks, genetic algorithms, random forest, Monte Carlo methods, principal component analysis, Kalman filter, evolutionary fuzzy modeling
Benchmarking
25% successful grant applications. NASA tried, now it’s our turn
Untouched problems
25% successful grant applications. Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants
Who to hire?
Branding: “we do analytics”
Why Participants Compete: clean, real-world data; interactions with experts in related fields; professional reputation & experience; prizes
User base
1. Upload 2. Submit 3. Evaluate & Exchange
Use the wizard to post a competition
Participants make their entries
Competitions are judged based on predictive accuracy
Competition Mechanics: Competitions are judged on objective criteria
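Judging on predictive accuracy can be pictured as scoring every entry against a held-back answer set and sorting the results. The sketch below is an illustration only: it assumes a simple classification task scored by the fraction of correct predictions, and the team names, data and function names are hypothetical rather than a description of Kaggle's actual system.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the held-back answers."""
    if len(actual) == 0 or len(predicted) != len(actual):
        raise ValueError("entry must be non-empty and match the solution length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def leaderboard(submissions, solution):
    """Return (team, score) pairs sorted from best to worst score."""
    scores = {team: accuracy(preds, solution) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical held-back answers and entries.
solution = [1, 0, 1, 1]
submissions = {
    "team_a": [1, 0, 1, 0],
    "team_b": [1, 1, 0, 0],
    "team_c": [1, 0, 1, 1],
}
for team, score in leaderboard(submissions, solution):
    print(team, score)
```

Because the score depends only on the entries and the hidden answers, the ranking is objective in the sense the slide describes; a different competition would swap `accuracy` for its own metric, such as MASE or RMSE.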
Netflix Prize (2006 – 2009): $1 million prize, 50,000 registrations. Heritage Health Prize (2011): $3 million prize, projected 100,000 registrations
What could the world’s best analysts find in your data? E-mail: anthony.goldbloom@kaggle.com. Phone: 61438400053. Photo by gidzy, www.flickr.com/photos/gidzy