Predictive Modeling Competitions


Predictive modeling competitions
making data science a sport
Anthony Goldbloom
CEO, Kaggle
e-mail anthony.goldbloom@kaggle.com
twitter @antgoldbloom

Global competitions
Predicting HIV viral load
State of the art: 70%
1½ weeks: 70.8%
Competition closes: 77%

Diverse experts solving diverse les[...]
[Figure: world map of competition participants and their locations, including Chih-Li Sung & Roy Tseng (Penghu & Taipei), Uri Blass (Tel-Aviv), Jeremy Howard (Australia) and Cole Harris (Texas)]

1. Motivation
2. Why host a competition?
3. Why compete?
4. How it works
5. Heritage Health Prize
6. Questions

“I keep saying the sexy job in the next ten years will be statisticians.”
Hal Varian, Google Chief Economist, 2009

Crowdsourcing
Mismatch between those with data and those with the skills to analyse it

Not MIT, not SAS UoL?
Additional slides

2. Why host a competition?

Tourism Forecasting Competition
[Chart: forecast error (MASE) over time, with points for the existing model, Aug 9, 2 weeks later, 1 month later and competition end]
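The Tourism Forecasting slide reports forecast error as MASE (mean absolute scaled error). As a rough illustration, the sketch below uses the usual definition: the forecast's mean absolute error divided by the in-sample mean absolute error of a naive forecast that repeats the value from `m` periods earlier. The function name `mase`, its parameters and the sample numbers are assumptions made for illustration; they are not taken from the competition.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error (illustrative sketch).

    train    -- historical values used to scale the error
    actual   -- true values for the forecast period
    forecast -- predicted values for the forecast period
    m        -- seasonal period of the naive benchmark (1 = previous value)
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(train) <= m:
        raise ValueError("train must contain more than m observations")

    # In-sample errors of the naive forecast "repeat the value m steps back".
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")

    # Mean absolute error of the forecast being evaluated.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Made-up numbers: the naive in-sample error is 1.0 and the forecast MAE is 0.5.
print(mase([1, 2, 3, 4], actual=[5, 6], forecast=[4, 6]))  # 0.5
```

A value below 1 means the forecast beats the naive benchmark on average; a value above 1 means it does worse.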

Chess Ratings Competition
[Chart: error rate (RMSE) over time, with points for the existing model (Elo), Aug 4, 1 month later, 2 months later and today]
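The Chess Ratings slide compares entries with the existing Elo model using RMSE (root mean squared error). The sketch below shows the standard Elo expected-score formula and a plain RMSE between predicted and actual game results. It is only an illustration: the function names, the ratings and the results are made up, and the competition's exact scoring procedure may have differed from this plain RMSE.

```python
import math


def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical games: (rating of A, rating of B, result for A),
# where 1 is a win, 0.5 a draw and 0 a loss.
games = [(1500, 1500, 1.0), (1500, 1500, 0.0)]
predictions = [elo_expected(a, b) for a, b, _ in games]
results = [result for _, _, result in games]

print(predictions)                 # [0.5, 0.5]
print(rmse(predictions, results))  # 0.5
```

A lower RMSE means the predicted scores sit closer to the actual results.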

Our User Base

Users apply different techniques: neural networks, logistic regression, support vector machine, decision trees, ensemble methods, AdaBoost, Bayesian networks, genetic algorithms, random forest, Monte Carlo methods, principal component analysis, Kalman filter, evolutionary fuzzy modeling

Benchmarking

25% successful grant applications
NASA tried, now it’s our turn

Untouched problems

25% successful grant applications
Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants

Who to hire?

Branding: “we do analytics”

3. Why compete?

Why Participants Compete
- Clean, real-world data
- Interactions with experts in related fields
- Professional reputation & experience
- Prizes

User base


4. How it works

1. Upload
2. Submit
3. Evaluate & Exchange

Use the wizard to post a competition

Participants make their entries

Competitions are judged based on predictive accuracy

Competition Mechanics
Competitions are judged on objective criteria
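To make "judged on objective criteria" concrete, here is a minimal sketch of how entries could be scored against a hidden solution and ranked. It assumes classification accuracy as the metric. The team names, the data and the functions `accuracy` and `leaderboard` are hypothetical; this is not a description of Kaggle's actual scoring system.

```python
def accuracy(predictions, solution):
    """Fraction of predictions that match the hidden solution."""
    if len(predictions) != len(solution):
        raise ValueError("submission and solution must have the same length")
    correct = sum(p == s for p, s in zip(predictions, solution))
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Score every entry and return (team, score) pairs, best first."""
    scores = {team: accuracy(preds, solution)
              for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hidden labels and two made-up entries.
solution = [1, 0, 1, 1]
submissions = {
    "team_a": [1, 0, 0, 1],
    "team_b": [1, 0, 1, 1],
}

for team, score in leaderboard(submissions, solution):
    print(team, score)
# team_b 1.0
# team_a 0.75
```

Because the score depends only on the submitted predictions and the held-back answers, every entry is measured the same way regardless of who submitted it or which technique produced it.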

5. Heritage Health Prize

Netflix Prize
2006 – 2009: $1 million prize, 50,000 registrations
2011: $3 million prize, projected 100,000 registrations

6. Questions

What could the world’s best analysts find in your data?
e-mail anthony.goldbloom@kaggle.com
phone 61438400053
Photo by gidzy, www.flickr.com/photos/gidzy
