EECS 510: Social Media Mining, Spring 2015
Data Mining Essentials 2: Data Mining in Practice, with Python
Rosanne Liu, rosanne.liu@northwestern.edu
Outline
§ Why Python?
§ Intro to Python
§ Intro to Scikit-Learn
§ Unsupervised Learning: demo on PCA, K-Means
§ Supervised Learning: demo on Linear Regression, Logistic Regression
Why Python?
What programming language do you use for data mining? (Source: http://www.kdnuggets.com/polls/index.html)
How much is your salary as an analytics, data mining, or data science professional? (Source: http://www.kdnuggets.com/polls/index.html)
Should data scientists / data miners be responsible for their predictions? (Source: http://www.kdnuggets.com/polls/index.html)
Why Python? Why not?
Think about the scientist's needs:
§ Get data (simulation, experiment control).
§ Manipulate and process data.
§ Visualize results, to understand what we are doing!
§ Communicate results: produce figures for publications, write presentations.
Why Python? Why not?
§ Easy: easy to learn, easily readable; scientists first, programmers second.
§ Efficient: managing memory is easy, if you just don't care.
§ A single language for everything: avoid learning new software for each new problem.
More to Take Away
§ Free distribution from http://www.python.org
§ [...]ily for you, this community can write [...]
§ Two popular versions: 2.7 or 3.x
§ A single-click installer: Enthought Canopy
§ Prepare yourself for code indentation heaven
§ [...]-like features: http://ipython.org
§ Scikit-Learn, ML resource and library: http://scikit-learn.org/dev/index.html
§ [...]earn2/
§ More: mlpy, PyBrain, Orange, Scrapy, ...
The Use of Python: Simple Demos
0–PythonIntro.ipynb
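This is a minimal sketch of what a first Python demo might cover. It is illustrative only and does not reproduce the contents of 0–PythonIntro.ipynb. The snippet shows lists, list comprehensions, dictionaries, a function, and indentation-delimited blocks. `word_counts` is a hypothetical helper introduced for this example.

```python
# A few core Python features a first demo typically touches on.
# (Illustrative only; not the contents of the course notebook.)

def word_counts(text):
    """Hypothetical helper: count how often each lowercase word appears."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Lists and list comprehensions
squares = [n * n for n in range(5)]          # [0, 1, 4, 9, 16]
evens = [n for n in squares if n % 2 == 0]   # [0, 4, 16]

# Dictionaries
counts = word_counts("the cat saw the dog")

# Indentation, not braces, delimits the loop body
for word, count in sorted(counts.items()):
    print(word, count)
```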
What is Scikit-learn
§ A Python machine learning library
§ Focused on modeling data
§ [...] project in 2007. [...] 2010. [...]re Foundation.
§ [...] before you can use scikit-learn.
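scikit-learn's focus on modeling data shows in its uniform estimator interface. You configure a model in its constructor and train it with `fit`. You then use `transform` (typical for unsupervised models) or `predict` (supervised models).

The sketch below assumes a recent scikit-learn release, which may differ from the 2015 version used in the course. It uses the bundled iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features, 3 classes

# Unsupervised estimator: fit on X only, then transform.
pca = PCA(n_components=2)
X_2d = pca.fit(X).transform(X)

# Supervised estimator: fit on (X, y), then predict.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
y_pred = clf.predict(X)

print(X_2d.shape)                              # (150, 2)
print("training accuracy:", (y_pred == y).mean())
```

Because every estimator shares this interface, swapping one model for another usually changes only the constructor line.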
The Use of Scikit-Learn: Unsupervised Learning Demos
PCA Summary
§ [...] good first insight into the dataset
§ Identify important variables in the projection matrix W: [...]
1–PCA.ipynb
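This is a minimal PCA sketch with scikit-learn on the iris data. It is not the contents of 1–PCA.ipynb.

In scikit-learn the rows of `pca.components_` are the principal directions. Here `W = pca.components_.T` is assumed to play the role of the slide's projection matrix W, with one column per component. Variables with large absolute entries in a column of W dominate that component, which is how W helps identify important variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris()
X = data.data                       # 150 samples x 4 features

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # 150 x 2 projected data

# Projection matrix W: one column per principal component (4 x 2).
W = pca.components_.T

# With the default whiten=False, projecting the mean-centered data
# by W reproduces fit_transform.
Z_manual = (X - pca.mean_) @ W
assert np.allclose(Z, Z_manual)

print("explained variance ratio:", pca.explained_variance_ratio_)

# The original variable with the largest absolute loading per component.
for k in range(W.shape[1]):
    top = np.argmax(np.abs(W[:, k]))
    print(f"PC{k + 1}: dominated by {data.feature_names[top]}")
```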
K-Means Algorithm
2–kmeans.ipynb
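The algorithm slide's content did not survive extraction. The sketch below assumes the standard Lloyd iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing.

A few choices here are assumptions for illustration:
§ `kmeans` is a hypothetical helper.
§ It initializes by farthest-point traversal, a simple deterministic alternative to the random or k-means++ starts that scikit-learn uses.
§ The result is compared against scikit-learn's `KMeans` on made-up data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from one random point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].astype(float)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid (Euclidean) for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids


# Made-up data: three well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

labels, centroids = kmeans(X, k=3)
sk = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Cluster ids are arbitrary, so compare partitions, not raw labels.
print("cluster sizes:", np.bincount(labels))
print("agreement with scikit-learn (ARI):", adjusted_rand_score(labels, sk.labels_))
```

Lloyd's algorithm only finds a local optimum. The starting centroids matter, which is why scikit-learn runs several initializations (`n_init`) and keeps the best.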
Outline
§ Why Python?
§ Intro to Python
§ Intro to Scikit-Learn
§ Unsupervised Learning: demo on PCA, K-Means
§ Supervised Learning: demo on Linear Regression, Logistic Regression, kNN
The Use of Scikit-Learn: Supervised Learning Demos
Linear Regression
1D: To find w and b, minimize the error.
2D: [...]
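The error formula on the original slide was lost in extraction. Assume it is the usual sum of squared errors for the 1D model $y \approx w x + b$. This is ordinary least squares, which is what scikit-learn's `LinearRegression` minimizes.

Setting both partial derivatives to zero then gives a closed form, where $\bar{x}$ and $\bar{y}$ are the sample means:

```latex
E(w, b) = \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)^2

\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} \bigl(y_i - w x_i - b\bigr) = 0
  \;\Rightarrow\; b = \bar{y} - w \bar{x}

\frac{\partial E}{\partial w} = -2 \sum_{i=1}^{n} x_i \bigl(y_i - w x_i - b\bigr) = 0
  \;\Rightarrow\; w = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

In 2D ($y \approx w_1 x_1 + w_2 x_2 + b$) the same procedure yields a small linear system, the normal equations, instead of a single ratio.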
pynb
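As a hedged sketch on made-up data, the snippet below checks scikit-learn's `LinearRegression` against the standard 1D least-squares closed form:

$w = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$, $b = \bar{y} - w\bar{x}$

It then fits a 2D example with the same estimator. The true coefficients (2 and 1 in 1D; 1.5, -0.5 and 3 in 2D) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up 1D data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=100)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features).
model = LinearRegression().fit(x.reshape(-1, 1), y)
w_sk, b_sk = model.coef_[0], model.intercept_

# Closed-form ordinary least squares for the 1D case.
w_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_cf = y.mean() - w_cf * x.mean()

print(f"sklearn:     w={w_sk:.3f}, b={b_sk:.3f}")
print(f"closed form: w={w_cf:.3f}, b={b_cf:.3f}")

# 2D case: same estimator, just two feature columns.
X2 = rng.uniform(0, 10, size=(100, 2))
y2 = 1.5 * X2[:, 0] - 0.5 * X2[:, 1] + 3.0 + rng.normal(scale=0.5, size=100)
model2 = LinearRegression().fit(X2, y2)
print("2D coefficients:", model2.coef_, "intercept:", model2.intercept_)
```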
Logistic Regression
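The logistic-regression slides were graphical and their content did not survive. For reference, the standard binary formulation is sketched below, with labels $y_i \in \{0, 1\}$; the slides' own notation may differ. scikit-learn's `LogisticRegression` also adds L2 regularization by default.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\min_{\mathbf{w}, b} \; -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(\mathbf{w}^\top \mathbf{x}_i + b)
  + (1 - y_i) \log\bigl(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i + b)\bigr) \Bigr]
```

Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, predicting class 1 whenever $\mathbf{w}^\top \mathbf{x} + b \ge 0$ gives a linear decision boundary.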
4–LogisticRegression.ipynb
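The following is a minimal sketch, not the contents of the course notebook. It fits scikit-learn's `LogisticRegression` to made-up two-class data from `make_classification`. It then confirms that `predict_proba` equals the sigmoid of the linear score returned by `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up two-class data with two informative features.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

clf = LogisticRegression()  # default: L2 regularization with C=1.0
clf.fit(X_train, y_train)

# The linear score w^T x + b ...
scores = clf.decision_function(X_test)
# ... pushed through the sigmoid gives P(y = 1 | x).
p_manual = 1.0 / (1.0 + np.exp(-scores))
p_sklearn = clf.predict_proba(X_test)[:, 1]

print("test accuracy:", clf.score(X_test, y_test))
print("max |difference|:", np.max(np.abs(p_manual - p_sklearn)))
```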
Nonlinear Problems
§ [...]able, but [...]
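This slide's text survives only in fragments. Assume its point is that a linear decision boundary, like logistic regression's, cannot separate some datasets.

The sketch below tests that idea on scikit-learn's `make_circles`, which generates one ring of points inside another. It compares logistic regression with a 3-nearest-neighbors classifier; the dataset parameters are arbitrary choices.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

linear = LogisticRegression().fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

lr_acc = linear.score(X_test, y_test)
knn_acc = knn.score(X_test, y_test)
print(f"logistic regression test accuracy: {lr_acc:.2f}")
print(f"3-nearest-neighbors test accuracy: {knn_acc:.2f}")
```

Logistic regression can only draw a straight line through the rings. kNN decides locally, so it can follow the circular boundary without any feature engineering.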
K Nearest Neighbors
§ Classification: same setup as logistic regression.
§ Very simple but powerful idea: do as your neighbors do.
§ Look at the nearest (e.g. one nearest, three nearest, ...) point(s) in the training data for a label.
§ Usual distance measure: Euclidean distance.
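Simple Algorithm
§ Pick a k, for example k = 3.
§ Want to classify a new example x.
§ Compute $d_i = d(x_i, x)$ for every training example $x_i$, i.e. $d(x_i, x) = \lVert x_i - x \rVert$.
§ Take the indices $i_0, i_1, i_2$ of the three smallest distances, and predict the label that occurs most often among $y_{i_0}, y_{i_1}, y_{i_2}$.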
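The k-nearest-neighbors steps on this slide translate almost line for line into NumPy. In this sketch:
§ `knn_predict` is a hypothetical helper.
§ Distances are Euclidean.
§ Vote ties go to the label of the closer neighbor.
§ Results are cross-checked against scikit-learn's `KNeighborsClassifier` on the iris data.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, x, k=3):
    """Hypothetical helper: classify x by majority vote of its k nearest neighbors."""
    # d_i = ||x_i - x|| (Euclidean distance) for every training point.
    d = np.linalg.norm(X_train - x, axis=1)
    # Indices i_0, ..., i_{k-1} of the k smallest distances.
    nearest = np.argsort(d)[:k]
    # Most frequent label among y_{i_0}, ..., y_{i_{k-1}}; on a tie,
    # Counter keeps first-seen order, so the closer neighbor's label wins.
    return Counter(y_train[nearest]).most_common(1)[0][0]


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

ours = np.array([knn_predict(X_train, y_train, x, k=3) for x in X_test])
sk = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

print("accuracy (from scratch):", np.mean(ours == y_test))
print("agreement with scikit-learn:", np.mean(ours == sk))
```

kNN has no real training step; all the work happens at prediction time. That is why the loop computes distances to every training point for each query.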
5–kNN.ipynb