Machine Learning Introduction

Transcription

Machine LearningIntroductionJeff HowbertIntroduction to Machine LearningWinter 20141

Course logistics (1)zCourse: CSS 581, Introduction to Machine Learning– course ructor: Jeff Howbert– email:peaklist@u.washington.edu (preferred)– phone:(206) 669-6629 [ cell ]– office:UW1-040– office hours: to be determined– faculty website:http://faculty.washington.edu/peaklistJeff HowbertIntroduction to Machine LearningWinter 20142

Course logistics (2)zExercises– about 7 over quarter– mix of problem sets, hands-on tutorials, minor coding– 25% of gradezProjects– 3 projects– each 25% of gradezGrading will be on a curveJeff HowbertIntroduction to Machine LearningWinter 20143

Course logistics (3)zTextbook: Introduction to Data Mining, Pang-Ning Tan,Michael Steinbach, and Vipin Kumar, Addison-Wesley,2006zProgramming language: MATLAB– For both exercises and programming projects– Available on CSS departmental Linux machines– Student license 99, plus 29 for Neural NetworktoolboxJeff HowbertIntroduction to Machine LearningWinter 20144

Goals for coursezPrimary– Hands-on experience with a variety of commonmachine learning techniques and application domains– Enough grounding in theory to design applicationsand interpret results intelligently zprobability, statisticsSecondary– Hands-on experience with a high-level scientificcomputing language– Ability to implement working code directly from amathematical specificationJeff HowbertIntroduction to Machine LearningWinter 20145

Machine learningBroad definition:Automated discovery of patterns in data by acomputer.This is learning, because computer is given aninitial pattern-recognition model and some data,and figures out how to make the model better.This is machine, because computer learnsautomatically, without intervention from humans(other than selection of initial model and data).Jeff HowbertIntroduction to Machine LearningWinter 20146

Why is machine learning important?zzzData in many domains is huge– Thousands to billions of data samples– Hundreds to millions of attributes– Impossible for human analysts to see patterns acrossso much dataPatterns in many domains are subtle, weak, buried innoise, or involve complex interactions of attributes– Often very difficult for human analysts to findIn some domains discovery and use of patterns musthappen in real time, e.g. in streaming data– Human analysts could never keep upJeff HowbertIntroduction to Machine LearningWinter 20147

Machine learning applications (1)zCommercial– Targeted marketing: understand purchasingpatterns of individuals or groups web-based advertising– Recommender systems: help people finditems they will like– Fraud detectionz Finance– Predict movements in markets– Portfolio risk managementJeff HowbertIntroduction to Machine LearningWinter 20148

Machine learning applications (2)Natural language processing– Speech recognition– Machine translation– Document classification and retrieval (books,email, web pages)– Sentiment analysisz Computer vision– Optical character recognition– Face recognition– Image classification and retrievalzJeff HowbertIntroduction to Machine LearningWinter 20149

Machine learning applications (3)zIT– Network intrusion detection– Spam filteringzRoboticszManufacturing process controlzSocial mediaJeff HowbertIntroduction to Machine LearningWinter 201410

Machine learning applications (4)zScientific– Remote sensing networks:atmosphere, ocean, fresh-water,land-based, satelliteweather and climate modeling environmental management resource management – Biomedical: gene sequencing,gene expression, epidemiology,disease predictionJeff HowbertIntroduction to Machine LearningWinter 201411

Machine learning careerszzzzzA.k.a. predictive analytics, business analytics, datamining, data science, quantitative modelingDemand continues to outstrip supplyNot necessary to have Ph.D.For every “analyst” position, there are several tightlyallied positions in software dev, engineering, databases,cloud, etc.In demand with local employers– Amazon, Google, Microsoft, Facebook, Zillow– Fred Hutchinson Cancer Research Center, many labsat University of Washington– Many smaller companies, including startupsJeff HowbertIntroduction to Machine LearningWinter 201412

Demo: Google Goggles.\videos\Google Goggles.wmvon web:http://www.youtube.com/watch?v Hhgfz0zPmH4http://www.youtube.com/watch?v 8SdwVCUJ0QEhttp://techtalks.tv/talks/54457/Jeff HowbertIntroduction to Machine LearningWinter 201413

Demo: autonomous helicopter flight.\videos\Autonomous Helicopter Stanford University AI Lab.flvon web:http://heli.stanford.eduJeff HowbertIntroduction to Machine LearningWinter 201414

Demo: Xbox Kinect motion capture./videos/kinectresearch.mp4on web:http://www.dailymotion.com/video/xhvql0 kinectresearch-mp4 videogameshttp://techtalks.tv/talks/54443/Jeff HowbertIntroduction to Machine LearningWinter 201415

Related and overlapping fieldszzMachine learning is coalescence of ideas drawn from artificialintelligence, pattern recognition, statistics, and data miningThese days:– Pattern recognitionpatternstatisticsand machine learningrecognitionessentially the same– Data mining is machinemachinelearning plus large-scalelearningdata retrieval methodsartificial– Machine learning is oneintelligencedata miningof the hot research frontiersin statisticsJeff HowbertIntroduction to Machine LearningWinter 201416

Stages of knowledge extractionInterpretation reprocessingSelectionDataJeff etdataIntroduction to Machine LearningWinter 201417

Types of machine learningzSupervised methods (“predictive”)– Build a predictive model from examples ofdata with known outcomes.– Use model to predict outcomes for unknownor future examples.zUnsupervised methods (“descriptive”)– Discover structure in data for whichoutcomes are not known.Jeff HowbertIntroduction to Machine LearningWinter 201418

Machine learning tasksSupervised– Classification– Regression– Recommender systems– Reinforcement learningUnsupervised– Clustering– Association analysiszzJeff HowbertWe will cover taskshighlighted in redRankingAnomaly detectionIntroduction to Machine LearningWinter 201419

Classification definitionzGiven a collection of records (training set)– Each record contains a set of attributes.– Each record also has a discrete class label.zzLearn a model that predicts class label as afunction of the values of the attributes.Goal: model should assign class labels topreviously unseen records as accurately aspossible.– A test set is used to determine the accuracy of themodel. Usually, the given data set is divided intotraining and test sets, with training set used to buildthe model and test set used to validate it.Jeff HowbertIntroduction to Machine LearningWinter 201420

Classification illustratedRefund MaritalStatusTaxableIncome CheatNoSingle75K?Tid Refund MaritalStatusTaxableIncome esDivorced rried80K?4YesMarried120K5NoDivorced 95KYes6NoMarriedNo7YesDivorced redictedclassesJeff HowbertIntroduction to Machine LearningWinter 201421

Classification application 1zDirect marketing– Goal: Reduce cost of mailing by targeting a set ofcustomers likely to buy a new cell-phone product.– Approach: Use the data for a similar product introduced before. We know which customers decided to buy and whichdecided otherwise. This {buy, don’t buy} decision forms theclass label. Collect various demographic, lifestyle, and companyinteraction related information about all such customers.– Type of business, where they stay, how much they earn, etc. Use this information as input attributes to learn a classifiermodel.From [Berry & Linoff] Data Mining Techniques, 1997Jeff HowbertIntroduction to Machine LearningWinter 201422

Classification application 2zCustomer attrition:– Goal: To predict whether a customer is likelyto be lost to a competitor.– Approach: Usedetailed record of transactions with each of thepast and present customers, to find attributes.– How often the customer calls, where he calls, what timeof-the day he calls most, his financial status, maritalstatus, etc. Labelthe customers as loyal or disloyal. Find a model for loyalty.From [Berry & Linoff] Data Mining Techniques, 1997Jeff HowbertIntroduction to Machine LearningWinter 201423

Classification application 3zSky survey cataloging– Goal: To predict whether a sky object is a star or agalaxy (class), especially visually faint ones, based ontelescopic survey images from Palomar Observatory.– 3000 images with 23,040 x 23,040 pixels per image.– Approach: Segment the image. Measure image attributes (features) - 40 of them per object. Model the class based on these features. Success story: Found 16 new high red-shift quasars – verydistant objects, very difficult to identify.From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996Jeff HowbertIntroduction to Machine LearningWinter 201424

Classification application 4Classify galaxies according to stage of formation:early, intermediate, or lateEarlyAttributes:Intermediate Image features Characteristics of lightwaves received etc.LateData size: 72 million stars, 20 million galaxies Object catalog: 9 GB Image database: 150 GBCourtesy: http://aps.umn.eduJeff HowbertIntroduction to Machine LearningWinter 201425

Classification application 5C-Path: automated pathologic gradingof breast cancer specimenszStarted with 6642 high-level featuresper imagezFeatures characterized both malignantepithelium and surrounding stromaAlgorithm simultaneously selectedsmall subset of features and learned todiscriminate 5-year survivors from nonsurvivorszzFinal model: 11 features, 89%accuracy on predicting 5-year survivalScience Translational Medicine, 3, 108ra113, 2011Jeff HowbertIntroduction to Machine LearningWinter 201426

Clustering definitionzzzGiven:– Set of data points– Set of attributes on each data point– A measure of similarity between data pointsFind clusters such that:– Data points within a cluster are more similar to oneanother– Data points in separate clusters are less similar to oneanotherSimilarity measures:– Euclidean distance if attributes are continuous– Other problem-specific measuresJeff HowbertIntroduction to Machine LearningWinter 201427

Types of clusteringzPartitional– Data points divided into finite number ofpartitions (non-overlapping subsets)zHierarchical– Data points arranged in tree structure thatexpresses a continuum of similarities andclusteringJeff HowbertIntroduction to Machine LearningWinter 201428

Partitional clustering illustratedEuclidean distance based clustering in 3D spaceIntracluster distancesare minimizedAssign toclustersIntercluster distancesare maximizedJeff HowbertIntroduction to Machine LearningWinter 201429

Hierarchical clustering illustratedDriving distances between Italian citiesJeff HowbertIntroduction to Machine LearningWinter 201430

Clustering application 1zMarket segmentation– Goal: subdivide a market into distinct subsetsof customers, such that each subset isconceivably a submarket which can bereached with a customized marketing mix.– Approach: Collectdifferent attributes of customers based ontheir geographical and lifestyle related information. Find clusters of similar customers. Measure the clustering quality by observing buyingpatterns of customers in same cluster vs. thosefrom different clusters.Jeff HowbertIntroduction to Machine LearningWinter 201431

Clustering application 2zDocument clustering– Goal: Find groups of documents that aresimilar to each other based on the importantterms appearing in them.– Approach: Identify frequently occurring termsin each document. Form a similarity measurebased on the frequencies of different terms.Use it to cluster.– Benefit: Information retrieval can utilize theclusters to relate a new document or searchterm to clustered documents.Jeff HowbertIntroduction to Machine LearningWinter 201432

Document clustering examplezzItems to cluster: 3204 articles of Los Angeles Times.Similarity measure: Number of words in commonbetween a pair of documents (after some word 73Entertainment354278FinancialJeff HowbertIntroduction to Machine LearningWinter 201433

Clustering application 3Image segmentation with mean-shift algorithmz Allows clustering of pixels in combined (R, G, B)plus (x, y) spacezJeff HowbertIntroduction to Machine LearningWinter 201434

Clustering application 4zGenetic demographyJeff HowbertIntroduction to Machine LearningWinter 201435

Association rule definitionzzGiven:– set of records each of which contain some number ofitems from a given collectionProduce dependency rules which will predict occurrenceof an item based on occurrences of other items.TIDItems12345Bread, Coke, MilkBeer, BreadBeer, Coke, Diaper, MilkBeer, Bread, Diaper, MilkCoke, Diaper, MilkJeff HowbertRules Discovered:{Milk} -- {Coke}{Diaper, Milk} -- {Beer}Introduction to Machine LearningWinter 201436

Association rule applicationzSupermarket shelf management– Goal: Identify items that are bought togetherby sufficiently many customers.– Approach: Process the point-of-sale datacollected with barcode scanners to finddependencies among items.– A classic rule Ifa customer buys diaper and milk, then he is verylikely to buy beer. So don’t be surprised if you find six-packs stackednext to diapers!Jeff HowbertIntroduction to Machine LearningWinter 201437

Sequential pattern definitionzGiven is a set of objects, with each object associated with its own timeline ofevents, find rules that predict strong sequential dependencies amongdifferent events.(A B)z(C)(D E)Rules are formed by first discovering patterns. Event occurrences in thepatterns are governed by timing constraints.(A B) xg(C)(D E) ng ws msJeff HowbertIntroduction to Machine LearningWinter 201438

Sequential pattern applicationszTelecommunications alarm logs(Inverter Problem Excessive Line Current)(Rectifier Alarm) -- (Fire Alarm)zPoint-of-sale transaction sequencesComputer Bookstore:(Intro To Visual C) (C Primer) -- (Perl for dummies,Tcl Tk)Athletic Apparel Store:(Shoes) (Racket, Racketball) -- (Sports Jacket)Jeff HowbertIntroduction to Machine LearningWinter 201439

Regression definitionzGiven a collection of records (training set)– Each record contains a set of attributes.– Each record also has a continuous response variable.zLearn a model that predicts response variableas a function of the values of the attributes.– Model can be linear or nonlinear.zGoal: model should predict value of responsevariable on previously unseen records asaccurately as possible.– Usually, the given data set is divided into training andtest sets, with training set used to build the model andtest set used to test its accuracy.Jeff HowbertIntroduction to Machine LearningWinter 201440

Regression application 1zEstimate market value of homes– Data from multiple sourcesPhysical attributes Tax assessments Prior sale prices – Data from home ofinterest, plus homesin same neighborhood, city, stateJeff HowbertIntroduction to Machine LearningWinter 201441

Regression applications 2Predict voting patterns inelections.z Predict sales volumeof new product basedon advertisingexpenditure.z Predict weather patternsas a function oftemperature, humidity,air pressure, etc.z Time seriesprediction of stockmarket indices.zJeff HowbertIntroduction to Machine LearningWinter 201442

Recommender system definitionDOMAIN: some field of activity where users buy,view, consume, or otherwise experience itemsPROCESS:1. users provide ratings on items they haveexperienced2. Take all user, item, rating data and build apredictive model3. For a user who hasn’t experienced a particularitem, use model to predict how well they willlike it (i.e. predict rating)Jeff HowbertIntroduction to Machine LearningWinter 201443

Recommender system application 1Amazon.com product recommendationsJeff HowbertIntroduction to Machine LearningWinter 201444

Recommender system application 2Netflix viewing recommendationsJeff HowbertIntroduction to Machine LearningWinter 201445

Recommender system application 3zzSocial network recommendations of essentially everycategory of interest known to mankind– Friends– Groups– Activities– Media (TV shows, movies, music, books)– News stories– Ad placementsAll based on connections in underlying social networkgraph and the expressed ‘likes’ / ’dislikes’ of yourself andyour connectionsJeff HowbertIntroduction to Machine LearningWinter 201446

Anomaly detectionzDetect significant deviations from normal behaviorzApplications:– Credit card fraud detection– Network intrusiondetectionTypical network traffic at University level may reach over 100 million connections per dayJeff HowbertIntroduction to Machine LearningWinter 201447

Challenges of machine learningzData often has poorly understood structure– Best modeling approach rarely obvious at startzHeterogeneous data types– e.g. combination of text, images, and numeric datazzzzzzFrequent class imbalanceNoisy or corrupted dataMissing or incomplete dataHigh dimensionalityScaling of algorithms to massive data setsStreaming (real-time) dataJeff HowbertIntroduction to Machine LearningWinter 201448

Schedule for rest of coursezReview schedule in syllabus– Sequence of lecture topics– Topics for programming projectsJeff HowbertIntroduction to Machine LearningWinter 201449

Jeff Howbert Introduction to Machine Learning Winter 2014 27. z. Given: – Set of data points – Set of attributes on each data point – A measure of similarity between data points. z. Find clusters such that: – Data po