Statistical Machine Learning - GitHub Pages

Statistical Machine Learning
Lecture 01: Introduction
Kristian Kersting
TU Darmstadt
Summer Semester 2020
K. Kersting based on Slides from J. Peters · Statistical Machine Learning · Summer Semester 2020 · 1 / 52

Today’s Objectives
- Organizational issues
- Advertisement
- Introduction

Outline
1. Organizational Issues
2. Introduction
3. Wrap-Up

1. Organizational Issues

Instructors
Kristian Kersting heads the AI and ML Lab at the Department of Computer Science at TU Darmstadt. He studied computer science, and you can find him in the Alte Hauptgebäude, Room 074, Hochschulstrasse 1. You can also contact Kristian at kersting@cs.tu-darmstadt.de.
Karl Stelzner joined the AIML Lab as a PhD student in 2017. He works on probabilistic (deep) learning, in particular for unsupervised image understanding. You can contact Karl via email at stelzner@cs.tu-darmstadt.de.
PLEASE FEEL FREE TO EMAIL US WITH QUESTIONS!

Website & Mailing list
Moodle: iew.php?id 928

Course Language
The course will be held in English.
Why?
- Essentially all machine learning literature is in English.
- Knowing the proper terminology is essential!
- Good to improve your English skills!
Questions and answers in emails, homework, and exams may be written in German (however, this is not encouraged).

Feedback: Essential for both sides.
We appreciate FEEDBACK!
Every professor is a little cuckoo (in German: "hat ’ne Meise", literally "has a titmouse"). You are welcome to feed mine!

Exam & Bonus Points from Homework
There will be a written exam.
- Approximate date: the weeks after the end of classes.
Homework Exercises:
- Homework is crucial for the exam!
- The bonus questions will count as bonus points for the lecture!
- Will max out on bonus points!
- Please register in Moodle in groups of 2 students.
Question: Favorite homework frequency? 4 homeworks

Homework Assignments
There will be 4 homework assignments!
Each assignment will contain:
- A few multiple choice questions
- A few essay questions
- Some programming exercises

Background Reading
We will add current papers & tutorials!
Standard background reading:
- C.M. Bishop, Pattern Recognition and Machine Learning (2006), Springer
- K.P. Murphy, Machine Learning: a Probabilistic Perspective (2012), MIT Press
- S. Rogers, M. Girolami, A First Course in Machine Learning (2016), CRC Press
Mathematics for machine learning background:
- Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine Learning, https://mml-book.github.io/

Background Reading
Other resources:
- D. Barber, Bayesian Reasoning and Machine Learning (2012), Cambridge University Press 090310.pdf)
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning (2015), Springer Verlag (https://web.stanford.edu/~hastie/Papers/ESLII.pdf)
- R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification (2nd ed. 2001), Wiley-Interscience
- T.M. Mitchell, Machine Learning (1997), McGraw-Hill
- R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT df)

How does it fit in your course plan? 1/3
VL Statistical Machine Learning is a good preparation for advanced lectures:
- VL Lernende Roboter (aka Robot Learning)
- VL Probabilistic Graphical Models
- VL Statistical Relational AI
- IP Robot Learning 1, 2

How does it fit in your course plan? 2/3
Related Classes:
- Improve Foundations: Data Mining and Machine Learning (WiSe), Robot Learning (WiSe), Deep Learning: Architectures and Methods (WiSe)
- Useful Techniques: Optimierung statischer und dynamischer Systeme
- Applications of learning: Computer Vision
Theses: We always have B.Sc. or M.Sc. Theses on ML topics.

How does it fit in your course plan? 3/3
B.Sc. / M.Sc. Informatik:
- Human Computer Systems (see the module handbook)
- If you are strongly interested in machine learning you should take:
  - Statistical Machine Learning for HCS credit
  - Data Mining and Machine Learning for DKE credit
  - Robot Learning for CE credit
  - Computer Vision for Visual Computing
M.Sc. in Autonome Systeme
M.Sc. in Visual Computing: Area “Computer Vision & ML”

2. Introduction

Why Machine Learning?
“We are drowning in information and starving for knowledge.” (John Naisbitt)
Era of big data:
- In 2017 there were about 1.8 trillion webpages on the internet
- 20 hours of video are uploaded to YouTube every minute
- Walmart handles more than 1M transactions per hour and has databases containing more than 2.5 petabytes ($2.5 \times 10^{15}$ bytes) of information.
No human being can deal with the data avalanche!

Why Machine Learning?
“I keep saying the sexy job in the next ten years will be statisticians and machine learners. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.”
Hal Varian, Chief Economist at Google, 2009

Job Perspective
"A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning."
Big data: The next frontier for innovation, competition, and productivity, 2011, McKinsey Global Institute

Machine Learning
What is ML? What is its goal?
- Develop a machine / an algorithm that learns to perform a task from past experience.
Why? What for?
- Fundamental component of every intelligent and / or autonomous system
- Discovering “rules” and patterns in data
- Automatic adaptation of systems
- Attempting to understand human / biological learning

Machine Learning in Action

Machine Learning Examples
Recognition of handwritten digits
- These digits are given to us as small digital images
- We have to build a “machine” to decide which digit it is
- Obvious challenge: There are many different ways in which people handwrite

Machine Learning Examples
CO2 prediction

Machine Learning Examples
- Email filtering
- Speech recognition
- Vehicle control

Machine Learning Impact & Successes
- Recognition of speech, letters, faces, ...
- Autonomous vehicle navigation
- Games
  - Backgammon world champion
  - Chess: Deep Blue vs. Kasparov
  - Go: AlphaGo, AlphaGo Zero
- Google
- Finding new astronomical structures
- Fraud detection (credit card applications)
- ...

Machine Learning
Develop a machine / an algorithm that learns to perform a task from past experience.
Put more abstractly: our task is to learn a mapping from input to output.
$f : I \to O$
Put differently, we want to predict the output from the input.
$y = f(x; \theta)$
- Input: $x \in I$ (images, text, sensor measurements, ...)
- Output: $y \in O$
- Parameters: $\theta \in \Theta$ (what needs to be “learned”)
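As a minimal sketch of this notation, the snippet below writes the predictor as a function of the input x and the parameters θ. The linear form of `f` is an assumption chosen for illustration; the slide does not prescribe any particular model family.

```python
from typing import Sequence


def f(x: Sequence[float], theta: Sequence[float]) -> float:
    """Hypothetical parametric model: bias plus a weighted sum of the inputs.

    theta[0] is the bias, theta[1:] holds one weight per input dimension.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta needs one bias plus one weight per input")
    return theta[0] + sum(w * xi for w, xi in zip(theta[1:], x))


# The same input gives different outputs under different parameters;
# "learning" means choosing theta so that the outputs match the data.
x = [1.0, 2.0]
y_a = f(x, theta=[0.0, 1.0, 1.0])   # 0 + 1*1 + 1*2
y_b = f(x, theta=[0.5, 2.0, -1.0])  # 0.5 + 2*1 - 1*2
print(y_a, y_b)
```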

Classification vs Regression
Classification
Learn a mapping into a discrete space, e.g.
- $O = \{0, 1\}$
- $O = \{0, 1, 2, 3, \ldots\}$
- $O = \{\text{verb}, \text{noun}, \text{adjective}, \ldots\}$
Examples:
- Spam / not spam
- Digit recognition
- Part of Speech tagging
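A small sketch of a classifier with a discrete output space {0, 1}. The nearest-centroid rule and the toy data are assumptions made for illustration, not a method taken from the slides.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Made-up training data: two features per example, labels in {0, 1}.
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = [0, 0, 1, 1]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [0.1, 0.2]))  # lies near the class-0 examples
print(predict(centroids, [0.8, 1.0]))  # lies near the class-1 examples
```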

Classification vs Regression
Regression
Learn a mapping into a continuous space, e.g.
- $O = \mathbb{R}$
- $O = \mathbb{R}^3$
Examples:
- Curve fitting, Financial Analysis, Housing prices, ...
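For contrast with classification, here is a sketch of regression with a real-valued output: a least-squares line fit. The housing numbers are invented for illustration, and the one-dimensional linear model is an assumption.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for scalar inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up data: living area in square metres -> price in thousand EUR.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

a, b = fit_line(areas, prices)
predicted = a + b * 80.0  # the output is a real number, not a class label
print(a, b, predicted)
```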

General Paradigm
Training
Testing
The test dataset needs to be different from the training dataset, but ideally drawn from the same underlying distribution.
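One common way to get disjoint training and test sets from the same pool of examples is a random split. The sketch below assumes a simple shuffle-and-cut scheme; the function name, the 25% test fraction, and the seed are illustrative choices.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data once, then cut it into two disjoint lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (input, output) pairs.
data = [(x, 2 * x) for x in range(20)]
train, test = train_test_split(data)
print(len(train), len(test))
```

Because both parts come from one shuffled pool, they share the same underlying distribution while containing no common examples.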

What data do we have for training?
- Data with labels (input / output pairs): supervised learning
  - Image with digit label
  - Sensory data for car with intended steering control
- Data without labels: unsupervised learning
  - Automatic clustering (grouping) of sounds
  - Clustering of text according to topics
  - Density Estimation
  - Dimensionality Reduction
- Data with and without labels: semi-supervised learning
- No examples: learn-by-doing
  - Reinforcement Learning
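To make the unsupervised case concrete, the following sketch clusters unlabeled one-dimensional points with a basic k-means loop. The algorithm choice, the data, and the initial centers are assumptions for illustration; the slide only names clustering as a task.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [
            sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)
        ]
    return centers


# No labels are given, only the raw measurements.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)
```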

Some Key Challenges
We need generalization!
- We cannot simply memorize the training set.
- What if we see an input that we haven’t seen before?
  - Different shape of the digit image (unknown writer)
  - “Dirt” on the picture, etc.
- We need to learn what is important for carrying out our task.
This is one of the most crucial points that we will return to many times.

Generalization
How do we achieve generalization?
We should not make the model overly complex!
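The effect of model complexity can be sketched by comparing a straight line with a polynomial that passes exactly through every training point. The data below are invented: the assumed true relationship is y = x, and the training targets carry hand-picked noise. The interpolating polynomial reaches zero training error but a worse test error than the simple line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Degree-(n-1) Lagrange polynomial through all n training points."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed ground truth y = x; training targets include fixed, made-up noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.3, 0.6, 2.5, 2.7, 4.4, 4.5]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [0.5, 1.5, 2.5, 3.5, 4.5]

line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
poly_train, poly_test = mse(poly, train_x, train_y), mse(poly, test_x, test_y)
print("line:", line_train, line_test)
print("poly:", poly_train, poly_test)
```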

Prominent example of overfitting.

Some Key Challenges
Input: Features
- Choosing the “right” features is very important.
- Coding and use of domain knowledge.
- May allow for invariance (e.g., volume and pitch of voice).
Curse of Dimensionality:
- If the features are too high-dimensional, we will run into trouble
- Dimensionality reduction.
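Two back-of-the-envelope calculations hint at why high-dimensional features cause trouble. They assume data spread uniformly over a unit hypercube, which is an illustrative assumption and not part of the slides: the number of grid cells grows exponentially with the dimension, and a sub-cube holding 10% of the volume needs almost the full edge length in high dimensions.

```python
def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells when every axis is divided into the same number of bins."""
    return bins_per_axis ** dims


def edge_for_fraction(fraction: float, dims: int) -> float:
    """Edge length of a sub-cube that covers `fraction` of the unit cube's volume."""
    return fraction ** (1.0 / dims)


for d in (1, 2, 10, 100):
    print(d, grid_cells(10, d), round(edge_for_fraction(0.1, d), 3))
```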

Some Key Challenges
How do we measure performance?
- 99% correct classification in speech recognition: What does that really mean?
- Do we understand the meaning of the sentence? Do we understand every word? For all speakers?
Need more concrete numbers:
- % of correctly classified letters
- average distance driven (until an accident, ...)
- % of games won
- % of correctly recognized words, sentences, etc.
Training vs. testing performance!
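The gap between word-level and sentence-level numbers can be sketched with a toy speech-recognition output. The sentences are invented, and the sketch assumes every hypothesis has the same number of words as its reference (no insertions or deletions), which real systems do not guarantee.

```python
def word_accuracy(references, hypotheses):
    """Fraction of words recognized correctly, compared position by position."""
    correct = total = 0
    for ref, hyp in zip(references, hypotheses):
        for r, h in zip(ref.split(), hyp.split()):
            correct += (r == h)
            total += 1
    return correct / total


def sentence_accuracy(references, hypotheses):
    """Fraction of sentences recognized without a single error."""
    exact = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return exact / len(references)


references = [
    "turn on the kitchen light",
    "call my mother right now",
    "play the next song please",
    "what time is it today",
]
hypotheses = [
    "turn on the kitchen light",
    "call my brother right now",
    "play the next song please",
    "what dime is it today",
]

word_acc = word_accuracy(references, hypotheses)
sent_acc = sentence_accuracy(references, hypotheses)
print(word_acc, sent_acc)
```

The same recognizer output looks much better when scored per word than per sentence, so a bare percentage needs the unit it was measured on.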

Some Key Challenges
We also need to define the right error metric:
Which is better?
Euclidean distance (L2 norm) might be useless.
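The figure for this slide is not part of the transcript, so the following is only an assumed illustration of how the L2 norm can mislead: a striped one-dimensional "image" shifted by one pixel ends up farther from the original than a completely blank image does.

```python
from math import dist

# Made-up 1-D "images" with pixel intensities 0 or 1.
original = [0, 1, 0, 1, 0, 1, 0, 0]
shifted = [0, 0, 1, 0, 1, 0, 1, 0]   # same pattern, moved one pixel to the right
blank = [0, 0, 0, 0, 0, 0, 0, 0]     # no pattern at all

d_shifted = dist(original, shifted)
d_blank = dist(original, blank)
print(d_shifted, d_blank)
```

Under Euclidean distance the blank image counts as the better match, although a human would call the shifted copy nearly identical.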

Some Key Challenges
Which is the right model?
The learned parameters (w) can mean a lot of different things:
- May characterize the family of functions or the model space
- May index the hypothesis space
- w can be a vector, adjacency matrix, graph, ...

Some Key Challenges
Even if we have solved the other problems, computation is usually quite hard:
- Learning often involves some kind of optimization
  - Find (search) best model parameters
- Often we have to deal with thousands, millions, billions, ..., of training examples
- Given a model, compute the prediction efficiently
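As one example of learning as optimization, the sketch below searches for a single parameter w of the model y = w * x by gradient descent on the mean squared error. The model, data, learning rate, and step count are illustrative assumptions; the slides do not fix a particular optimizer.

```python
def gradient_descent(xs, ys, lr=0.05, steps=200):
    """Minimize the mean squared error of y = w * x over the scalar w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


# Made-up data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_hat = gradient_descent(xs, ys)
print(w_hat)
```

Each step touches every training example once, which is why the number of examples matters so much for the cost of learning.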

Why is machine learning interesting (for you)?
- Machine learning is a challenging problem that is far from being solved.
  - Our learning systems are primitive compared to us humans.
  - Think about what and how quickly a child can learn!
- It combines insights and tools from many fields and disciplines:
  - Traditional artificial intelligence (logic, semantic networks, ...)
  - Statistics
  - Complexity theory
  - Artificial neural networks
  - Psychology
  - Adaptive control
  - ...

Why is machine learning interesting (for you)?
- Allows you to apply theoretical skills that you may otherwise only use rarely.
- Has lots of applications:
  - Computer vision
  - Computer linguistics
  - Search (think Google)
  - Digital “assistants”
  - Computer systems
  - Robotics
  - ...

Why is machine learning interesting (for you)?
- It is a growing field:
  - Many major companies are hiring people with machine learning knowledge.
  - Learning machine learning is probably the most promising route to such an 80,000-160,000 Euro job.
  - Lampert: “Most Computer Vision is just machine learning applied to pictures.”
  - It is beating traditional hand-engineered methods in many tasks (e.g., Vision, Natural Language, ...)
- Because it is fun!

Preliminary Syllabus (Subject to change!)
- Refresher of Statistics, Linear Algebra & Optimization (approx. 2 weeks)
- Fundamentals (approx. 3 weeks)
  - Bayes decision theory, maximum likelihood, Bayesian inference
  - Performance evaluation
  - Probability density estimation
  - Mixture models, expectation maximization
- Linear Methods (approx. 3-4 weeks)
  - Linear regression
  - PCA, robust PCA
  - Fisher linear discriminant
  - Generalized linear models

Preliminary Syllabus
- Large-Margin Methods (approx. 3-4 weeks)
  - Statistical learning theory
  - Support vector machines
  - Kernel methods
- Neural Networks (approx. 3 weeks)
  - Neural Networks: From Inspiration to Application
  - Deep Learning: What is really different?
- Miscellaneous (approx. 3 weeks)
  - Model averaging (bagging & boosting)
  - Graphical models (basic introduction)

Credits
- These slides are essentially the slides of Jan Peters.
- Some parts of Jan’s lecture material have been developed by Profs. Bernt Schiele, Stefan Roth and Stefan Schaal for the previous iterations of this course or similar classes.
- Many figures that I will use are taken directly from the books by Chris Bishop, by Duda, Hart & Stork, and by Kevin Murphy.

3. Wrap-Up

Wrap-Up
You now know:
- What Machine Learning is and what it is not.
- Some applications of Machine Learning.
- The different types of learning problems.
- What classification and regression are.
- The challenges in solving a problem with Machine Learning.

Self-Test Questions
- What are some applications of Machine Learning?
- When can we benefit from using Machine Learning methods?
- What are the different types of learning?
- What is the difference between classification and regression?
- Can you give some examples of both tasks (and identify the domain and codomain)?
- What are the challenges when solving a Machine Learning problem?
- What is generalization? What is overfitting?

Homework
Select some Machine Learning applications and check:
- What type of learning is it?
- Is it a classification or regression problem?
- What challenges do you foresee when solving this problem using Machine Learning methods?
Reading assignment
- Jordan Book, Linear Algebra chapter (online)
- Pedro Domingos, A few useful things to know about Machine Learning (https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)
- Bishop ch. 1
