MACHINE LEARNING - Alexandru Ioan Cuza University

Transcription

0. MACHINE LEARNING
Liviu Ciortuz, Department of CS, University of Iaşi, România

1. What is Machine Learning?
ML studies algorithms that learn from / improve with experience.
Tom Mitchell (definition of the [general] learning problem): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
For instance, in Mitchell’s checkers example, T is playing checkers, P is the percentage of games won, and E is the experience gained by playing practice games against itself.
Examples of [specific] learning problems (see next slide).
[Liviu Ciortuz:] ML is data-driven programming.
[Liviu Ciortuz:] ML gathers a number of well-defined subdomains/disciplines, each of them aiming to solve in its own way the above-formulated [general] learning problem.

2. What is Machine Learning good for?
- natural language (text & speech) processing
- genetic sequence analysis
- robotics
- customer (financial risk) evaluation
- terrorist threat detection
- compiler optimisation
- semantic web
- computer security
- software engineering
- computer vision (image processing)
- etc.

3. A multi-domain concept
[diagram: Machine Learning at the intersection of Statistics (model fitting), Data Mining / Database Systems (Knowledge Discovery in Databases), and Pattern Recognition / Engineering]

4. The Machine Learning BSc Course: Plan
0. Introduction to Machine Learning (T. Mitchell, ch. 1)
1. Probabilities Revision (Ch. Manning & H. Schütze, ch. 2)
2. Decision Trees (T. Mitchell, ch. 3)
3. Bayesian Learning (T. Mitchell, ch. 6)
4. Instance-based Learning (T. Mitchell, ch. 8)
5. Clustering Algorithms (Ch. Manning & H. Schütze, ch. 14)
6. The EM algorithmic schemata (T. Mitchell, ch. 6.12)

5. Bibliography
1. “Machine Learning”, Tom Mitchell. McGraw-Hill, 1997
2. “Pattern Recognition and Machine Learning”, Christopher Bishop. Springer, 2006
3. “Machine Learning – A Probabilistic Perspective”, Kevin Murphy. MIT Press, 2012
4. “The Elements of Statistical Learning”, Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer, 2nd ed., 2009
5. “Foundations of Statistical Natural Language Processing”, Christopher Manning, Hinrich Schütze. MIT Press, 2002
6. “Exerciţii de învăţare automată” (Machine Learning Exercises), L. Ciortuz, A. Munteanu, E. Bădărău. Editura Universităţii “Alexandru Ioan Cuza”, Iaşi, Romania, 2015

6. Other suggested readings: More on the theoretical side (I)
1. “Pattern Classification” (2nd ed.), Duda, Hart, Stork. John Wiley & Sons, 2001
2. “Bayesian Reasoning and Machine Learning”, David Barber. Cambridge University Press, 2012
3. “Pattern Recognition” (4th ed.), Sergios Theodoridis, Konstantinos Koutroumbas. Academic Press, 2008
4. “Machine Learning: A Bayesian and Optimization Perspective”, Sergios Theodoridis. Elsevier, 2015
5. “Apprentissage artificiel” (2e ed.), Antoine Cornuéjols. Eyrolles, 2010

7. Other suggested readings: More on the theoretical side (II)
1. “Data Mining with Decision Trees” (2nd ed.), Lior Rokach, Oded Maimon. World Scientific, 2015
2. “Clustering”, Rui Xu, Donald C. Wunsch II. IEEE Press, 2009
3. “The EM Algorithm and Extensions” (2nd ed.), Geoffrey J. McLachlan, Thriyambakam Krishnan. John Wiley & Sons, 2008
4. “A Tutorial on Support Vector Machines for Pattern Recognition”, Christopher Burges, 1998
5. “Support Vector Machines and Other Kernel-based Learning Methods”, Nello Cristianini, John Shawe-Taylor. Cambridge University Press, 2000
6. “Apprentissage statistique. Réseaux de neurones, cartes topologiques, machines à vecteurs supports” (3e ed.), G. Dreyfus, J.-M. Martinez, M. Samuelides, M.B. Gordon, F. Badran, S. Thiria. Eyrolles, 2007

8. Other suggested readings: More on the practical side
1. “Data Mining: Practical Machine Learning Tools and Techniques” (3rd ed.), Ian Witten, Eibe Frank. Morgan Kaufmann, 2011
2. “An Introduction to Statistical Learning”, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. Springer, 2013
3. “Applied Predictive Modeling”, Max Kuhn, Kjell Johnson. Springer, 2013
4. “An Introduction to Pattern Recognition: A Matlab Approach”, Sergios Theodoridis, Konstantinos Koutroumbas. Academic Press, 2010
5. “Machine Learning with R”, Brett Lantz. Packt Publishing, 2013
6. “Data Mining with R – Learning with Case Studies”, Luís Torgo. CRC Press, 2011
7. “Mining of Massive Datasets”, Anand Rajaraman, Jure Leskovec, Jeffrey D. Ullman. 2013

9. A general schema for machine learning methods
[diagram: training data → machine learning algorithm → data model; test/generalization data → data model → predicted classification]
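As a minimal illustration of this schema, here is a sketch in Python, assuming scikit-learn is installed; the dataset, the model choice (a decision tree, in keeping with the course plan) and the split ratio are illustrative assumptions, not part of the slide.

```python
# Sketch of the slide's schema: training data -> ML algorithm -> data model,
# then test/generalization data -> data model -> predicted classification.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# split into training data and test/generalization data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# machine learning algorithm -> data model
model = DecisionTreeClassifier().fit(X_train, y_train)

# data model -> predicted classification
predictions = model.predict(X_test)
print("test accuracy:", (predictions == y_test).mean())
```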

10. Basic ML Terminology
1. instance x, instance set X
   concept c ⊆ X, or c : X → {0, 1}
   example (labeled instance): ⟨x, c(x)⟩; positive examples, negative examples
2. hypotheses h : X → {0, 1}
   hypotheses representation language
   hypotheses set H
   hypotheses consistent with the concept c: h(x) = c(x) for every example ⟨x, c(x)⟩
   version space
3. learning: train / test
   supervised learning (classification), unsupervised learning (clustering)
4. error_h = {x ∈ X | h(x) ≠ c(x)}; training error, test error
   accuracy, precision, recall
5. validation set, development set
   n-fold cross-validation, leave-one-out cross-validation (see the sketch below)
   overfitting
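Since items 4 and 5 above revolve around estimating error, here is a plain-Python sketch of n-fold cross-validation; the toy majority-class learner, the data and the fold count are my own illustrative assumptions.

```python
# Sketch of n-fold cross-validation: average test error over n held-out folds.
# With n == len(examples) this degenerates into leave-one-out cross-validation.
import random
from collections import Counter

def majority_class_learner(training):
    """Toy learner: always predict the most frequent label seen in training."""
    label = Counter(c for _, c in training).most_common(1)[0][0]
    return lambda x: label

def n_fold_cv(examples, n, learner):
    examples = examples[:]                      # copy before shuffling
    random.shuffle(examples)
    folds = [examples[i::n] for i in range(n)]  # n (roughly equal) folds
    errors = []
    for i in range(n):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        h = learner(training)                   # train on the other n-1 folds
        test = folds[i]                         # test on the held-out fold
        errors.append(sum(h(x) != c for x, c in test) / len(test))
    return sum(errors) / n

# toy usage: instances are numbers, target concept c(x) = 1 iff x > 5
data = [(x, int(x > 5)) for x in range(10)]
print(n_fold_cv(data, n=5, learner=majority_class_learner))
```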

11. The Inductive Learning Assumption
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

12. Inductive Bias
Consider a concept learning algorithm L, the instances X, and the target concept c, with training examples D_c = {⟨x, c(x)⟩}. Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on data D_c.
Definition: The inductive bias of L is any minimal set of assertions B such that
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
for any target concept c and corresponding training examples D_c.
(A ⊢ B means A logically entails B)

13. Inductive systems can be modelled by equivalent deductive systems.

14. Evaluation measures in Machine Learning
[diagram: regions of hypothesis h vs. concept c, marking tp, fp, fn, tn]
tp = true positives, fp = false positives, tn = true negatives, fn = false negatives
accuracy: Acc = (tp + tn) / (tp + tn + fp + fn)
precision: P = tp / (tp + fp)
recall (or: sensitivity): R = tp / (tp + fn)
F-measure: F = 2PR / (P + R)
specificity: Sp = tn / (tn + fp)
fallout: fp / (tn + fp)
Matthews Correlation Coefficient: MCC = (tp·tn − fp·fn) / √((tp + fp)(tn + fn)(tp + fn)(tn + fp))
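The measures above follow directly from the four confusion-matrix counts; the sketch below computes them in Python (the function and argument names are mine, and the counts in the usage line are made up).

```python
# Evaluation measures computed from the confusion-matrix counts tp, fp, tn, fn.
from math import sqrt

def evaluation_measures(tp, fp, tn, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)            # accuracy
    p = tp / (tp + fp)                               # precision
    r = tp / (tp + fn)                               # recall / sensitivity
    f = 2 * p * r / (p + r)                          # F-measure
    sp = tn / (tn + fp)                              # specificity
    fallout = fp / (tn + fp)                         # fallout
    mcc = (tp * tn - fp * fn) / sqrt(                # Matthews Corr. Coefficient
        (tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return {"Acc": acc, "P": p, "R": r, "F": f,
            "Sp": sp, "fallout": fallout, "MCC": mcc}

print(evaluation_measures(tp=40, fp=10, tn=45, fn=5))
```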

15. Lazy learning vs. eager learning algorithms
Eager: generalize before seeing the query (ID3, Backpropagation, Naive Bayes, radial basis function networks, ...). Must create a global approximation.
Lazy: wait for the query before generalizing (k-Nearest Neighbor, locally weighted regression, case-based reasoning). Can create many local approximations; see the k-NN sketch below.
Does it matter? If they use the same hypothesis space H, lazy learners can represent more complex functions. E.g., a lazy Backpropagation algorithm can learn a NN that is different for each query point, compared to the eager version of Backpropagation.
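To make the lazy side concrete, here is a minimal k-Nearest Neighbor sketch; the squared-Euclidean distance, the value of k and the toy data are illustrative assumptions.

```python
# A lazy learner: k-NN stores the training examples verbatim and only
# "generalizes" (locally) once a query point arrives.
from collections import Counter

def knn_classify(training, query, k=3):
    """Majority vote among the k training examples nearest to `query`."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))  # squared Euclidean
    neighbors = sorted(training, key=lambda ex: dist(ex[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# toy usage: 2-D points labeled 0 or 1
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
print(knn_classify(train, (0.5, 0.5)))  # -> 0
print(knn_classify(train, (5.5, 5.5)))  # -> 1
```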

16. Who is Liviu Ciortuz?
Diploma (maths and CS) from UAIC, Iaşi, Romania, 1985; PhD in CS from Université de Lille, France, 1996
programmer: Bacău, Romania (1985-1987)
full-time researcher: Germany (DFKI, Saarbrücken, 1997-2001), UK (Univ. of York and Univ. of Aberystwyth, 2001-2003), France (INRIA, Rennes, 2012-2013)
assistant, lecturer and then associate professor: Univ. of Iaşi, Romania (1990-1997, 2003-2012, 2013-today)

17. ADDENDA
“…colleagues at the Computer Science department at Saarland University have a strong conviction that nothing is as practical as a good theory.”
Reinhard Wilhelm, quoted by Cristian Calude, in The Human Face of Computing, Imperial College Press, 2016

18. “Mathematics translates concepts into formalisms and applies those formalisms to derive insights that are usually NOT amenable to a LESS formal analysis.”
Jürgen Jost, Mathematical Concepts, Springer, 2015

19. “Mathematics is a journey that must be shared, and by sharing our own journey with others, we, together, can change the world.”
“Through the power of mathematics, we can explore the uncertain, the counterintuitive, the invisible; we can reveal order and beauty, and at times transform theories into practical objects, things or solutions that you can feel, touch or use.”
Cédric Villani, winner of the Fields Medal, 2010
cf. -inside-the-mind-of-a-mathematician, 15.03.2017

1. "Machine Learning" Tom Mitchell. McGraw-Hill, 1997 2. "Pattern Recognition and Machine Learning" Christopher Bishop. Springer, 2006 3. "Machine Learning - A Probabilistic Perspective" Kevin Murphy, MIT Press, 2012 4. "The Elements of Statistical Learning" Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer, 2nd ed .