Practical Machine Learning In R - Introduction

Transcription

Practical Machine Learning in R
Introduction
Lars Kotthoff (larsko@uwyo.edu), with slides from Bernd Bischl and Michel Lang
Slides available at http://www.cs.uwyo.edu/ larsko/ml-fac

What is Machine Learning?
- “gives computers the ability to learn without being explicitly programmed” (Wikipedia)
- “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell)

Examples
[Example figures not preserved in the transcript. Surviving source link fragments: predicting-machine-learning-tutorial/ and ing-nba-divisions-by-clustering/]

Supervised Learning
- learn the relationship between input $x$ and output $y$
- training data with labels available: $y$ known for given $x$
- can see this as function approximation: find an $f$ such that $y \approx f(x)$

Supervised Learning
- $x$ are features or attributes
- $y$ is the ground truth
- denote predictions $f(x) = \hat{y}$
- loss function $L(y, \hat{y})$ measures how good predictions are, e.g. $L(y, \hat{y}) = (y - \hat{y})^2$
- want to minimize loss given training data $X_{train} = \{(x_i, y_i)\}^{n}$: $\arg\min \sum_{i=1}^{n} L(y_i, \hat{y}_i)$
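A minimal sketch of the squared-error loss from this slide, in base R. The toy data and the two candidate functions `f1` and `f2` are made up for illustration and do not come from the slides.

```r
# Squared-error loss L(y, y_hat) = (y - y_hat)^2
squared_loss <- function(y, y_hat) (y - y_hat)^2

# Toy training data (hypothetical): y is roughly 2 * x
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Two hypothetical candidate functions f
f1 <- function(x) 2 * x
f2 <- function(x) x + 3

# Total training loss: sum over i of L(y_i, f(x_i))
loss_f1 <- sum(squared_loss(y, f1(x)))
loss_f2 <- sum(squared_loss(y, f2(x)))

# The candidate with the smaller training loss is preferred
best <- if (loss_f1 < loss_f2) "f1" else "f2"
print(c(f1 = loss_f1, f2 = loss_f2))
print(best)
```

Here the search over functions is reduced to comparing two fixed candidates. A real learner searches a much larger family of functions, but the criterion is the same sum of losses.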

Supervised Learning
- want to learn a general function that is predictive on new data
- second set $X_{test}$ that is not used in training to test generalization performance: $\sum_{i=1}^{n} L(y_i, \hat{y}_i)$
- usually full data set $X$ is split into non-overlapping train and test sets: $X_{train} \cup X_{test} = X$, $X_{train} \cap X_{test} = \emptyset$
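The split into train and test sets can be sketched in base R as follows. The data set (`mtcars`), the two-thirds split ratio, the seed, and the linear model `mpg ~ wt` are all assumptions chosen for illustration. The slides do not prescribe any of them.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Full data set X: the built-in mtcars data (an assumption for illustration)
X <- mtcars
n <- nrow(X)

# Randomly assign about two thirds of the rows to training
train_idx <- sample(n, size = round(2 / 3 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# The two sets do not overlap and together cover X
stopifnot(length(intersect(train_idx, test_idx)) == 0)
stopifnot(setequal(union(train_idx, test_idx), seq_len(n)))

# Fit f on the training set only
f <- lm(mpg ~ wt, data = X[train_idx, ])

# Sum of squared-error losses on training and held-out test data
train_loss <- sum((X$mpg[train_idx] - predict(f, X[train_idx, ]))^2)
test_loss  <- sum((X$mpg[test_idx]  - predict(f, X[test_idx, ]))^2)
print(c(train = train_loss, test = test_loss))
```

Only the test loss says something about generalization, because the model never saw those rows during fitting.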

Supervised Classification
[Figure: plot over features a (horizontal axis) and b (vertical axis); legend "class": car, truck]
Goal: Predict a class (discrete quantity), or membership probabilities
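As a rough illustration of the two kinds of output named in the goal, the sketch below fits a logistic regression in base R. Logistic regression, the `mtcars` data, the single feature `wt`, and the 0.5 threshold are choices made here for illustration. The slides do not name a particular classifier.

```r
# Binary class label from mtcars (an assumption for illustration):
# am = 0 (automatic) or 1 (manual transmission)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Membership probabilities P(am = 1 | x) for each observation
prob <- predict(fit, type = "response")

# Discrete class prediction by thresholding at 0.5
pred_class <- ifelse(prob > 0.5, 1, 0)

# Fraction of training observations classified incorrectly
error_rate <- mean(pred_class != mtcars$am)
print(head(round(prob, 2)))
print(error_rate)
```

The error rate here is computed on the training data, so it is likely optimistic. A held-out test set would give a fairer estimate.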

Supervised Regression
[Figure: plot of b (vertical axis) against a (horizontal axis)]
Goal: Predict a continuous quantity

Unsupervised Learning
- no ground truth $y$ available
- determine group membership or assign labels
- loss function measures properties of groups, e.g. homogeneity with respect to features
- still want to minimize loss given training data and generalize

Unsupervised Clustering
[Figure: plot over features a (horizontal axis) and b (vertical axis)]
Goal: Group data by similarity, or estimate membership probabilities
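One way to make grouping by similarity concrete is k-means, sketched below in base R. The choice of k-means, the `iris` features, and `k = 3` are assumptions for illustration. The slides do not commit to a specific clustering method. The total within-cluster sum of squares plays the role of a loss that measures how homogeneous the groups are.

```r
set.seed(1)  # k-means starts from random centers

# Two numeric features, no labels used (iris is an assumption for illustration)
features <- iris[, c("Petal.Length", "Petal.Width")]

# Group the observations into k = 3 clusters (k is a hypothetical choice)
km <- kmeans(features, centers = 3, nstart = 10)

# Cluster membership for each observation
print(table(km$cluster))

# Loss: total within-cluster sum of squares (lower = more homogeneous groups)
print(km$tot.withinss)
```

k-means assigns each observation to exactly one group. Estimating membership probabilities would need a different kind of model, which this sketch does not cover.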

In this Course
- classification
- regression
- clustering
- data preprocessing (missing values, dimensionality reduction)
- performance evaluation
- parameter tuning

Not in this Course
- R tutorial
- details on particular methods
- deep learning
- time series
- Big Data

What you’ll need

Install (link truncated in the transcript: wnload/)

Install mlr
- on the R console: install.packages("mlr")
- or see s/InstallPackagesRStudio.html
- extensive tutorial available (link not preserved in the transcript)

Format
- meetings roughly every week
- half lecture, half practical exercises
- happy to discuss specific problems

What is Machine Learning? “gives computers the ability to learn without being explicitly programmed” (Wikipedia) “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as mea