Introduction Of Machine Learning

Transcription

INTRODUCTION OF MACHINE LEARNING
Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University
Helix Science Engagement Programs 2018

2. WHAT IS MACHINE LEARNING
The goal of machine learning is to program computers to use example data or past experience to solve a given problem. - ACM
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" with data, without being explicitly programmed. - Arthur Lee Samuel
Machine learning is the science of getting computers to act without being explicitly programmed.

3. MACHINE LEARNING TOPICS
Supervised learning: you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output: Y = f(X).
  Classification: a classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
  Regression: a regression problem is one where the output variable is a real value, such as “dollars” or “weight”.
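The two supervised problem types can be sketched in a few lines of plain Python. This is only a toy illustration, not the method used in the slides: the data values, the function names (`fit_line`, `fit_centroids`, `classify`) and the choice of a least-squares line and a nearest-centroid rule are all assumptions made for the example.

```python
# Toy supervised learning: learn a mapping Y = f(X) from labeled examples.
# All data and helper names below are hypothetical.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = a * x + b for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fit_centroids(xs, labels):
    """Classification: remember the mean input value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, x):
    """Predict the class whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Regression: the output is a real value (weight in kg from height in cm).
heights = [150.0, 160.0, 170.0, 180.0]
weights = [50.0, 58.0, 66.0, 74.0]
a, b = fit_line(heights, weights)
predicted_weight = a * 165.0 + b

# Classification: the output is a category.
readings = [1.0, 1.5, 2.0, 7.0, 8.0, 9.0]
labels = ["no disease"] * 3 + ["disease"] * 3
centroids = fit_centroids(readings, labels)
predicted_label = classify(centroids, 6.5)

print(predicted_weight, predicted_label)
```

In both cases the algorithm only sees (X, Y) pairs; the difference is whether Y is a number or a category.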

4. MACHINE LEARNING TOPICS
Unsupervised learning: you only have input data (X) and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
  Clustering: a clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
  Association: an association rule learning problem is one where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
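Both unsupervised tasks can be illustrated without any labels. The sketch below assumes a one-dimensional k-means loop for clustering and a simple confidence score for an association rule; the spending figures, the shopping baskets and the function names are hypothetical and chosen only for illustration.

```python
# Toy unsupervised learning: only inputs X are available, no outputs Y.

def kmeans_1d(points, centers, iterations=10):
    """Minimal 1-D k-means: alternate between assignment and centre update."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def confidence(baskets, x, y):
    """Share of baskets containing x that also contain y (rule x -> y)."""
    with_x = [b for b in baskets if x in b]
    with_both = [b for b in with_x if y in b]
    return len(with_both) / len(with_x)

# Clustering: group customers by monthly spending.
spending = [10.0, 12.0, 14.0, 95.0, 100.0, 105.0]
centers, clusters = kmeans_1d(spending, centers=[0.0, 50.0])

# Association: how often do people who buy bread also buy butter?
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
bread_to_butter = confidence(baskets, "bread", "butter")

print(centers, clusters, bread_to_butter)
```

The starting centres and the number of clusters are inputs to this sketch; in practice they usually have to be chosen or tuned.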

5. SEMI-SUPERVISED MACHINE LEARNING
Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.
These problems sit between supervised and unsupervised learning.
Example: a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
You can use unsupervised learning techniques to discover and learn the structure in the input variables.
You can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new, unseen data.
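The "predict, feed back, retrain" loop described above is often called self-training or pseudo-labeling. A minimal sketch follows, assuming a nearest-centroid classifier on one numeric feature; the feature values, the labels and the helper names are invented for the example and stand in for real image features.

```python
# Toy self-training: a few labeled points plus many unlabeled ones.

def fit_centroids(xs, labels):
    """Supervised step: store the mean feature value of each class."""
    grouped = {}
    for x, label in zip(xs, labels):
        grouped.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

labeled_x = [1.0, 9.0]            # only two labeled examples
labeled_y = ["cat", "dog"]
unlabeled_x = [2.0, 3.0, 7.0, 8.0]

# 1. Train on the small labeled set.
model = fit_centroids(labeled_x, labeled_y)

# 2. Make best-guess predictions (pseudo-labels) for the unlabeled data.
pseudo_y = [predict(model, x) for x in unlabeled_x]

# 3. Feed the pseudo-labeled data back in as training data and retrain.
model = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)

# 4. Use the retrained model on new, unseen data.
prediction = predict(model, 4.5)
print(pseudo_y, model, prediction)
```

A practical version would usually keep only confident pseudo-labels, since wrong guesses fed back into training can reinforce themselves.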

6. THE MACHINE LEARNING FLOWCHART

7. DECISION BOUNDARY
A decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class.
Decision boundaries are not always clear cut.
Adjusting the decision boundary can sometimes give better performance.
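To make the last point concrete, here is a small sketch with a single input feature, where the decision boundary reduces to one threshold value. The data points and the two candidate thresholds are made up; the example only shows that moving the boundary can change accuracy when the classes overlap.

```python
# Toy decision boundary in one dimension: predict class 1 when x >= threshold.
# Each tuple is (feature value, true class); the classes overlap near 0.5.
points = [(0.1, 0), (0.3, 0), (0.45, 1), (0.48, 1), (0.6, 0), (0.7, 1), (0.9, 1)]

def accuracy(data, threshold):
    """Fraction of points classified correctly by the given threshold."""
    correct = sum(1 for x, label in data if (1 if x >= threshold else 0) == label)
    return correct / len(data)

default_accuracy = accuracy(points, 0.5)   # boundary at 0.5
adjusted_accuracy = accuracy(points, 0.4)  # boundary moved to 0.4

print(default_accuracy, adjusted_accuracy)
```

No threshold separates this data perfectly, which is what "not always clear cut" means in practice.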

8. COST FUNCTION AND CONVEXITY
A cost function is something you want to minimize.
In machine learning, the cost measures the difference between the prediction and the true value, $h_\theta(X) - Y$, through some function of that difference.
Convexity: for the function parameters to be optimized reliably, the cost function should be convex, that is, bowl-shaped (opening upward) with a smooth curve and only one minimum.
Two common ways to optimize a convex function:
  Gradient descent (currently the common choice)
  Normal equation (efficient for small datasets)
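As a concrete case, the sketch below assumes a linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and a mean-squared-error cost with the common $1/(2n)$ scaling. Neither choice is stated in the slides, and the data and function names are hypothetical. For this cost the normal equation $\theta = (X^T X)^{-1} X^T y$ gives the minimum directly.

```python
# Squared-error cost for a straight line, and its closed-form minimiser.

def cost(theta0, theta1, xs, ys):
    """J(theta) = 1/(2n) * sum((h(x) - y)^2), with h(x) = theta0 + theta1 * x."""
    n = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

def normal_equation(xs, ys):
    """Solve (X^T X) theta = X^T y for a design matrix with rows [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of the 2x2 matrix X^T X
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]              # generated from y = 1 + 2x

theta0, theta1 = normal_equation(xs, ys)
cost_at_zero = cost(0.0, 0.0, xs, ys)
cost_at_optimum = cost(theta0, theta1, xs, ys)
print(theta0, theta1, cost_at_zero, cost_at_optimum)
```

The closed form needs the matrix $X^T X$ to be inverted, which is cheap here but grows costly as the number of features increases; that is one reason gradient descent is the more common choice on large problems.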

9. GRADIENT DESCENT
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.
To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
To improve computing efficiency, we can subsample the data points and compute the gradient on the subsample; a typical approach is called stochastic gradient descent.
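The update rule can be sketched as follows, assuming a linear model fitted with a squared-error cost. The learning rate, step count, batch size, seed and data are illustrative assumptions, not values from the slides. Passing `batch_size` switches the same loop from full-batch gradient descent to a mini-batch form of stochastic gradient descent.

```python
import random

def gradient(theta0, theta1, batch):
    """Gradient of 1/(2n) * sum((theta0 + theta1 * x - y)^2) over the batch."""
    n = len(batch)
    g0 = sum(theta0 + theta1 * x - y for x, y in batch) / n
    g1 = sum((theta0 + theta1 * x - y) * x for x, y in batch) / n
    return g0, g1

def gradient_descent(data, learning_rate=0.1, steps=2000, batch_size=None, seed=0):
    """Step against the gradient; subsample the data if batch_size is given."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        batch = data if batch_size is None else rng.sample(data, batch_size)
        g0, g1 = gradient(theta0, theta1, batch)
        theta0 -= learning_rate * g0   # step proportional to the negative gradient
        theta1 -= learning_rate * g1
    return theta0, theta1

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 1 + 2x

full_batch = gradient_descent(data)
stochastic = gradient_descent(data, batch_size=2)
print(full_batch, stochastic)
```

Because this toy data lies exactly on a line, both variants settle near the same parameters; with noisy data the stochastic version would keep fluctuating around the minimum unless the learning rate is reduced over time.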
