PATTERN RECOGNITION - Ocw.snu.ac.kr

Transcription

PATTERN RECOGNITION
Jin Young Choi
ECE, Seoul National University

Notice
Lecture Notes: Slides
References: Pattern Classification by Richard O. Duda, et al.; Pattern Recognition and Machine Learning by Christopher M. Bishop
Assistant: Seulki Park, 133(ASRI)-412, seulki.park@snu.ac.kr
Evaluation: Quiz 40%, Midterm 30%, Final 30%
Video: Every week two videos are uploaded.
Video 1: uploaded Sun. 09:00, quiz due Tue. 24:00
Video 2: uploaded Wed. 09:00, quiz due Fri. 24:00
Class web: etl.snu.ac.kr
Office: 133(ASRI)-406, jychoi@snu.ac.kr

INTRODUCTION TO AI, ML, AND DL
AI: Artificial Intelligence
ML: Machine Learning
DL: Deep Learning
Jin Young Choi
ECE, Seoul National University

Artificial Intelligence
(Diagram: three roots of AI. Symbolism, from cognitive science (Minsky): databases, search and inference, decision trees. Connectionism, from neuroscience (Rosenblatt): neural networks, the backpropagation rule, deep learning. Bayesian theory: statistical machine learning.)

Artificial Intelligence
Learning from experience (observations, examples)?
Inference (reasoning) for a question?

Artificial Intelligence
Learning from experience (observations, examples):
If birds are given, then we can learn their features such as the number of legs, the shape of the beak, etc.
If cancer patients are given, then we can observe their symptoms via diagnosis.
Inference (reasoning) for a question:
If the features of something are given, then we can recognize what it is.
If the symptoms of a patient are given, then we can infer what his disease is.

Artificial Intelligence
Symbolism: search-based inference engine (e.g., a decision tree over a database).
Learning from experience (observations, examples):
If birds are given, then we can learn their features such as the number of legs, the shape of the beak, etc.
If cancer patients are given, then we can record their symptoms via diagnosis.
Learned rules are stored in a DB: if $y = y_1$, then $x = x_1$; if $y = y_2$, then $x = x_2$; e.g., $y_1 \mapsto (x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{16})$ and $y_2 \mapsto (x_{21}, x_{22}, x_{23}, x_{24}, x_{25}, x_{26})$.
Inference (reasoning) for a question:
If the features of something are given, then we can recognize what it is.
If the symptoms of a patient are given, then we can infer what his disease is.
Inference searches the DB in the reverse direction, as in the sketch below: if $x = x_1$, then $y = y_1$; if $x = x_2$, then $y = y_2$.
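To make the symbolic learning/inference loop concrete, here is a minimal Python sketch that stores "if $y$, then $x$" rules in a database and answers a query by matching observed features; the feature strings and the overlap-count matching rule are hypothetical illustrations, not part of the slides.

```python
# A minimal sketch of symbolism: a rule database learned from examples,
# searched at inference time. Feature values are hypothetical placeholders.
rule_db = {
    "bird":   {"two legs", "beak", "feathers"},
    "cancer": {"fatigue", "weight loss", "pain"},
}

def infer(observed):
    """Return the label y whose stored features x best match the observation."""
    scores = {y: len(x & set(observed)) for y, x in rule_db.items()}
    return max(scores, key=scores.get)

print(infer(["beak", "feathers"]))  # -> "bird"
```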

Artificial Intelligence
Bayesian theory: density estimation.
Learning from experience (observations, examples): if $y = y_1$, then $x = x_1$; if $y = y_2$, then $x = x_2$; estimate $p(x = x_i \mid y = y_j)$ and $p(y = y_j)$.
Inference (reasoning) for a question: if $x = x_1$, then $y = y_1$; if $x = x_2$, then $y = y_2$; by Bayes' theorem,
$$p(y = y_j \mid x = x_i) = \frac{p(x = x_i \mid y = y_j)\, p(y = y_j)}{p(x = x_i)}, \qquad p(x = x_i) = \sum_j p(x = x_i \mid y = y_j)\, p(y = y_j).$$
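A minimal numpy sketch of this Bayesian inference step, assuming a hypothetical likelihood table and prior for two classes and three feature values:

```python
import numpy as np

# Hypothetical likelihood table p(x = x_i | y = y_j) and prior p(y = y_j);
# the numbers are made up for illustration.
likelihood = np.array([[0.7, 0.2, 0.1],   # p(x | y = y_1)
                       [0.1, 0.3, 0.6]])  # p(x | y = y_2)
prior = np.array([0.4, 0.6])              # p(y = y_1), p(y = y_2)

# Evidence: p(x = x_i) = sum_j p(x = x_i | y = y_j) p(y = y_j)
evidence = prior @ likelihood

# Posterior via Bayes' theorem: p(y = y_j | x = x_i)
posterior = likelihood * prior[:, None] / evidence[None, :]

print(posterior[:, 0])         # posterior over classes given x = x_1
print(posterior.sum(axis=0))   # each column sums to 1
```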

Artificial Intelligence
Connectionism: deep neural networks for learning and inference, $o = f(W, x)$, e.g., $o_j = p(y = y_j \mid x = x_i)$.
Training (learning): find $W$ to minimize the errors between $o_j$ and $t_j$ for the given training data $\{(x_p, l_p)\}$.
Inference (reasoning): calculate $o_j$ via the deep network (feedforward).
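As an illustration, here is a tiny two-layer feedforward network with softmax outputs, so that $o_j$ can be read as a class probability; the weights are random placeholders rather than a trained $W$.

```python
import numpy as np

# A minimal sketch of inference o = f(W, x): two layers with a softmax
# output. In training, W would be fit to minimize the o_j vs t_j error.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 4)), rng.normal(size=(3, 5))

def f(x):
    h = np.tanh(W1 @ x)          # hidden layer h_j
    a = W2 @ h                   # output activations a_k
    e = np.exp(a - a.max())      # subtract max for numerical stability
    return e / e.sum()           # softmax -> class probabilities o_k

x = rng.normal(size=4)
o = f(x)
print(o, o.sum())  # probabilities over 3 classes, summing to 1
```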

Learning and Inference
(Figure: a feature space with axes Height and Weight.)

Learning and Inference
Symbolism: decision tree. A general tree structure has a root node, internal (split) nodes, and terminal (leaf) nodes.
(Figure: a numbered tree with root node 0 and leaf nodes through 14 partitioning the Height-Weight feature space; the DB stores $y_1 \mapsto (x_{11}, \dots, x_{16})$ and $y_2 \mapsto (x_{21}, \dots, x_{26})$.)

Learning and Inference
Bayesian theory: fit a density to the Height-Weight feature space, e.g., the multivariate Gaussian
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right).$$
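A short numpy sketch of this route: fit $\boldsymbol{\mu}$ and $\Sigma$ to synthetic (height, weight)-style data and evaluate the Gaussian density above. The data is generated for illustration only.

```python
import numpy as np

# Fit a 2-D Gaussian to synthetic (height, weight)-style samples.
rng = np.random.default_rng(0)
X = rng.normal(loc=[170.0, 65.0], scale=[8.0, 10.0], size=(500, 2))

mu = X.mean(axis=0)                 # sample mean
Sigma = np.cov(X, rowvar=False)     # sample covariance
Sigma_inv, d = np.linalg.inv(Sigma), X.shape[1]
norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

def p(x):
    diff = x - mu
    return norm * np.exp(-0.5 * diff @ Sigma_inv @ diff)

print(p(np.array([170.0, 65.0])))   # density near the mean
```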

Learning and Inference
Connectionism: network training, then feedforward inference. Linear discriminants separate the Height-Weight feature space:
$$g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0}.$$
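A minimal sketch of classification with the linear discriminants $g_i(\mathbf{x})$; the weights and biases below are illustrative, not trained values.

```python
import numpy as np

# Classify by the largest linear discriminant g_i(x) = w_i^t x + w_i0.
W = np.array([[ 0.05, -0.02],    # w_1 for (height, weight)
              [-0.05,  0.02]])   # w_2
w0 = np.array([-6.0, 6.0])       # biases w_10, w_20

def classify(x):
    g = W @ x + w0               # g_i(x) for each class i
    return int(np.argmax(g))

print(classify(np.array([180.0, 60.0])))  # class index with largest g_i
```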

Convolutional Neural Networks
$o = f(W, x)$
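To illustrate the convolutional building block inside $f(W, x)$, here is a minimal numpy sketch of one valid-padding 2-D convolution followed by a ReLU; the kernel is a hand-picked illustration, not a learned filter.

```python
import numpy as np

# One 2-D convolution (valid padding) plus ReLU, the basic CNN layer.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(8, 8))
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)             # crude vertical-edge filter
feature_map = np.maximum(conv2d(image, edge_kernel), 0.0)  # ReLU activation
print(feature_map.shape)  # (6, 6)
```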

Supervised/Unsupervised Learning
(Figure: a mixture of class-conditional densities $p(x \mid \omega_1), p(x \mid \omega_2), p(x \mid \omega_3)$ with priors $P(\omega_i)$; in supervised learning the class labels are observed, in unsupervised learning they are hidden.)

Generative/Discriminative Model
Generating images: latent factors $z_1, z_2, z_3$ generate an image $x$; recognition infers $p(z_1, z_2, z_3 \mid x)$.
Generative approach: model $p(z, x) = p(x \mid z)\, p(z)$ and use Bayes' theorem,
$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}.$$
Discriminative approach: model $p(z \mid x)$ directly.

Unsupervised Learning
Clustering: K-means, etc. (see the sketch below)
Variational Auto-Encoder
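A minimal K-means sketch on synthetic 2-D data ($K = 2$), alternating the assignment and update steps; the data and cluster count are chosen purely for illustration.

```python
import numpy as np

# K-means on two synthetic Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

centers = X[rng.choice(len(X), size=2, replace=False)]  # random init
for _ in range(10):
    # Assignment step: each point goes to its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: recompute each center as its cluster mean.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # should approach the true means (0,0) and (5,5)
```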

Statistical Learning
$L_2$ loss: $L(t, f(W, x)) = \| t - f(W, x) \|_2^2$
Total loss: $\mathcal{L}(W) = \int L(t, f(W, x))\, dp(x, t)$, where $p(x, t)$ is the joint PDF of $x$ and $t$, but it is unknown.
Empirical total loss: $\mathcal{L}(W) = \frac{1}{N} \sum_{n=1}^{N} L(t_n, f(W, x_n))$
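A short numpy sketch of the empirical total loss for the $L_2$ case, using a placeholder linear model $f(W, x) = Wx$ and random data for illustration:

```python
import numpy as np

# Empirical total loss: (1/N) sum_n ||t_n - f(W, x_n)||^2.
rng = np.random.default_rng(0)
X, T = rng.normal(size=(100, 3)), rng.normal(size=(100, 2))
W = rng.normal(size=(2, 3))

def f(W, x):
    return W @ x                 # placeholder model f(W, x)

loss = np.mean([np.sum((t - f(W, x)) ** 2) for x, t in zip(X, T)])
print(loss)
```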

Statistical Learning1π‘π‘Žπ‘ π‘’ 2 𝑏𝑖𝑑𝑠𝐼 π‘₯π‘˜ log( ) α‰Šπ‘π‘Žπ‘ π‘’ 𝑒 π‘›π‘Žπ‘‘π‘ π‘π‘˜32 bits π‘π‘˜ 1/232 for uniform distribution πΌπ‘˜ 32 bitsβ‘  𝐼 π‘₯π‘˜ 0 for π‘π‘˜ 1β‘‘ 𝐼 π‘₯π‘˜ 0 for 0 π‘π‘˜ 1β‘’ 𝐼 π‘₯π‘˜ 𝐼 π‘₯𝑗 for π‘π‘˜ 𝑝𝑗Entropy : a measure of the average amount of informationconveyed per message, i.e., expectation of Information𝐻 π‘₯ 𝐸 𝐼 π‘₯π‘˜ π‘π‘˜ 𝐼 π‘₯π‘˜ π‘π‘˜ log π‘π‘˜π‘˜π‘˜

Statistical Learning
Entropy is maximized when the $p_k$ are equiprobable: for the uniform distribution with $p_k = 1/2^{32}$,
$$H(X) = \sum_{k=1}^{2^{32}} \frac{1}{2^{32}} \log_2 2^{32} = 32 \text{ bits},$$
and $H(X) = 0$ for a certain event ($p_k = 1$ and $p_j = 0$ for $j \neq k$).
Cross-entropy loss (softmax output):
$$\mathcal{L}(W) = -\sum_{k}^{K} t_k \log f_k(W, x), \qquad f_k(W, x) = \frac{e^{a_k}}{\sum_j e^{a_j}} \ \text{(softmax)},$$
where $t_k$ is the target label (one-hot, e.g., 0000100) and the network computes $a_k = \sum_j w_{kj} h_j$ and $a_j = \sum_i w_{ji} x_i$.
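A minimal numpy sketch of the softmax cross-entropy above for one sample, with illustrative activations $a_k$ and a one-hot target:

```python
import numpy as np

# Softmax cross-entropy: f_k = exp(a_k)/sum_j exp(a_j), loss = -sum_k t_k log f_k.
a = np.array([2.0, 1.0, 0.1])   # output activations a_k (illustrative)
t = np.array([1.0, 0.0, 0.0])   # one-hot target t_k

f = np.exp(a - a.max())         # subtract max for numerical stability
f /= f.sum()                    # softmax probabilities
loss = -np.sum(t * np.log(f))
print(f, loss)
```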

Statistical Learning
Cross-entropy loss (sigmoid output):
$$\mathcal{L}(W) = -\sum_{k}^{K} \left[ t_k \log f_k(W, x) + (1 - t_k) \log\big(1 - f_k(W, x)\big) \right], \qquad f_k(W, x) = \frac{1}{1 + e^{-a_k}} \ \text{(sigmoid)},$$
where $t_k$ is the target label (multi-hot, e.g., 00110100) and the network computes $a_k = \sum_j w_{kj} h_j$ and $a_j = \sum_i w_{ji} x_i$.
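The multi-hot (sigmoid) counterpart, again for one illustrative sample:

```python
import numpy as np

# Sigmoid cross-entropy: f_k = 1/(1 + exp(-a_k)),
# loss = -sum_k [t_k log f_k + (1 - t_k) log(1 - f_k)].
a = np.array([3.0, -2.0, 0.5, 1.5])   # output activations a_k (illustrative)
t = np.array([1.0, 0.0, 1.0, 0.0])    # multi-hot target t_k

f = 1.0 / (1.0 + np.exp(-a))          # independent sigmoid per class
loss = -np.sum(t * np.log(f) + (1 - t) * np.log(1 - f))
print(f, loss)
```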

Statistical Learning
Theorem (Gray 1990): $\sum_k p_k \log \frac{q_k}{p_k} \leq 0$.
Relative entropy (Kullback-Leibler divergence):
$$D_{KL}(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k} \geq 0, \qquad D_{KL}(p \,\|\, q) = 0 \ \text{for} \ p = q,$$
where $p_k$ is a probability mass function and $q_k$ is a reference probability mass function.
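A quick numerical illustration of $D_{KL}(p \,\|\, q) \geq 0$ with equality at $p = q$, on two made-up distributions:

```python
import numpy as np

# D_KL(p || q) = sum_k p_k log(p_k / q_k) >= 0, with equality iff p = q.
p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.3, 0.4])

print(np.sum(p * np.log(p / q)))   # > 0 since p != q
print(np.sum(p * np.log(p / p)))   # 0 for p = q
```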

Scene and Object Generation
Pose Transformer

Motion Retargeting

Outline of ML Techniques
(Diagram of topics): Bayes rule (likelihood, posterior, prior) and Bayes decision; learning by ML(P)E, Bayesian learning, MLE, EM, MCMC, VI, convex optimization, GA, and BP (GD); generative models (GMM, Bayesian net, Boltzmann machine, HMM, latent DA) vs. discriminative models (linear classifier, LS, SVM, K-SVM, random forest, deep NN, linear DA, max. separation); density estimation (histogram, K-NN, Parzen window); dimensionality reduction (PCA with max. scatter, ICA, K-SVD, DSA, NM); information measures (entropy, K-L divergence).

Course Outline
Intro. AI, ML, and DL
Intro. Linear Algebra
Intro. Prob. & Information
Bayesian Decision Theory
Dim. Reduction: PCA & LDA
Learning Rules
Support Vector Machine
Deep Convolutional Networks
Bayesian Networks
Parametric pdf Estimation
Non-Parametric pdf Estimation
Boltzmann Machine
Markov Chain Monte Carlo
Inference of Bayesian Net: MCMC
Inference of Bayesian Net: VI
Traffic Pattern Analysis: VI
Recent Papers: Active Learning, Imbalanced Data Learning, Out of Distribution, Weakly Supervised Learning, etc.

Questions
1. Describe the commonalities and differences among symbolism, connectionism, and the Bayesian approach.
2. Explain supervised/weakly-supervised/unsupervised learning.
3. What is the difference between discriminative and generative models?
