Transcription
PATTERN RECOGNITION
JIN YOUNG CHOI
ECE, SEOUL NATIONAL UNIVERSITY
Notice
Lecture Notes: Slides
References:
- Pattern Classification by Richard O. Duda, et al.
- Pattern Recognition and Machine Learning by Christopher M. Bishop
Assistant: Seulki Park, 133(ASRI)-412, seulki.park@snu.ac.kr
Evaluation: Quiz 40%, Midterm 30%, Final 30%
Video: Every week two videos are uploaded.
- Video 1: upload Sun. 09:00, quiz due Tue. 24:00
- Video 2: upload Wed. 09:00, quiz due Fri. 24:00
Class web: etl.snu.ac.kr
Office: 133(ASRI)-406, jychoi@snu.ac.kr
INTRODUCTION TO AI, ML, AND DL
AI: ARTIFICIAL INTELLIGENCE
ML: MACHINE LEARNING
DL: DEEP LEARNING
JIN YOUNG CHOI
ECE, SEOUL NATIONAL UNIVERSITY
Artificial Intelligence
[Diagram: three traditions of AI]
- Symbolism (cognitive science; Minsky): database, search & inference, decision tree
- Bayesian theory: statistical machine learning
- Connectionism (neuroscience; Rosenblatt): neural networks, backpropagation rule, deep learning
Artificial Intelligence
Learning from Experience (Observations, Examples)?
Inference (Reasoning) for a Question?
Artificial Intelligence
Learning from Experience (Observations, Examples)
- If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.
- If cancer patients are given, then we can observe their symptoms via diagnosis.
Inference (Reasoning) for a Question
- If features of something are given, then we can recognize what it is.
- If symptoms of a patient are given, then we can infer what his disease is.
Artificial Intelligence (Symbolism)
Learning from Experience (Observations, Examples)
- If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.
- If cancer patients are given, then we can record their symptoms via diagnosis.
- Rules stored in a DB: if y = y1, then x = x1; if y = y2, then x = x2
  [Table: DB, y1: x11 x12 x13 x14 x15 x16; y2: x21 x22 x23 x24 x25 x26] -> Decision Tree
Inference (Reasoning) for a Question
- If features of something are given, then we can recognize what it is.
- If symptoms of a patient are given, then we can infer what his disease is.
- Rules applied for answering: if x = x1, then y = y1; if x = x2, then y = y2
Search-based Inference Engine
Artificial Intelligence (Bayesian Theory)
Learning from Experience (Observations, Examples): Density Estimation
- If y = y1, then x = x1; if y = y2, then x = x2
- Estimate P(x = x_i | y = y_j) and P(y = y_j)
Inference (Reasoning) for a Question
- If x = x1, then y = y1; if x = x2, then y = y2
- P(y = y_j | x = x_i) = P(x = x_i | y = y_j) P(y = y_j) / P(x),
  where P(x) = Σ_j P(x = x_i | y = y_j) P(y = y_j)
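The Bayes-rule inference on this slide can be sketched in a few lines of Python. The priors and likelihoods below are made-up numbers standing in for densities that would be estimated from data:

```python
# Sketch of Bayesian inference: posterior from likelihoods and priors.
# P(y = y_j) and P(x = x_i | y = y_j) would come from density
# estimation on training data; here they are assumed for illustration.
prior = {"y1": 0.6, "y2": 0.4}
likelihood = {  # P(x | y)
    "y1": {"x1": 0.8, "x2": 0.2},
    "y2": {"x1": 0.3, "x2": 0.7},
}

def posterior(x):
    """P(y | x) = P(x | y) P(y) / P(x), with P(x) = sum_j P(x | y_j) P(y_j)."""
    p_x = sum(likelihood[y][x] * prior[y] for y in prior)
    return {y: likelihood[y][x] * prior[y] / p_x for y in prior}

post = posterior("x1")  # observing x1 favors class y1
```

Note that the evidence P(x) only normalizes the posterior; the ranking of classes is decided by likelihood times prior.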
Artificial Intelligence (Connectionism)
Deep Neural Networks for Learning and Inference
- o = f(θ, x), e.g., o_j = P(y = y_j | x = x_i)
Training (Learning)
- Find θ to minimize the errors between o_i and t_i for given training data {x_i, t_i}
Inference (Reasoning)
- Calculate o_i via the deep network (feedforward)
Network training; inference (feedforward)
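The training step described here ("find θ to minimize the error between o_i and t_i") can be illustrated with a deliberately tiny model: one parameter, squared error, plain gradient descent. The data and learning rate are assumed for illustration:

```python
# Sketch of training by error minimization: model o = theta * x,
# loss (1/N) sum_i (theta*x_i - t_i)^2, minimized by gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x_i, t_i); true theta is 2

theta = 0.0
lr = 0.05
for _ in range(200):
    # gradient of the mean squared error w.r.t. theta
    grad = sum(2 * (theta * x - t) * x for x, t in data) / len(data)
    theta -= lr * grad
# theta has converged close to 2.0
```

A deep network does the same thing with millions of parameters, using backpropagation to get the gradient.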
Learning and Inference
[Figure: feature space with axes Height and Weight]
Learning and Inference (Symbolism: Decision Tree)
[Figure: feature space with axes Height and Weight, partitioned by a decision tree]
A general tree structure:
- root node
- internal (split) nodes
- terminal (leaf) nodes
[Table: DB, y1: x11 x12 x13 x14 x15 x16; y2: x21 x22 x23 x24 x25 x26]
Learning and Inference (Bayesian Theory)
[Figure: feature space with axes Height and Weight, modeled by a Gaussian density]
p(x) = 1 / ((2π)^(d/2) |Σ|^(1/2)) exp(-(1/2) (x - μ)^t Σ^(-1) (x - μ))
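As a numeric check on the density formula, a minimal evaluation for d = 2 with the 2x2 inverse and determinant written out explicitly (the mean and covariance values are assumed):

```python
import math

# Sketch of the multivariate Gaussian density for d = 2:
# p(x) = exp(-0.5 (x-mu)^t Sigma^{-1} (x-mu)) / ((2*pi)^(d/2) |Sigma|^(1/2))
def gaussian_pdf_2d(x, mu, cov):
    (a, b), (c, d) = cov
    det = a * d - b * c                      # |Sigma|
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Sigma^{-1}
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# at the mean of a standard Gaussian, p = 1 / (2*pi)
p = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```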
Learning and Inference (Connectionism)
[Figure: feature space with axes Height and Weight, separated by a linear discriminant]
g_i(x) = w_i^t x + w_i0
Network training; inference (feedforward)
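One simple way to train such a linear discriminant g(x) = w^t x + w0 is the classic perceptron rule (a sketch only; the toy 2-D data below are assumed, and the course covers other learning rules later):

```python
# Perceptron sketch: nudge the hyperplane toward each misclassified point.
# Labels are +1 / -1; data are assumed and linearly separable.
data = [([1.0, 1.0], 1), ([2.0, 1.5], 1),
        ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)]

w = [0.0, 0.0]
w0 = 0.0
for _ in range(20):
    for x, label in data:
        g = w[0] * x[0] + w[1] * x[1] + w0
        if label * g <= 0:  # misclassified (or on the boundary)
            w = [w[0] + label * x[0], w[1] + label * x[1]]
            w0 += label

# after training, every point lies on the correct side of g(x) = 0
correct = all((w[0] * x[0] + w[1] * x[1] + w0) * label > 0 for x, label in data)
```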
Convolutional Neural Networks
o = f(θ, x)
Supervised/Unsupervised Learning
[Figure: mixture of class-conditional densities P(x|ω_1), P(x|ω_2), P(x|ω_3) with priors P(ω_i)]
- Supervised: the class ω_i of each sample is observed
- Unsupervised: the class ω_i of each sample is hidden
Generative/Discriminative Model
Generating images: latent variables z_1, z_2, z_3 generate an image x
Generative approach:
- Model P(z, x) = P(x | z) P(z)
- Use Bayes' theorem: P(z | x) = P(x | z) P(z) / P(x)
Discriminative approach:
- Model P(z | x) directly
Unsupervised Learning
- Clustering: K-means, etc.
- Variational Auto-Encoder
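A minimal K-means sketch for the clustering bullet (K = 2, 1-D points, fixed initial centers; all values assumed, and empty-cluster handling omitted):

```python
# K-means sketch: alternate between assigning points to the nearest
# center and recomputing each center as its cluster mean.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers = [0.0, 10.0]  # assumed initialization

for _ in range(10):
    # assignment step
    clusters = [[], []]
    for p in points:
        k = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[k].append(p)
    # update step
    centers = [sum(c) / len(c) for c in clusters]
# centers converge to the two cluster means, 1.0 and 8.0
```

No labels are used anywhere, which is what makes this unsupervised.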
Statistical Learning
L2 Loss:
L(t, f(θ, x)) = || t - f(θ, x) ||_2^2
Total Loss:
R(θ) = ∫ L(t, f(θ, x)) dP(x, t), where P(x, t) is a joint PDF of x and t, but unknown
Empirical Total Loss:
R_emp(θ) = (1/N) Σ_{i=1}^{N} L(t_i, f(θ, x_i))
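Since P(x, t) is unknown, only the empirical total loss is computable. A sketch with an assumed linear model f(θ, x) = θx and a small assumed sample set:

```python
# Empirical total loss: R_emp(theta) = (1/N) sum_i (t_i - f(theta, x_i))^2
def f(theta, x):
    return theta * x  # assumed model family

def empirical_loss(theta, samples):
    return sum((t - f(theta, x)) ** 2 for x, t in samples) / len(samples)

samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]  # noisy targets near t = 2x
loss_good = empirical_loss(2.0, samples)  # small residual loss
loss_bad = empirical_loss(0.0, samples)   # much larger loss
```

Learning then amounts to searching θ for the smallest R_emp, as in the gradient-descent slide earlier.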
Statistical Learning
Information: I(x_i) = log(1/p_i) = -log p_i, where p_i = P(x = x_i)
- e.g., p_i = 1/2^32 for a uniform distribution over 32-bit messages, so I_i = 32 bits
Properties:
1. I(x_i) = 0 for p_i = 1
2. I(x_i) ≥ 0 for 0 ≤ p_i ≤ 1
3. I(x_i) > I(x_j) for p_i < p_j
Entropy: a measure of the average amount of information conveyed per message, i.e., the expectation of information:
H(x) = E[I(x_i)] = Σ_i p_i I(x_i) = -Σ_i p_i log p_i
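The information and entropy definitions translate directly to code (base-2 logs, so results are in bits; the examples are the slide's 32-bit uniform case plus a fair coin):

```python
import math

# H = -sum_i p_i log2 p_i, the expected information in bits
def entropy_bits(probs):
    return sum(p * -math.log2(p) for p in probs if p > 0)

# uniform over 2^32 messages: each message carries I = 32 bits
i_uniform = -math.log2(1 / 2**32)

# a certain event (p = 1) carries no information
h_certain = entropy_bits([1.0])

# a fair coin carries 1 bit per toss
h_fair_coin = entropy_bits([0.5, 0.5])
```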
Statistical Learning
Entropy becomes maximum when p_i is equiprobable:
- Uniform over 32-bit messages, p_i = 1/2^32: H(p) = Σ_{i=1}^{2^32} (1/2^32) log_2 2^32 = 32 bits
- H(p) = 0 for a certain event (p_i = 1 and p_{j≠i} = 0)
Cross Entropy Loss:
L(W) = -Σ_{k=1}^{K} t_k log o_k(W, x)
o_k(W, x) = exp(net_k) / Σ_j exp(net_j)   (softmax)
t_k: target label (one-hot, e.g., 0000100)
net_k = Σ_j w_kj h_j,  net_j = Σ_i w_ji x_i
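A sketch of the softmax cross-entropy computation from this slide, with assumed net values; with a one-hot target the loss reduces to minus the log probability assigned to the true class:

```python
import math

# softmax turns net values into a probability vector o
def softmax(net):
    e = [math.exp(v) for v in net]
    s = sum(e)
    return [v / s for v in e]

# L(W) = -sum_k t_k log o_k, with one-hot t
def cross_entropy(t, o):
    return -sum(tk * math.log(ok) for tk, ok in zip(t, o))

net = [2.0, 0.5, -1.0]       # assumed network outputs
o = softmax(net)             # sums to 1
loss = cross_entropy([1, 0, 0], o)  # = -log o_0
```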
Statistical Learning
Cross Entropy Loss (multi-label):
L(W) = -Σ_{k=1}^{K} [t_k log o_k(W, x) + (1 - t_k) log(1 - o_k(W, x))]
o_k(W, x) = 1 / (1 + e^(-net_k))   (sigmoid)
t_k: target label (multi-hot, e.g., 00110100)
net_k = Σ_j w_kj h_j,  net_j = Σ_i w_ji x_i
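The multi-hot (sigmoid) variant, again with assumed net values; each output is an independent Bernoulli term, so the loss shrinks as every o_k moves toward its own target:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# L(W) = -sum_k [t_k log o_k + (1 - t_k) log(1 - o_k)], multi-hot t
def multilabel_cross_entropy(t, net):
    o = [sigmoid(v) for v in net]
    return -sum(tk * math.log(ok) + (1 - tk) * math.log(1 - ok)
                for tk, ok in zip(t, o))

# outputs already leaning the right way give a small positive loss
loss = multilabel_cross_entropy([1, 0, 1], [3.0, -3.0, 2.0])
```

Unlike softmax, the outputs here need not sum to 1, which is what allows several labels to be "on" at once.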
Statistical Learning
Theorem (Gray 1990): Σ_i p_i log(p_i / q_i) ≥ 0
Relative entropy (or Kullback-Leibler divergence):
D_KL(p ∥ q) = Σ_i p_i log(p_i / q_i)
D_KL(p ∥ q) = 0 for p = q
p_i: probability mass function; q_i: reference probability mass function
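The KL divergence itself, with two assumed probability mass functions; the theorem's nonnegativity and D_KL(p ∥ p) = 0 are easy to check numerically:

```python
import math

# D_KL(p || q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0
def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]  # assumed pmf
q = [0.4, 0.4, 0.2]  # assumed reference pmf

d = kl_divergence(p, q)       # strictly positive since p != q
d_self = kl_divergence(p, p)  # zero: no divergence from itself
```

Note D_KL is not symmetric: kl_divergence(p, q) and kl_divergence(q, p) generally differ, which is why q is called the reference.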
Scene and Object Generation
Pose Transformer
Motion Retargeting
Outline of ML Techniques
[Diagram: map of ML techniques, including: Bayes rule (likelihood, posteriori, priori), ML(P)E, Bayes decision, generative and discriminative models, GM, GMM, linear classifier, LS, SVM, K-SVM, K-NN, random forest, convex optimization, histogram, EM, MCMC, VI, Bayesian net, deep NN, BP(GD), MLE, Boltzmann machine, ICA, latent DA, linear DA (max. separation, max. scatter), HMM, PCA, Parzen window, entropy, K-SVD, K-L divergence, GA]
Course Outline
- Intro. AI, ML, and DL
- Intro. Linear Algebra
- Intro. Prob. & Information
- Bayesian Decision Theory
- Dim. Reduction: PCA & LDA
- Learning Rules
- Support Vector Machine
- Deep Convolutional Networks
- Bayesian Networks
- Parametric pdf Estimation
- Non-Parametric pdf Estimation
- Boltzmann Machine
- Markov Chain Monte Carlo
- Inference of Bayesian Net, MCMC
- Inference of Bayesian Net, VI
- Traffic Pattern Analysis, VI
- Recent Papers: Active Learning, Imbalanced Data Learning, Out of Distribution, Weakly Supervised Learning, etc.
Questions
1. Describe the commonalities and differences among the symbolism, connectionism, and Bayesian approaches.
2. Explain supervised/weakly-supervised/unsupervised learning.
3. What is the difference between discriminative and generative models?