CS 3710: Visual Recognition – Recognition Basics

Transcription

CS 3710: Visual Recognition
Classification and Detection
Adriana Kovashka
Department of Computer Science
January 13, 2015

Plan for Today
- Visual recognition basics part 2: Classification and detection
- Adriana's research
- Next time: First student presentation

Classification vs. Detection
- Classification: given an image or an image region, determine which of N categories it represents
- Detection: determine where in the image a category is to be found

Classification

James Hays

The machine learning framework
- Apply a prediction function to a feature representation of the image to get the desired output:
  f( [image] ) = "apple"
  f( [image] ) = "tomato"
  f( [image] ) = "cow"
Svetlana Lazebnik

The machine learning framework
y = f(x), where y is the output, f is the prediction function, and x is the image features
- Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
- Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Svetlana Lazebnik
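A minimal sketch of this train/test framework, assuming scikit-learn and its digits dataset (the lecture does not prescribe a library or data; both are illustration choices here):

```python
# y = f(x): fit f on labeled training examples, then apply it to unseen test examples.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)              # image features x and labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

f = LogisticRegression(max_iter=5000)             # the prediction function f
f.fit(X_train, y_train)                           # training: minimize error on the training set
print("train accuracy:", f.score(X_train, y_train))
print("test accuracy: ", f.score(X_test, y_test)) # testing: never-before-seen examples
```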

(Pipeline diagram. Training: image features → learned model. Testing: test image → image features → learned model → prediction.)
Derek Hoiem, Svetlana Lazebnik

Recognition task and supervision
- Images in the training set must be annotated with the "correct answer" that the model is expected to produce, e.g. "Contains a motorbike"
Svetlana Lazebnik

Generalization
- How well does a learned model generalize from the data it was trained on to a new test set?
- Training set (labels known) vs. test set (labels unknown)
Svetlana Lazebnik

Classification
- Assign an input vector to one of two or more classes
- Any decision rule divides the input space into decision regions separated by decision boundaries
Svetlana Lazebnik

Supervised classification
- Given a collection of labeled examples, come up with a function that will predict the labels of new examples (e.g., training examples labeled "four" and "nine", and a novel input "?")
- How good is some function that we come up with to do the classification? Depends on:
  – Mistakes made
  – Cost associated with the mistakes
Kristen Grauman

Supervised classification
- Given a collection of labeled examples, come up with a function that will predict the labels of new examples.
- Consider the two-class (binary) decision problem:
  – L(4→9): loss of classifying a 4 as a 9
  – L(9→4): loss of classifying a 9 as a 4
- Risk of a classifier s is expected loss:
  R(s) = Pr(4→9 using s) · L(4→9) + Pr(9→4 using s) · L(9→4)
- We want to choose a classifier so as to minimize this total risk
Kristen Grauman

Supervised classification
- The optimal classifier will minimize total risk. At the decision boundary (feature value x), either choice of label yields the same expected loss.
- If we choose class "four" at the boundary, the expected loss is:
  P(class is 9 | x) · L(9→4) + P(class is 4 | x) · L(4→4)
- If we choose class "nine" at the boundary, the expected loss is:
  P(class is 4 | x) · L(4→9)
- So, the best decision boundary is at the point x where:
  P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)
Kristen Grauman

Supervised classification
- The optimal classifier will minimize total risk. At the decision boundary (feature value x), either choice of label yields the same expected loss.
- To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if:
  P(9 | x) · L(9→4) < P(4 | x) · L(4→9)
  (left side: loss for choosing "four"; right side: loss for choosing "nine")
Kristen Grauman

Supervised classification
- (Plot of P(4 | x) and P(9 | x) over feature value x.) The optimal classifier will minimize total risk; at the decision boundary, either choice of label yields the same expected loss.
- To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if:
  P(9 | x) · L(9→4) < P(4 | x) · L(4→9)
  (left side: loss for choosing "four"; right side: loss for choosing "nine")
- How do we evaluate these probabilities?
Kristen Grauman
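A small numeric sketch of this loss-weighted decision rule; the probabilities and losses below are made-up numbers for illustration, not values from the slides:

```python
def choose_label(p4_given_x, p9_given_x, loss_9_as_4, loss_4_as_9):
    """Return the label ("four" or "nine") with the lower expected loss at x."""
    expected_loss_four = p9_given_x * loss_9_as_4   # cost of saying "four" when it is really a 9
    expected_loss_nine = p4_given_x * loss_4_as_9   # cost of saying "nine" when it is really a 4
    return "four" if expected_loss_four < expected_loss_nine else "nine"

# The point looks slightly more like a 9, but mistaking a 4 for a 9 is twice as costly,
# so the rule still picks "four".
print(choose_label(p4_given_x=0.4, p9_given_x=0.6, loss_9_as_4=1.0, loss_4_as_9=2.0))
```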

Classifiers: Nearest neighbor
- (Diagram: training examples from class 1, training examples from class 2, and a test example)
- f(x) = label of the training example nearest to x
- All we need is a distance function for our inputs
- No training required!
Svetlana Lazebnik

1-nearest neighbor
(Scatter plot of "x" and "o" training points in the feature space (x1, x2), with the decision determined by the single nearest point)
James Hays

5-nearest neighbor
(Same scatter plot, with the decision determined by a majority vote over the 5 nearest points)
James Hays
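A minimal k-nearest-neighbor sketch, assuming NumPy and Euclidean distance (the slides do not give code; the toy 2-D data mimics the "x"/"o" scatter plots above):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote over its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)    # distance from x to every training point
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],     # class "x"
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])    # class "o"
y_train = np.array(["x", "x", "x", "o", "o", "o"])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=5))   # -> "o"
```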

Linear classifiers
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Lines in R²
Let w = (a, c) and x = (x, y). Then the line ax + cy + b = 0 can be written as w · x + b = 0.
The distance D from a point (x0, y0) to the line is:
  D = |ax0 + cy0 + b| / sqrt(a² + c²) = |w · x + b| / ||w||
Kristen Grauman
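A small numeric check of this point-to-line distance formula, assuming NumPy; the line and point are made-up values:

```python
import numpy as np

w = np.array([3.0, 4.0])        # (a, c)
b = -5.0                        # line: 3x + 4y - 5 = 0
p = np.array([2.0, 1.0])        # point (x0, y0)

D = abs(np.dot(w, p) + b) / np.linalg.norm(w)
print(D)                        # |3*2 + 4*1 - 5| / 5 = 1.0
```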

Linear classifiers
- Find a linear function to separate positive and negative examples:
  xi positive: xi · w + b ≥ 0
  xi negative: xi · w + b < 0
- Which line is best?
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines
- Discriminative classifier based on an optimal separating line (for the 2D case)
- Maximize the margin between the positive and negative training examples
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines
- Want the line that maximizes the margin:
  xi positive (yi = +1): xi · w + b ≥ +1
  xi negative (yi = −1): xi · w + b ≤ −1
  For support vectors: xi · w + b = ±1
- Distance between a point xi and the line: |xi · w + b| / ||w||
- For support vectors this distance is 1 / ||w||, so the margin is M = 2 / ||w||
(Diagram: the separating line, the support vectors, and the margin)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line
1. Maximize the margin 2 / ||w||
2. Correctly classify all training data points:
   xi positive (yi = +1): xi · w + b ≥ +1
   xi negative (yi = −1): xi · w + b ≤ −1
Quadratic optimization problem:
   Minimize (1/2) wᵀw
   subject to yi(w · xi + b) ≥ 1
One constraint for each training point. Note the sign trick.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
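A sketch of solving this problem with scikit-learn's SVC (an assumption; the slides describe the optimization, not a particular solver). A very large C approximates the hard-margin formulation above; the toy data is made up:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0], [2.5, 1.5]])   # linearly separable toy data
y = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear", C=1e6)     # approx. minimize (1/2)||w||^2 s.t. yi(w·xi + b) >= 1
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin = 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```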

Finding the maximum margin line
- Solution: w = Σi αi yi xi
  (αi: learned weight; xi: support vector)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line
- Solution: w = Σi αi yi xi, and b = yi − w · xi (for any support vector)
- Classification function:
  f(x) = sign(w · x + b) = sign(Σi αi yi (xi · x) + b)
- If f(x) < 0, classify as negative; otherwise classify as positive.
- Notice that it relies on an inner product between the test point x and the support vectors xi
- (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
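A small check, assuming NumPy and scikit-learn, that the dual form sign(Σi αi yi (xi · x) + b) agrees with the primal form sign(w · x + b); the data and test point are made up:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0], [2.5, 1.5]])
y = np.array([-1, -1, +1, +1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

x_test = np.array([1.8, 1.7])
# clf.dual_coef_[0] stores alpha_i * y_i for each support vector
dual_score = np.sum(clf.dual_coef_[0] * (clf.support_vectors_ @ x_test)) + clf.intercept_[0]
primal_score = clf.coef_[0] @ x_test + clf.intercept_[0]
print(dual_score, primal_score)                        # the two scores agree
print("positive" if dual_score >= 0 else "negative")
```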

Nonlinear SVMs
- Datasets that are linearly separable work out great (points along a 1D axis x)
- But what if the dataset is just too hard?
- We can map it to a higher-dimensional space, e.g. x → (x, x²)
Andrew Moore

Nonlinear SVMs
- General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
  Φ: x → φ(x)
Andrew Moore

Nonlinear SVMs
- The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that
  K(xi, xj) = φ(xi) · φ(xj)
Andrew Moore

Examples of kernel functions
- Linear: K(xi, xj) = xiᵀ xj
- Gaussian RBF: K(xi, xj) = exp(−||xi − xj||² / (2σ²))
- Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k))
Andrew Moore
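A minimal sketch of these three kernels, assuming NumPy; sigma is a free bandwidth parameter and the example vectors are made-up normalized histograms:

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def histogram_intersection_kernel(xi, xj):
    return np.sum(np.minimum(xi, xj))

a = np.array([0.2, 0.5, 0.3])
b = np.array([0.1, 0.6, 0.3])
print(linear_kernel(a, b), gaussian_rbf_kernel(a, b), histogram_intersection_kernel(a, b))
```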

Summary: SVMs for image classification
1. Pick an image representation
2. Pick a kernel function for that representation
3. Compute the matrix of kernel values between every pair of training examples
4. Feed the kernel matrix into your favorite SVM solver to obtain support vectors and weights
5. At test time: compute kernel values for your test example and each support vector, and combine them with the learned weights to get the value of the decision function
Svetlana Lazebnik
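A sketch of this recipe using a precomputed histogram-intersection kernel matrix and scikit-learn's SVC(kernel="precomputed"); the library choice and the random "image histograms" and labels are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection_matrix(A, B):
    """Kernel matrix K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.array([[np.sum(np.minimum(a, b)) for b in B] for a in A])

np.random.seed(0)
X_train = np.random.rand(20, 8)                         # step 1: 20 training "image histograms"
y_train = np.random.randint(0, 2, 20)
X_test = np.random.rand(5, 8)

K_train = hist_intersection_matrix(X_train, X_train)    # steps 2-3: kernel matrix over training pairs
clf = SVC(kernel="precomputed").fit(K_train, y_train)   # step 4: feed it to the SVM solver

K_test = hist_intersection_matrix(X_test, X_train)      # step 5: kernel values vs. training examples
print(clf.predict(K_test))
```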

What about multi-class SVMs?
- Unfortunately, there is no "definitive" multi-class SVM formulation
- In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
- One vs. others
  – Training: learn an SVM for each class vs. the others
  – Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value
- One vs. one
  – Training: learn an SVM for each pair of classes
  – Testing: each learned SVM "votes" for a class to assign to the test example
Svetlana Lazebnik
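A minimal sketch of both schemes, assuming scikit-learn's OneVsRestClassifier / OneVsOneClassifier wrappers and the iris dataset (all illustration choices, not part of the slides):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)   # one SVM per class vs. the others
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)    # one SVM per pair of classes, voting

x_test = X[:1]
print("one-vs-others decision values:", ovr.decision_function(x_test))  # assign to the argmax
print("one-vs-others prediction:", ovr.predict(x_test))
print("one-vs-one prediction:   ", ovo.predict(x_test))
```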

Detection

PASCAL Visual Object Challenge
aeroplane, bike, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motorbike, person, plant, sheep, sofa, train, tv
Deva Ramanan

Detection: Scanning-window templates
- Dalal and Triggs CVPR05 (HOG); Papageorgiou and Poggio ICIP99 (wavelets)
- Train a linear template on positive and negative windows; w = weights for orientation and spatial bins
- Classify a window as containing the object if w · x > 0
Deva Ramanan
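A schematic sliding-window detector, assuming NumPy: every window is scored with a linear template w, and windows with w · x + b > 0 are kept. The feature_fn here is a stand-in placeholder; Dalal-Triggs would use HOG features of each window. The window size, stride, and dummy template are illustrative assumptions:

```python
import numpy as np

def sliding_window_detect(image, w, b, win=(64, 128), stride=8, feature_fn=None):
    """Return (row, col, score) for every window whose linear-template score exceeds 0."""
    if feature_fn is None:
        feature_fn = lambda patch: patch.ravel()          # placeholder feature extractor
    H, W = image.shape
    wh, ww = win
    detections = []
    for r in range(0, H - wh + 1, stride):
        for c in range(0, W - ww + 1, stride):
            x = feature_fn(image[r:r + wh, c:c + ww])     # features of this window
            score = float(np.dot(w, x) + b)               # linear template score w·x + b
            if score > 0:
                detections.append((r, c, score))
    return detections

img = np.random.rand(128, 256)
w = np.zeros(64 * 128); b = -1.0                          # dummy template that fires nowhere
print(len(sliding_window_detect(img, w, b)))              # -> 0
```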

Deformable part models
- Model encodes local appearance + spring deformation
Deva Ramanan

Homework for Next Time
- Paper review for Features due at 10pm on 1/14; send to kovashka@cs.pitt.edu
