Introduction To Computer Vision For Robotics

Transcription

Introduction to Computer Vision for Robotics
Jan-Michael Frahm

Overview
- Camera model
- Multi-view geometry
- Camera pose estimation
- Feature tracking & matching
- Robust pose estimation

Homogeneous coordinates

Unified notation: include the origin o in the affine basis. A point M with affine coordinates (a1, a2, a3) in the basis (e1, e2, e3) gets the homogeneous coordinates

  M = [ e1 e2 e3 o ] (a1, a2, a3, 1)^T
      [ 0  0  0  1 ]

where the 4x4 matrix is the affine basis matrix.

Properties of affine transformation

A transformation T_affine combines a linear mapping and a coordinate shift in homogeneous coordinates:
- linear mapping with a 3x3 matrix A
- coordinate shift with a translation 3-vector t

  M' = T_affine M,   T_affine = [ A_3x3 t_3 ] = [ a11 a12 a13 tx ]
                                [ 0 0 0  1  ]   [ a21 a22 a23 ty ]
                                                [ a31 a32 a33 tz ]
                                                [  0   0   0   1 ]

- Parallelism is preserved; ratios of length, area, and volume are preserved.
- Transformations can be concatenated: if M1 = T1 M and M2 = T2 M1, then M2 = T2 T1 M = T21 M.
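The concatenation property can be checked numerically. A minimal sketch (assuming numpy is available; the matrices and points are made-up examples, not values from the slides):

```python
import numpy as np

def affine_transform(A, t):
    """Build the 4x4 homogeneous matrix [A t; 0 0 0 1]."""
    T = np.eye(4)
    T[:3, :3] = A
    T[:3, 3] = t
    return T

T1 = affine_transform(2.0 * np.eye(3), np.array([1.0, 0.0, 0.0]))  # scale + shift
T2 = affine_transform(np.eye(3), np.array([0.0, 5.0, 0.0]))        # pure shift

M = np.array([1.0, 1.0, 1.0, 1.0])   # homogeneous point (X, Y, Z, 1)
M1 = T1 @ M                          # M1 = T1 M
M2 = T2 @ M1                         # M2 = T2 M1, equal to (T2 @ T1) @ M
```

Applying the single matrix `T2 @ T1` to M gives the same M2 in one step, which is exactly the concatenation property stated above.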

Projective geometry

Projective space P2 is the space of rays emerging from O:
- the viewpoint O forms the projection center for all rays
- rays v emerge from the viewpoint into the scene
- a ray g is called a projective point, defined as a scaled v:

  g(λ) = λ v = λ (x', y', w)^T,   λ ∈ ℝ

Projective and homogeneous points

Given a plane Π in R2, embedded in P2 at coordinate w = 1:
- the viewing ray g intersects the plane at v (homogeneous coordinates)
- all points on ray g project onto the same homogeneous point v
- the projection of g onto Π is defined by scaling: v = g/λ = g/w

  v = (1/w) (x', y', w)^T = (x, y, 1)^T   with x = x'/w, y = y'/w
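The scaling step above (divide by w to land on the plane w = 1) can be sketched in a few lines; the coordinates are made-up examples:

```python
# A projective point g = (x', y', w) maps to the plane w = 1 by scaling
# with 1/w; w = 0 has no affine image (it is a direction, see next slide).
def dehomogenize(g):
    x, y, w = g
    if w == 0:
        return None           # infinite point: a direction, not an affine point
    return (x / w, y / w)     # affine coordinates on the plane w = 1
```

Every scaling λg of the same ray gives the same affine point, which is why the whole ray counts as one projective point.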

Finite and infinite points

All rays g that are not parallel to Π intersect Π at an affine point v. The ray g(w = 0) does not intersect Π; hence v is not an affine point but a direction. Directions have the coordinates (x, y, z, 0)^T. Projective space combines affine space with infinite points (directions):

  v_inf = (x, y, 0)^T

Affine and projective transformations

An affine transformation leaves infinite points at infinity:

  (X', Y', Z', 0)^T = T_affine (X, Y, Z, 0)^T

A projective transformation moves infinite points into finite affine space, because its last row (w41, w42, w43, w44) is not restricted to (0, 0, 0, 1):

  λ (X', Y', Z', w')^T = T_projective (X, Y, Z, 0)^T

Example: parallel lines intersect at the horizon (the line of infinite points). We can see this intersection due to perspective projection!

Pinhole Camera (Camera obscura)

[Figures: Interior of camera obscura (Sunday Magazine, 1838); Camera obscura (France, 1830)]

Pinhole camera model

[Figure: lens with aperture in front of an object; the image forms on the image sensor at focal length f behind the lens, with image center c = (cx, cy) and the view direction along the optical axis]

Perspective projection

Perspective projection in P3 models the pinhole camera:
- scene geometry is affine R3 space with coordinates M = (X, Y, Z, 1)^T
- the camera focal point is at O = (0, 0, 0, 1)^T, the camera viewing direction is along Z (the optical axis)
- the image plane (x, y) in Π(P2) is aligned with (X, Y) at Z = Z0
- a scene point M projects onto the point Mp on the plane surface:

  Mp = (xp, yp, Z0, 1)^T = (Z0 X/Z, Z0 Y/Z, Z0, 1)^T

Projective Transformation

A projective transformation maps M onto Mp in P3 space, M → ρ Mp:

  ρ Mp = Tp M,   Tp = [ 1 0  0    0 ]    with ρ = Z/Z0 (projective scale factor)
                      [ 0 1  0    0 ]
                      [ 0 0  1    0 ]
                      [ 0 0 1/Z0  0 ]

The projective transformation linearizes the projection.
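The nonlinear part of the projection is just the division by Z. A minimal sketch with made-up coordinates:

```python
# Sketch of Mp = (Z0 X/Z, Z0 Y/Z): a scene point is scaled onto the
# image plane at distance Z0 along the optical axis.
def project_pinhole(M, Z0=1.0):
    X, Y, Z = M
    return (Z0 * X / Z, Z0 * Y / Z)

# A point twice as far away projects to half the image coordinates:
near = project_pinhole((1.0, 1.0, 2.0))   # (0.5, 0.5)
far = project_pinhole((1.0, 1.0, 4.0))    # (0.25, 0.25)
```

This 1/Z scaling is what makes distant objects appear smaller, and it is exactly the effect that Tp linearizes in homogeneous coordinates.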

Perspective Projection

Dimension reduction from P3 into P2 by projection onto Π(P2): drop the constant third coordinate of Mp = (xp, yp, Z0, 1)^T to obtain mp = (xp, yp, 1)^T. With the image plane at Z0 = 1, the perspective projection P0 from P3 onto P2 is

  ρ mp = Dp Tp M = P0 M,   P0 = [ 1 0 0 0 ] = [ I_3x3 | 0 ],   ρ = Z
                                [ 0 1 0 0 ]
                                [ 0 0 1 0 ]

Image plane and image sensor

A sensor with picture elements (pixels) is added onto the image plane. Pixel coordinates m = (x, y, 1)^T are related to image coordinates by an affine transformation K with five parameters (image-sensor mapping m = K mp):
- the image center c = (cx, cy)^T lies on the optical axis
- the distance Z0 (focal length) and the pixel size determine the pixel scales fx, fy
- the image skew s models the angle between pixel rows and columns

  K = [ fx  s  cx ]
      [ 0  fy  cy ]
      [ 0   0   1 ]
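The image-sensor mapping m = K mp is a single matrix-vector product. A sketch assuming numpy; the calibration values fx, fy, cx, cy, s are made-up examples, not data from the slides:

```python
import numpy as np

# Five-parameter calibration matrix K (example values).
fx, fy, cx, cy, s = 800.0, 800.0, 320.0, 240.0, 0.0
K = np.array([[fx,  s,  cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

mp = np.array([0.1, -0.05, 1.0])   # homogeneous image-plane point
m = K @ mp                         # pixel coordinates: (400, 200, 1)
```

With zero skew and equal pixel scales, K simply scales image coordinates by the focal length in pixels and shifts them to the image center.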

Projection in general pose

The camera pose in world coordinates is

  T_cam = [ R  C ]
          [ 0  1 ]

with rotation R and projection center C. The projection ρ mp = P M uses the inverse pose transformation:

  T_scene = T_cam^-1 = [ R^T  -R^T C ]
                       [  0      1   ]

Projection matrix P

The camera projection matrix P combines:
- the inverse affine transformation T_cam^-1 = T_scene from general pose to the origin
- the perspective projection P0 = [ I | 0 ] to the image plane at Z0 = 1
- the affine mapping K (sensor calibration) from image to sensor coordinates

  ρ m = P M,   P = K P0 T_scene = K [ R^T | -R^T C ]
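The factored form P = K [R^T | -R^T C] can be assembled and applied directly. A sketch with made-up pose values (identity calibration and rotation, camera center 5 units behind the world origin):

```python
import numpy as np

def projection_matrix(K, R, C):
    """P = K [R^T | -R^T C] for camera rotation R and center C in world frame."""
    Rt = R.T
    return K @ np.hstack([Rt, (-Rt @ C).reshape(3, 1)])

K = np.eye(3)                        # example calibration
R = np.eye(3)                        # example rotation
C = np.array([0.0, 0.0, -5.0])       # example camera center
P = projection_matrix(K, R, C)

M = np.array([1.0, 2.0, 5.0, 1.0])   # world point, 10 units in front of camera
m = P @ M
m = m / m[2]                         # dehomogenize: (0.1, 0.2, 1)
```

The point lands at (X/10, Y/10) because its depth relative to the camera center is 10, matching the 1/Z behavior of the pinhole model.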

2-view geometry: F-Matrix

Projection onto two views (camera 0 at the origin, camera 1 at center C1):

  P0 = K0 R0^-1 [ I | 0 ],   P1 = K1 R1^-1 [ I | -C1 ]

  ρ0 m0 = P0 M = K0 R0^-1 M̄        ⇒  M̄ = ρ0 R0 K0^-1 m0
  ρ1 m1 = P1 M = K1 R1^-1 (M̄ - C1)

Substituting M̄ gives

  ρ1 m1 = ρ0 K1 R1^-1 R0 K0^-1 m0 - K1 R1^-1 C1 = ρ0 H m0 + e1

with the homography H = K1 R1^-1 R0 K0^-1 and the epipole e1 = -K1 R1^-1 C1, the image of camera 0's center in view 1. All points on the ray through m0 project onto one line in view 1, the epipolar line.

The Fundamental Matrix F

The projective points e1 and H m0 define a plane in camera 1 (the epipolar plane Πe). The epipolar plane intersects image plane 1 in a line (the epipolar line ue), and the corresponding point m1 lies on that line: m1^T ue = 0. Since m1, e1, and H m0 are collinear, the collinearity theorem applies:

  m1^T ([e1]x H m0) = 0,   [e]x = [  0  -ez  ey ]
                                  [  ez  0  -ex ]
                                  [ -ey  ex  0  ]

This yields the fundamental matrix F and the epipolar constraint:

  F = [e1]x H   (3x3),   m1^T F m0 = 0

Estimation of F from correspondences

- Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates.
- Since the epipolar constraint is homogeneous up to scale, only eight elements are independent.
- Since the operator [e]x, and hence F, has rank 2, F has only 7 independent parameters (all epipolar lines intersect at e).
- Each correspondence m1,i^T F m0,i = 0 gives one collinearity constraint, so F can be solved with a minimum of 7 correspondences.
- For N > 7 correspondences, minimize the point-line distance subject to the rank-2 constraint:

  Σ_{n=0}^{N} (m1,n^T F m0,n)^2 → min!,   det(F) = 0

The Essential Matrix E

F is the most general constraint on an image pair. If the camera calibration matrix K is known, then more constraints are available. Writing m = K m̄ for the calibrated (normalized) image points:

  m1^T F m0 = (K m̄1)^T F (K m̄0) = m̄1^T (K^T F K) m̄0 = m̄1^T E m̄0 = 0

  E = [e]x R,   [e]x = [  0  -ez  ey ]
                       [  ez  0  -ex ]
                       [ -ey  ex  0  ]

E holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix R and 2 from the direction of translation e, the epipole.
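The epipolar constraint is easy to verify numerically once E = [e]x R is built. A sketch assuming numpy, with a made-up calibrated pair (identity rotation, camera 1 translated one unit along x) and a hand-computed correspondence:

```python
import numpy as np

def skew(e):
    """Cross-product matrix [e]x so that skew(e) @ v == np.cross(e, v)."""
    return np.array([[0.0,  -e[2],  e[1]],
                     [e[2],  0.0,  -e[0]],
                     [-e[1], e[0],  0.0]])

R = np.eye(3)                       # example relative rotation
e = np.array([1.0, 0.0, 0.0])       # example translation direction (epipole)
E = skew(e) @ R                     # essential matrix E = [e]x R

# World point (0, 0, 5) seen from camera 0 (origin) and camera 1 (at (1,0,0)):
m0 = np.array([0.0, 0.0, 1.0])      # normalized image point in view 0
m1 = np.array([-0.2, 0.0, 1.0])     # normalized image point in view 1
residual = m1 @ E @ m0              # epipolar constraint: ~0 for a true match
```

A point that does not correspond (e.g. m1 shifted off the epipolar line) gives a clearly nonzero residual, which is what RANSAC later exploits to separate inliers from outliers.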

Estimation of P from E

From E we can obtain a camera projection matrix pair via the singular value decomposition E = U diag(1, 1, 0) V^T. Set P0 = [ I_3x3 | 0_3x1 ]; there are four choices for P1:

  P1 = [ U W V^T | u3 ]    or   P1 = [ U W V^T | -u3 ]    or
  P1 = [ U W^T V^T | u3 ]  or   P1 = [ U W^T V^T | -u3 ]

  with W = [ 0 -1  0 ]
           [ 1  0  0 ]
           [ 0  0  1 ]

Of the four possible configurations, only one places the 3D point in front of both cameras.

3D Feature Reconstruction

- A corresponding point pair (m0, m1) is the projection of a 3D feature point M.
- M is reconstructed from (m0, m1) by triangulation.
- With noisy measurements the two viewing rays l0, l1 do not intersect exactly; M is taken where the connecting segment d between the rays is shortest, i.e. d is perpendicular to both rays:

  l0^T d = 0,   l1^T d = 0,   |d| → min!

- Alternatively, minimize the reprojection error:

  (m0 - P0 M)^2 + (m1 - P1 M)^2 → min.
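Triangulation is commonly solved with the linear (DLT) method: each view contributes two rows of a homogeneous system A M = 0, and the null vector of A is the 3D point. A sketch assuming numpy, with made-up cameras and a hand-checked point:

```python
import numpy as np

def triangulate(P0, P1, m0, m1):
    """Linear (DLT) triangulation from two views; m0, m1 are (x, y) points."""
    A = np.array([m0[0] * P0[2] - P0[0],     # x0 * p0_row3 - p0_row1
                  m0[1] * P0[2] - P0[1],     # y0 * p0_row3 - p0_row2
                  m1[0] * P1[2] - P1[0],
                  m1[1] * P1[2] - P1[1]])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]                               # null vector of A
    return M / M[3]                          # homogeneous 3D point

P0 = np.hstack([np.eye(3), np.zeros((3, 1))])                   # [I | 0]
P1 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # [I | -C1]
M = triangulate(P0, P1, (0.0, 0.0), (-0.2, 0.0))                # → (0, 0, 5, 1)
```

With exact (noise-free) correspondences the DLT solution is exact; with noise it minimizes an algebraic rather than the geometric reprojection error, which is why the slides add a nonlinear refinement step.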

Multi View Tracking

- 2D match: image correspondence (m1, mi)
- 3D match: correspondence transfer (mi, M) via P1
- 3D pose estimation of Pi with ||mi - Pi M|| → min.

Minimize the global reprojection error over all N views and K points:

  Σ_{i=0}^{N} Σ_{k=0}^{K} || m_{k,i} - Pi Mk ||^2 → min!

Correspondences: matching vs. tracking

Image-to-image correspondences are essential to 3D reconstruction.
- SIFT matcher: extract features independently and then match by comparing descriptors [Lowe 2004].
- KLT tracker: extract features in the first image and find the same features back in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994]; small differences between frames, but a potentially large difference overall.
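The global reprojection error above is just a double sum of squared pixel distances. A minimal sketch assuming numpy (a single made-up camera and point, so the names and values are illustrative only):

```python
import numpy as np

def reprojection_error(cameras, points_3d, observations):
    """Sum over views i and points k of ||m_{k,i} - project(P_i, M_k)||^2."""
    err = 0.0
    for P, obs in zip(cameras, observations):
        for M, m in zip(points_3d, obs):
            p = P @ M                                  # homogeneous projection
            err += np.sum((np.asarray(m) - p[:2] / p[2]) ** 2)
    return err

P = np.hstack([np.eye(3), np.zeros((3, 1))])   # example camera [I | 0]
points = [np.array([0.0, 0.0, 5.0, 1.0])]      # example 3D point
obs = [[(0.0, 0.0)]]                           # its exact observation
```

Bundle-adjustment-style refinement minimizes exactly this quantity over all camera poses and 3D points simultaneously.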

SIFT-detector

A scale- and image-plane-rotation-invariant feature descriptor [Lowe 2004]. Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.

Difference of Gaussian for Scale Invariance

[Figure: Gaussian pyramid and the Difference-of-Gaussian images computed between adjacent scales]

Difference-of-Gaussian with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].
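The Difference-of-Gaussian response can be illustrated in one dimension: subtract two blurs of the same signal whose scales differ by a constant ratio. A sketch assuming numpy (the kernel radius of 3σ and the ratio k = 1.6 are conventional choices, not values from the slides):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at radius 3*sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def dog(signal, sigma, k=1.6):
    """Difference of two Gaussian blurs with a constant scale ratio k."""
    g1 = np.convolve(signal, gaussian_kernel(sigma), mode="same")
    g2 = np.convolve(signal, gaussian_kernel(k * sigma), mode="same")
    return g2 - g1

signal = np.concatenate([np.zeros(40), np.ones(40)])  # step edge at index 40
response = dog(signal, sigma=2.0)
```

Like the scale-normalized Laplacian it approximates, the DoG response is near zero in flat regions and large in magnitude near intensity changes; SIFT finds keypoints at its extrema across both position and scale.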

Key point localization

- Detect maxima and minima of the difference-of-Gaussian in scale space.
- Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002): take a Taylor expansion around the point and solve for the offset of the extremum, using finite differences for the derivatives.

Orientation normalization

- Compute a histogram of local gradient directions (0 to 2π) at the selected scale.
- Assign the principal orientation at the peak of the smoothed histogram.
- Each key specifies stable 2D coordinates (x, y, scale, orientation).

Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing the ratio of principal curvatures
(courtesy Lowe)

SIFT vector formation

- Thresholded image gradients are sampled over a 16x16 array of locations in scale space.
- Create an array of orientation histograms: 8 orientations x 4x4 histogram array = 128 dimensions (the example figure shows a 2x2 histogram array; courtesy Lowe).

SIFT feature detector

[Figure: detected SIFT features]

Robust data selection: RANSAC

Estimation of a plane from point data {potential plane points}:
1. Select m samples.
2. Compute the n-parameter solution.
3. Evaluate it on {potential plane points}, giving {inlier samples}.
4. Best solution so far? If yes, keep it.
5. Stop once 1 - (1 - (#inlier / |{potential plane points}|)^n)^steps > 0.99; otherwise repeat from step 1.
Result: best solution, {inlier}, {outlier}.

RANSAC: Evaluate Hypotheses

Each plane hypothesis is scored on all potential points with residuals ε0, ε1, …, εn using a truncated quadratic cost:

  c(ε) = ε^2 / λ^2   if ε^2 < λ^2
  c(ε) = 1           else

so every outlier beyond the threshold λ pays the same constant penalty. The hypothesis with the lowest total cost over {potential points} wins.

Robust Pose Estimation (Calibrated Camera)

Bootstrap from {2D-2D correspondences}:
- E-RANSAC (5-point algorithm) → estimate P0, P1
- nonlinear refinement with all inliers → P0, P1
- triangulate points → known 3D points

Tracking with {2D-3D correspondences}:
- RANSAC (3-point algorithm) → P
- nonlinear refinement with all inliers → P
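The sample/evaluate/keep-best loop can be shown on a toy problem: fitting a line y = a x + b to points contaminated with outliers (a 2-point minimal sample instead of the 5-point or 3-point pose solvers of the pipeline). All data and thresholds here are made-up examples:

```python
import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Toy RANSAC: sample 2 points, fit y = a*x + b, keep the best inlier set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x0, y0), (x1, y1) = rng.sample(points, 2)        # minimal sample
        if x0 == x1:
            continue                                      # degenerate sample
        a = (y1 - y0) / (x1 - x0)
        b = y0 - a * x0
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) < tol]
        if len(inliers) > len(best_inliers):              # best so far? keep it
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# 10 points on y = 2x + 1 plus 3 gross outliers
pts = [(float(x), 2.0 * x + 1.0) for x in range(10)]
pts += [(1.0, 9.0), (4.0, -3.0), (7.0, 30.0)]
model, inliers = ransac_line(pts)
```

In the actual pipeline the minimal solver is the 5-point algorithm for E (or the 3-point algorithm for pose), and the final model is refined nonlinearly on all inliers, but the sample/score/keep-best structure is the same.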

References

[Lowe 2004] David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV, 60(2), 2004, pp. 91-110.
[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", Proc. International Joint Conference on Artificial Intelligence, 1981.
[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[Baker & Matthews 2004] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, 56(3):221-255, March 2004.
[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, 24(6), June 1981.
[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", CVPR 2003.
[Lindeberg 1998] T. Lindeberg, "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, 30(2), 1998.
[Hartley 2003] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", 2nd edition, Cambridge University Press, 2003.