Visual SLAM - An Overview - Luigi Freda

Transcription

Visual SLAM: An Overview
L. Freda, ALCOR Lab, DIAG, University of Rome "La Sapienza"
May 3, 2016

Outline

1. Introduction
   - What is SLAM
   - Motivations
2. Visual Odometry (VO)
   - Problem Formulation
   - VO Assumptions
   - VO Advantages
   - VO Pipeline
   - VO Drift
   - VO or SFM
3. Visual SLAM
   - VO vs Visual SLAM

SLAM: Simultaneous Localization And Mapping

Mapping – "What does the world look like?" Integration of the information gathered with sensors into a given representation.

Localization – "Where am I?" Estimation of the robot pose relative to a map. Typical problems: (i) pose tracking, where the initial pose of the vehicle is known; (ii) global localization, where no a priori knowledge about the starting position is given.

Simultaneous localization and mapping (SLAM): build a map while at the same time localizing the robot within that map. This is a chicken-and-egg problem: a good map is needed for localization, while an accurate pose estimate is needed to build a map.

Visual SLAM: SLAM using visual sensors such as monocular cameras, stereo rigs, RGB-D cameras, DVS, etc.

Why use a camera?

- Vast information
- Extremely low Size, Weight, and Power (SWaP) footprint
- Cheap and easy to use
- Passive sensor

Challenge: we need power efficiency for truly capable always-on tiny devices, or to do much more with larger devices.

Question: how does the human brain achieve always-on, dense, semantic vision with very limited power?

Key Applications of Visual SLAM

- Low-cost robotics (e.g. a mobile robot with a cheap camera)
- Agile robotics (e.g. drones)
- Smartphones
- Wearables
- AR/VR: inside-out tracking, gaming

Why work on Visual SLAM?

The robotics and computer vision market is growing exponentially: many robotic products, augmented reality and mixed reality apps/games, etc.

- Google (Project Tango, Google driverless car)
- Apple (acquisition of Metaio and PrimeSense, driverless car)
- Dyson (funded the Dyson Robotics Lab, a research lab at Imperial College London)
- Microsoft (HoloLens and its app marketplace)
- Magic Leap (funded by Google with $542M)

How many apps related to machine learning and pattern recognition?

Why work on Visual SLAM?

From the WIRED magazine article "The Untold Story of Magic Leap, the World's Most Secretive Startup":

"But to really understand what's happening at Magic Leap, you need to also understand the tidal wave surging through the entire tech industry. All the major players — Facebook, Google, Apple, Amazon, Microsoft, Sony, Samsung — have whole groups dedicated to artificial reality, and they're hiring more engineers daily. Facebook alone has over 400 people working on VR. Then there are some 230 other companies, such as Meta, the Void, Atheer, Lytro, and 8i, working furiously on hardware and content for this new platform."

This technology will allow users to share and live active experiences over the Internet.

Videos

What research can do:
- PTAM (with advanced AR)
- DTAM
- ElasticFusion

What industry is actually doing:
- HoloLens
- Dyson 360 Eye
- Project Tango
- Magic Leap

Visual SLAM: Modern Systems

- Positioning and reconstruction are now rather mature, though many researchers believe it is still premature to call even that solved
- Quality open-source systems: LSD-SLAM, ORB-SLAM, SVO, KinectFusion, ElasticFusion
- Commercial products and prototypes: Google Tango, HoloLens, Dyson 360 Eye, Roomba 980
- But SLAM continues, and evolves into generic real-time 3-D perception research

Benefits of Working on Visual SLAM

The skills learned by dealing with Visual SLAM are highly valued in industry:
- Gain valuable skills in real-time C programming (code optimization, multi-threading, SIMD, management of complex data structures)
- Work on a technology which is going to change the world
- Enrich your CV with a collaboration with the ALCOR Lab
- Have fun with computer graphics and mixed reality

VO Problem Formulation

An agent moves through the environment, taking images with a rigidly attached camera system at discrete times k.

For a monocular system, the set of images taken at times k is denoted by
  I_{0:n} = {I_0, ..., I_n}

For a stereo system, the set of images taken at times k is denoted by
  I_{l,0:n} = {I_{l,0}, ..., I_{l,n}}
  I_{r,0:n} = {I_{r,0}, ..., I_{r,n}}
In this case, without loss of generality, the coordinate system of the left camera can be used as the origin.

VO Problem Formulation

For an RGB-D camera, the set of images taken at times k is denoted by
  I_{0:n} = {I_0, ..., I_n}
  D_{0:n} = {D_0, ..., D_n}

Two camera positions at adjacent time instants k-1 and k are related by the rigid-body transformation
  T_k = [ R_{k-1,k}  t_{k-1,k} ]
        [     0          1     ]

The set T_{1:n} = {T_1, ..., T_n} contains all the subsequent motions.

VO Problem Formulation

The set of camera poses C_{0:n} = {C_0, ..., C_n} contains the transformations of the camera w.r.t. the initial coordinate frame at k = 0.

The current camera pose C_n can be computed by concatenating all the transformations T_{1:n}, therefore
  C_n = C_{n-1} T_n
with C_0 being the camera pose at instant k = 0, which can be set arbitrarily by the user.

VO Problem Formulation

The main task of VO is to compute the relative transformations T_k from the images I_k and I_{k-1}, and then to concatenate these transformations to recover the full trajectory C_{0:n} of the camera. This means that VO recovers the path incrementally, pose after pose.
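The incremental concatenation C_k = C_{k-1} T_k can be sketched with 4x4 homogeneous transforms. A minimal NumPy sketch, where the relative motions T_k are assumed to come from the frame-to-frame estimator (the example motion is illustrative):

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 rigid-body transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def accumulate_poses(C0, relative_motions):
    """Concatenate frame-to-frame motions T_1..T_n into absolute poses C_0..C_n."""
    poses = [C0]
    for T in relative_motions:
        poses.append(poses[-1] @ T)  # C_k = C_{k-1} T_k
    return poses

# Illustrative example: four identical steps, each 1 m forward plus a 90-degree yaw,
# which traces a closed square and returns the camera to its starting pose.
Rz90 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
T = make_T(Rz90, np.array([1., 0., 0.]))
poses = accumulate_poses(np.eye(4), [T, T, T, T])
print(poses[-1][:3, 3])  # final position: back at the origin
```

Note that each pose is expressed in the frame of C_0, which is why the relative motion is right-multiplied.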

VO Assumptions

Usual assumptions about the environment:
- Sufficient illumination in the environment
- Dominance of static scene over moving objects
- Enough texture to allow apparent motion to be extracted
- Sufficient scene overlap between consecutive frames

Are these examples OK? (example scenes shown on the slide)

VO Advantages

- Unlike wheel odometry, VO is not affected by wheel slip on uneven terrain or in other adverse conditions
- More accurate trajectory estimates compared to wheel odometry (relative position error of 0.1%–2%)
- VO can be used as a complement to wheel odometry, GPS, inertial measurement units (IMUs), and laser odometry
- In GPS-denied environments, such as underwater and aerial ones, VO is of utmost importance

Visual Odometry Pipeline

Feature-based visual odometry (VO). Overview:
1. Feature detection
2. Feature matching/tracking
3. Motion estimation
4. Local optimization

Visual Odometry Pipeline

Assumption: the camera is well calibrated.

1. Feature detection: detect a set of features f_k at time k (general idea: extract high-contrast areas in the image).

Visual Odometry Pipeline

2. Feature matching/tracking: find correspondences between the sets of features f_{k-1} and f_k.
- Tracking: locally search for each feature (e.g. by prediction and correlation)
- Matching: independently detect features in each image and find correspondences on the basis of a similarity metric (exploiting descriptors such as SURF, SIFT, ORB, etc.)

Visual Odometry Pipeline

3. Motion estimation: compute the transformation T_k between the two images I_{k-1} and I_k from the two sets of corresponding features f_{k-1}, f_k. Different algorithms are used depending on the available sensor data:
- 2-D to 2-D: works on f_{k-1}, f_k specified in 2-D image coordinates
- 3-D to 3-D: works on X_{k-1}, X_k, the sets of 3-D points corresponding to f_{k-1}, f_k
- 3-D to 2-D: works on X_{k-1}, the set of 3-D points corresponding to f_{k-1}, and on f_k, their corresponding 2-D reprojections on the image I_k

Visual Odometry Pipeline

4. Local optimization: an iterative refinement over the last m poses can optionally be performed after motion estimation to obtain a more accurate estimate of the local trajectory. One minimizes the image reprojection error

  X^i, C_k = arg min_{X^i, C_k}  Σ_{i,k} || p_k^i − g(X^i, C_k) ||²

where p_k^i is the i-th image point of the 3-D landmark X^i measured in the k-th image, and g(X^i, C_k) is its image reprojection according to the current camera pose C_k.
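The reprojection error above can be made concrete with a simple pinhole model g(X, C) = K (R X + t). A minimal NumPy sketch; the intrinsics K and the landmark are illustrative assumptions, not values from the slides:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole reprojection g(X, C): 3-D world point X -> 2-D pixel coordinates."""
    Xc = R @ X + t          # world frame -> camera frame
    x = K @ Xc
    return x[:2] / x[2]     # perspective division

def reprojection_error(K, R, t, points3d, points2d):
    """Sum of squared distances ||p_i - g(X_i, C)||^2 over all landmarks."""
    return sum(np.sum((p - project(K, R, t, X)) ** 2)
               for X, p in zip(points3d, points2d))

# Illustrative intrinsics (focal length 500 px, principal point at 320, 240)
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
X = np.array([0.2, -0.1, 4.0])          # a landmark 4 m in front of the camera
p = project(K, np.eye(3), np.zeros(3), X)
print(p)
print(reprojection_error(K, np.eye(3), np.zeros(3), [X], [p]))  # 0 at ground truth
```

Bundle adjustment minimizes this quantity jointly over landmarks and poses, typically with Gauss-Newton or Levenberg-Marquardt rather than by direct search.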

Visual Odometry Pipeline

VO from 2-D to 2-D (feature-based):
1. Capture new frame I_k
2. Extract and match features between I_{k-1} and I_k
3. Compute the essential matrix for the image pair I_{k-1}, I_k
4. Decompose the essential matrix into R_k and t_k, and form T_k
5. Compute the relative scale and rescale t_k accordingly
6. Concatenate the transformation by computing C_k = C_{k-1} T_k
7. Repeat from 1.

NOTE: the minimal-case solution involves 5 point correspondences.
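Steps 3-4 rest on the essential matrix E = [t]_x R, which encodes the relative pose up to the scale of t and satisfies the epipolar constraint x2' E x1 = 0 for corresponding normalized image points. A minimal sketch that builds E from a known relative pose and verifies the constraint on a synthetic correspondence (all numbers illustrative):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

def essential(R, t):
    """Essential matrix E = [t]_x R for the relative pose (R, t)."""
    return skew(t) @ R

# Synthetic relative pose: a small yaw plus a mostly sideways translation
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), -np.sin(a), 0.],
              [np.sin(a), np.cos(a), 0.],
              [0., 0., 1.]])
t = np.array([0.5, 0., 0.1])
E = essential(R, t)

# One 3-D point observed from both cameras, in normalized image coordinates
X = np.array([1.0, 0.5, 5.0])
x1 = X / X[2]                # first camera at the origin
Xc2 = R @ X + t              # same point in the second camera frame
x2 = Xc2 / Xc2[2]
print(x2 @ E @ x1)           # epipolar constraint, ~0
```

In practice E is estimated the other way around, from at least 5 point correspondences (Nistér's five-point algorithm inside RANSAC), and then decomposed back into R and t.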

Visual Odometry Pipeline

VO from 3-D to 3-D (feature-based):
1. Capture two stereo image pairs I_{l,k-1}, I_{r,k-1} and I_{l,k}, I_{r,k}
2. Extract and match features between I_{l,k-1} and I_{l,k}
3. Triangulate the matched features for each stereo pair:
   I_{l,k-1}, I_{r,k-1} → X_{k-1}
   I_{l,k}, I_{r,k} → X_k
4. Compute T_k from the 3-D feature sets X_{k-1} and X_k
5. Concatenate the transformation by computing C_k = C_{k-1} T_k
6. Repeat from 1.

NOTE: the minimal-case solution involves 3 non-collinear correspondences.
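Step 4, aligning two 3-D point sets, has a classical closed-form solution (Arun/Kabsch): subtract the centroids, take an SVD of the cross-correlation matrix, and read off R and t. A minimal sketch on synthetic, noise-free data (a real system would wrap this in RANSAC to reject outlier matches):

```python
import numpy as np

def align_3d_3d(X_prev, X_curr):
    """Closed-form rigid alignment: R, t minimizing sum ||X_curr - (R X_prev + t)||^2.
    Both inputs are (N, 3) arrays of corresponding 3-D points."""
    mu_p, mu_c = X_prev.mean(axis=0), X_curr.mean(axis=0)
    H = (X_prev - mu_p).T @ (X_curr - mu_c)           # 3x3 cross-correlation
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1., 1., np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_c - R @ mu_p
    return R, t

# Synthetic test: random points moved by a known rotation and translation
rng = np.random.default_rng(0)
X_prev = rng.normal(size=(20, 3))
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a), np.cos(a), 0.],
                   [0., 0., 1.]])
t_true = np.array([0.3, -0.2, 1.0])
X_curr = X_prev @ R_true.T + t_true

R, t = align_3d_3d(X_prev, X_curr)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

The determinant correction D is what distinguishes a proper rotation from a reflection when the point configuration is degenerate or noisy.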

Visual Odometry Pipeline

VO from 3-D to 2-D (feature-based):
1. Do only once:
   1.1 Capture two frames I_{k-2}, I_{k-1}
   1.2 Extract and match features between them
   1.3 Triangulate features from I_{k-2}, I_{k-1} and get X_{k-1}
2. Do at each iteration:
   2.1 Capture new frame I_k
   2.2 Extract features and match with the previous frame I_{k-1}
   2.3 Compute the camera pose (PnP) from the 3-D-to-2-D matches (between f_k and X_{k-1})
   2.4 Triangulate all new features between I_{k-1} and I_k and get X_k
   2.5 Iterate from 2.1
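For intuition about step 2.3, the linear DLT variant of the 3-D-to-2-D problem stacks two homogeneous equations per correspondence and recovers the 3x4 projection matrix as the null vector of the system. This is only a sketch of the linear-algebra core; a real pipeline would use a calibrated minimal solver (P3P or EPnP) inside RANSAC, and the camera below is an illustrative assumption:

```python
import numpy as np

def dlt_projection_matrix(X3d, x2d):
    """Estimate a 3x4 projection matrix P with x ~ P [X; 1] from >= 6 correspondences."""
    rows = []
    for Xw, (u, v) in zip(X3d, x2d):
        Xh = np.append(Xw, 1.0)
        # From the cross product x cross (P Xh) = 0, two independent equations:
        rows.append(np.concatenate([np.zeros(4), -Xh, v * Xh]))
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)       # null vector = flattened P, up to scale

# Synthetic camera (illustrative): identity rotation, small translation
P_true = np.hstack([np.eye(3), np.array([[0.1], [0.2], [2.0]])])
rng = np.random.default_rng(1)
X3d = rng.uniform(-1, 1, size=(8, 3)) + np.array([0., 0., 5.])
proj = np.hstack([X3d, np.ones((8, 1))]) @ P_true.T
x2d = proj[:, :2] / proj[:, 2:]

P = dlt_projection_matrix(X3d, x2d)
P = P / P[2, 3] * P_true[2, 3]        # fix the overall scale (and sign) for comparison
print(np.allclose(P, P_true, atol=1e-6))
```

Given calibrated intrinsics K, the pose follows by splitting K^{-1} P into a rotation and a translation.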

Visual Odometry Drift

VO drift:
1. The errors introduced by each new frame-to-frame motion estimate accumulate over time
2. This generates a drift of the estimated trajectory away from the real one

NOTE: the uncertainty of the camera pose at C_k is a combination of the uncertainty at C_{k-1} (black solid ellipse) and the uncertainty of the transformation T_{k-1,k} (gray dashed ellipse).
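The accumulation effect can be seen in a toy 2-D simulation: a small constant bias in each frame-to-frame rotation makes the end-point error grow with path length, even though each individual step is almost exact (all numbers illustrative):

```python
import numpy as np

def integrate(n_steps, yaw_bias_deg):
    """Dead-reckon n unit forward steps, each corrupted by a small yaw bias."""
    b = np.deg2rad(yaw_bias_deg)
    Rb = np.array([[np.cos(b), -np.sin(b)],
                   [np.sin(b), np.cos(b)]])
    R, p = np.eye(2), np.zeros(2)
    for _ in range(n_steps):
        R = R @ Rb                        # biased frame-to-frame rotation
        p = p + R @ np.array([1., 0.])    # one unit step forward
    return p

# The true path is a straight line along x; the biased estimate curves away from it,
# and the end-point error grows superlinearly with the number of steps.
for n in (10, 50, 100):
    err = np.linalg.norm(integrate(n, 0.2) - np.array([float(n), 0.0]))
    print(n, round(err, 2))
```

This is why VO alone is only locally accurate, and why loop closure or global optimization is needed to bound the error.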

VO or SFM (1/2)

1. SFM is more general than VO: it tackles the problem of 3-D reconstruction of both the structure and the camera poses from unordered image sets
2. The final structure and camera poses are typically refined with an offline optimization (i.e., bundle adjustment), whose computation time grows with the number of images

(Video: SFM)

VO or SFM (2/2)

- VO is a particular case of SFM
- VO focuses on estimating the 3-D motion of the camera sequentially (as each new frame arrives) and in real time
- Bundle adjustment can be used (but is optional) to refine the local estimate of the trajectory
