Tracking Without Bells And Whistles - CVF Open Access


Tracking without bells and whistles

Philipp Bergmann* Tim Meinhardt* Laura Leal-Taixé
Technical University of Munich

Abstract

The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks; in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.

We then perform an analysis of the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields better tracking performance than any current tracking method, and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.

1. Introduction

Scene understanding from video remains one of the big challenges of computer vision. Humans are often the center of attention in a scene, which leads to the fundamental problem of detecting and tracking them in a video. Tracking-by-detection has emerged as the preferred paradigm to solve the problem of tracking multiple objects, as it simplifies the task by breaking it into two steps: (i) detecting object locations independently in each frame, and (ii) forming tracks by linking corresponding detections across time. The linking step, or data association, is a challenging task on its own, due to missing and spurious detections, occlusions, and target interactions in crowded environments. To address these issues, research in this area has produced increasingly complex models achieving only marginally better results, e.g., multiple object tracking accuracy has only improved 2.4% in the last two years on the MOT16 [45] benchmark.

In this paper, we push tracking-by-detection to the limit by using only an object detection method to perform tracking. We show that one can achieve state-of-the-art tracking results by training a neural network only on the task of detection. As indicated by the blue arrows in Figure 1, the regressor of an object detector such as Faster-RCNN [52] is sufficient to construct object trajectories in a multitude of challenging tracking scenarios. This raises an interesting question that we discuss in this paper: If a detector can solve most of the tracking problems, what are the real situations where a dedicated tracking algorithm is necessary? We hope our work and the presented Tracktor allow researchers to focus on the still unsolved critical challenges of multi-object tracking.

This paper presents four main contributions:
• We introduce the Tracktor, which tackles multi-object tracking by exploiting the regression head of a detector to perform temporal realignment of object bounding boxes.
• We present two simple extensions to Tracktor, a re-identification Siamese network and a motion model. The resulting tracker yields state-of-the-art performance in three challenging multi-object tracking benchmarks.
• We conduct a detailed analysis on failure cases and challenging tracking scenarios, and show that none of the dedicated tracking methods performs substantially better than our regression approach.
• We propose our method as a new tracking paradigm which exploits the detector and allows researchers to focus on the remaining complex tracking challenges. This includes an extensive study on promising future research directions.

*Contributed equally. Correspondence to: tim.meinhardt@tum.de
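The central idea, re-using a detector's bounding box regression to carry each box from frame t-1 to frame t, can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: `head` is a hypothetical stand-in for the classification and regression head of a two-stage detector such as Faster-RCNN, the (dx, dy, dw, dh) offset decoding follows the common Faster R-CNN parameterization, and the thresholds `score_thresh` and `new_iou_thresh`, together with the rules for deactivating tracks and starting new ones from detections, are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2) in pixels
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)
# Hypothetical stand-in for a detector's classification + regression head:
# given a frame and one region (a track's previous box), it returns
# regression offsets for that region and an object score in [0, 1].
HeadFn = Callable[[object, Box], Tuple[Deltas, float]]


def apply_deltas(box: Box, d: Deltas) -> Box:
    """Decode Faster R-CNN-style offsets relative to `box` (center/size form)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    cx, cy = cx + d[0] * w, cy + d[1] * h          # shift center by a fraction of the size
    w, h = w * math.exp(d[2]), h * math.exp(d[3])  # rescale width/height in log space
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    active: bool = True


def tracktor_step(tracks: List[Track], frame: object, detections: Sequence[Box],
                  head: HeadFn, score_thresh: float = 0.5,
                  new_iou_thresh: float = 0.3) -> List[Track]:
    """Process frame t with a regression-only tracker (sketch; mutates `tracks`)."""
    # (i) Regression: use each active track's box from frame t-1 as the region
    # proposal on frame t and let the detector head realign it to the object.
    for track in tracks:
        if not track.active:
            continue
        deltas, score = head(frame, track.box)
        if score < score_thresh:
            track.active = False  # assumed rule: stop tracks the head no longer scores highly
        else:
            track.box = apply_deltas(track.box, deltas)
    # (ii) Detection: start new tracks from detections not covered by an active track.
    active_boxes = [t.box for t in tracks if t.active]
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for det in detections:
        if all(iou(det, b) < new_iou_thresh for b in active_boxes):
            tracks.append(Track(next_id, det))
            active_boxes.append(det)  # avoid spawning two tracks for overlapping detections
            next_id += 1
    return tracks
```

With a toy `head` that always shifts boxes right by 10% of their width, a track at (0, 0, 10, 20) moves to (1, 0, 11, 20), and a detection far from every active track starts a new track. Identity is preserved only because each track's previous box serves as its own proposal; the only learned component is the detector head, so nothing is trained on tracking data. The re-identification and camera motion compensation extensions mentioned above are deliberately left out.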

[Figure 1: image not recovered; surviving labels: b_k^{t-1}, t, "Regression"]

1.1. Related work

Several computer vision tasks such as surveillance, activity recognition or autonomous driving rely on object trajectories as input. Despite the vast literature on multi-object tracking [42, 38], it still remains a challenging problem, especially in crowded environments where occlusions and false detections are common. Most state-of-the-art works follow the tracking-by-detection paradigm, which heavily relies on the performance of the underlying detection method.

Recently, neural network based detectors have clearly outperformed all other methods for detection [33, 52, 50]. The family of detectors that evolved to Faster-RCNN [52], and further detectors such as SDP [63], rely on object proposals which are passed to an object classification and a bounding box regression head of a neural network.
The latter refines bounding boxes to fit tightly around the object. In this paper, we show that one can rethink the use of this regressor for tracking purposes.

Tracking as a graph problem. The data association problem deals with keeping the identity of the tracked objects given the available detections. This can be done on a frame-by-frame basis for online applications [5, 15, 48] or track-by-track [3]. When video analysis can be done offline, batch methods are preferred, as they are more robust to occlusions. A common formalism is to represent the problem as a graph, where each detection is a node and edges indicate a possible link. The data association
