Accurate and Reliable Detection of Traffic Lights Using Multiclass Learning and Multiobject Tracking


Zhilu Chen and Xinming Huang, Senior Member, IEEE

The authors are with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, USA. E-mail: xhuang@wpi.edu

Digital Object Identifier 10.1109/MITS.2016.2605381. Date of publication: 25 October 2016. IEEE Intelligent Transportation Systems Magazine, Winter 2016.

Abstract—Automatic detection of traffic lights is of great importance to road safety. This paper presents a novel approach that combines computer vision and machine learning techniques for accurate detection and classification of different types of traffic lights, including green and red lights in both circular and arrow forms. Initially, color extraction and blob detection are employed to locate the candidates. Subsequently, a pretrained PCA network is used as a multiclass classifier to obtain frame-by-frame results. Furthermore, an online multiobject tracking technique is applied to overcome occasional misses, and a forecasting method is used to filter out

false positives. Several additional optimization techniques are employed to improve the detector performance and handle the traffic light transitions. When evaluated using the test video sequences, the proposed system can successfully detect the traffic lights on the scene with high accuracy and stable results. Considering hardware acceleration, the proposed technique is ready to be integrated into advanced driver assistance systems or self-driving vehicles. We build our own data set of traffic lights from recorded driving videos, including circular lights and arrow lights in different directions. Our experimental data set is available at http://computing.wpi.edu/Data set.html.

I. Introduction

Automatic detection of traffic lights is an essential feature of an advanced driver assistance system or self-driving vehicle. Today it is a critically important road safety issue that many traffic accidents occurring at intersections are caused by drivers running red lights. Recent data from the Insurance Institute for Highway Safety (IIHS) show that in 2012 on US roads, red-light-running crashes caused about 133,000 injuries and 683 deaths [1]. The introduction of an automatic traffic light detection system, especially for red light detection, has important social and economic impacts.

In addition to detecting traffic lights, it is also important to recognize whether the lights appear in circular form or as directional arrow lights. For example, a red left arrow light and a green circular light can appear at the same time. Without recognition, a detection system can get confused because valuable information has been lost. There are few papers in the literature that combine detection and recognition of traffic lights.

Based on our survey, there are very few data sets available for traffic lights. The Traffic Lights Recognition (TLR) public benchmarks [2] contain image sequences with traffic lights and ground truth.
However, the images in the data set do not have high resolution, and the number of physical traffic lights is limited because the image sequences are converted from a short video. In addition, this data set only contains circular traffic lights, which is not always the case in real applications. Therefore, we opt to build our own data set for traffic light detection, including circular lights and arrow lights in all three directions. Our data set of traffic lights can be used by many other researchers in computer vision and machine learning.

In this paper, we propose a new method that combines computer vision and machine learning techniques. Color extraction and blob detection are used to locate the candidates, followed by the PCA network (PCANet) [3] classifiers. The PCANet classifier consists of a PCANet and a linear support vector machine (SVM). Our experimental results suggest the proposed method is highly effective for detecting both green and red traffic lights of many types.

Despite the effectiveness of PCANet and many outstanding achievements made by computer vision researchers, object detection from a single image still makes frequent errors, which may cause serious problems in safety-critical real-world applications such as advanced driver assistance systems (ADAS). Traditional frame-by-frame detection methods ignore the inter-frame information in a video. Since the objects in a video are normally in continuous motion, their identities and trajectories are valuable information that can improve the frame-based detection results. Unlike a pure tracking problem that tracks a marked object from the first frame, tracking-by-detection algorithms involve frame-by-frame detection, inter-frame tracking and data association. In addition, multiobject tracking (MOT) algorithms can be employed to distinguish different objects and keep track of their identities and trajectories.
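Keeping per-track identities makes it possible to accumulate classifier decisions over time rather than trusting any single frame. A minimal sketch of such a per-track accumulator with a sliding-window majority vote (the class and structure names here are ours, not the paper's):

```python
from collections import Counter, defaultdict


class TrackVoter:
    """Accumulate per-frame classifier decisions for each tracked
    object and report the majority label over a sliding window,
    suppressing occasional per-frame misclassifications."""

    def __init__(self, window=15):
        self.window = window              # frames kept per track
        self.history = defaultdict(list)  # track id -> recent labels

    def update(self, track_id, label):
        h = self.history[track_id]
        h.append(label)
        # A bounded window lets a genuine state change (e.g. a light
        # turning from red to green) eventually win the vote.
        if len(h) > self.window:
            h.pop(0)
        return Counter(h).most_common(1)[0][0]
```

For example, a track classified as a green circular light in most frames keeps that label even if the classifier misreads one frame.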
When it becomes a multiclass problem, such as recognizing different types of traffic lights, an additional procedure such as a voting scheme is often applied. In addition, the method needs to address the situation

that the traffic light status can change suddenly during the detection process.

The rest of the paper is organized as follows. Section II provides a summary of related work. Section III describes our data collection and experimental setup. In Section IV, we propose a method that combines computer vision and machine learning techniques for traffic light detection using PCANet. In Section IV-C, we propose an MOT-based method that stabilizes the detection and improves the recognition results. Performance evaluation is presented in Section V, followed by some discussion in Section VI and conclusions in Section VII.

II. Related Work

There are several existing works on traffic light detection. Spot light detection [4], [5] is a method based on the fact that a traffic light is much brighter than its lamp holder, which is usually black. A morphological top-hat operator is used to extract the bright areas from gray-scale images, followed by a number of filtering and validating steps. In [6], an interactive multiple-model filter is used in conjunction with spot light detection. Additional information is used to improve its performance, such as the status switching probability and the estimated position and size. The fast radial symmetry transform is a fast variant of the circular Hough transform, which can be used to detect circular traffic lights as demonstrated in [7].

Several other methods also incorporate vehicle GPS information. A geometry-based filtering method is proposed to detect traffic lights using mobile devices at low computational cost [8]. The GPS coordinates of all traffic lights are presumably available, and a camera projection model is used. Mapping traffic light locations is introduced in [9] by using tracking, back-projection and triangulation. Google also presented a mapping and detection method in [10] which is capable of recognizing different types of traffic lights. It predicts when traffic lights should become visible with the help of GPS data, followed by classifying possible candidates.
Geometric constraints and temporal filtering are then applied during the detection. Inter-frame information is also helpful for detecting traffic lights. A method that uses a Hidden Markov Model to improve the accuracy and stability of the results is demonstrated in [11]. The state transition probability of traffic lights is considered, and information from several previous frames is used. Reference [12] introduces a traffic light detector based on template matching. The assumption is that the two off lamps in the traffic light holder are similar to each other and neither of them looks similar to the surrounding background. In our earlier work [13], this method is extended by applying machine learning techniques and adding additional information during the process. However, only red lights are considered there, and its approach differs from that in this paper in many ways.

Deep learning [14], [15] is a class of machine learning algorithms that use many layers to extract hidden features. Unlike hand-crafted features such as Histograms of Oriented Gradients (HOG) [16], it learns features from training data. PCANet is a simple yet effective deep learning network proposed by [3]. Principal Component Analysis (PCA) is employed to learn the filter banks. It can be used to extract features of faces, handwritten digits and object images. It has been tested on several data sets and delivers surprisingly good results [3]. Using PCANet in traffic light detection or other similar applications has not been researched thus far.

Integration of detection and tracking has been used in a few ADAS-related works. The trajectory of the traffic light is used to validate the theoretical result in [6]. A Kalman filter is employed to predict traffic sign positions in [17], [18], which claim that the tracking algorithm is able to improve the overall system reliability.
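The PCA filter learning at the core of PCANet, mentioned above, amounts to collecting mean-removed image patches and taking the leading principal components of their covariance as convolution filters. A simplified single-stage sketch (our own illustration, not the implementation from [3]):

```python
import numpy as np


def learn_pca_filters(images, k=7, n_filters=8):
    """Learn a PCANet-style filter bank from grayscale images:
    gather all k x k patches, subtract each patch's mean, and use
    the top principal components of the patches as filters."""
    patches = []
    for img in images:                        # img: 2-D grayscale array
        rows, cols = img.shape
        for i in range(rows - k + 1):
            for j in range(cols - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())  # patch-mean removal
    X = np.stack(patches, axis=1)             # shape (k*k, num_patches)
    # Eigenvectors of X X^T with the largest eigenvalues are the filters.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    top = eigvecs[:, ::-1][:, :n_filters]     # eigh sorts ascending
    return top.T.reshape(n_filters, k, k)
```

Convolving an input image with each learned filter and binarizing the responses yields the feature maps that are then fed to the linear SVM stage.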
Utilizing accumulated classifier decisions from a tracked speed limit sign, a majority voting scheme is proven to be very robust against accidental misclassifications [19].

Meanwhile, multiobject tracking (MOT) aims at tracking every individual object while maintaining its identity. MOT does not address the detection problem: even if an object is mistakenly identified, MOT continues to track that object [6]. Multiobject tracking builds the trajectories of the objects and then associates the tracking results with the detection results. There are two categories of tracking-by-detection methods: batch methods and online methods. Batch methods [20], [21] usually require the information of the whole sequence to be available before association, which is not applicable to real-time traffic light detection. Online methods [22], [23] do not have such requirements. They utilize the information accumulated up to the current frame to make estimations, and can therefore be applied to real-time applications. This shares some similarities with time series analysis and forecasting [24], whose goal is to predict the future state or value based on past and current observations. The comparison of our approach with related work is discussed in Section VI.

III. Data Collection and Experimental Setup

In this paper, we focus on the detection of red/green traffic lights and the recognition of their types. Amber lights can be detected using similar techniques, but we do not consider them here due to lack of data. The recognition of arrow lights requires that the input frames be high-resolution images; otherwise, all lights are just colored dots or balls in the frame, and it is impossible to recognize them.

We mount a smartphone behind the front windshield and record videos when driving on the road.
Several hours of videos are recorded around the city of Worcester, Massachusetts, USA, during both summer and winter seasons. Subsequently, we select a subset of video frames to build

the data set, since most of the frames do not contain traffic lights. In addition, passing an intersection only takes a few seconds in the case of green lights. At red lights, the frames are almost identical because the vehicle is stopped. Thus the length of selected video for each intersection is very short. Several minutes of traffic-light-free frames are retained in our data set for assessment of false positives. Each image has a resolution of 1920 × 1080 pixels. To validate the proposed approach and to avoid overlap between training and test data, the data collected in the summer is used for training and the data collected in the winter is used for testing. Our traffic light data set is made available online at http://computing.wpi.edu/Data set.html.

FIG 1 Examples of 5 classes of Green ROI-1 (GN-1, GAL-1, GAR-1, GAF-1, GC-1).

A. Training Data

All the training samples are taken from the data collected during the summer. Input data to the classifier are obtained from the candidate selection procedure described in Section IV-A, and the classifier output goes to the tracking algorithm for further processing. Thus evaluation of the classifier is independent of the candidate selection and the post-processing (tracking). The classifier is trained to distinguish true and false traffic lights, and to recognize the types of the traffic lights. OpenCV [25] is used for SVM training, which chooses the optimal parameters by performing 10-fold cross-validation.

The positive samples, which contain the traffic lights, are manually labeled and extracted from the data set images. The negative samples, such as segments of trees and vehicle tail lights, are obtained by applying the candidate selection procedure over the traffic-light-free images. The green lights and red lights are classified separately. For green lights, there are three types based on their aspect ratios. The first type is called Green ROI-1, which contains one green light in each image, and its aspect ratio is approximately 1:1. The second type is called Green ROI-3.
It contains the traffic light holder area, which has one green light and two off lights, and its aspect ratio is approximately 1:3. The third type is called Green ROI-4. It contains the traffic light holder area, which has one green round light, one green arrow light, and two off lights, and its aspect ratio is approximately 1:4.

Each type of sample image has several classes. Green ROI-1 and Green ROI-3 both have five classes, including negative samples, as shown in Fig. 1 and Fig. 2. These 5 classes from top to bottom are Green Negative (GN-1; GN-3), Green Arrow Left (GAL-1; GAL-3), Green Arrow Right (GAR-1; GAR-3), Green Arrow Forward (GAF-1; GAF-3) and Green Circular (GC-1; GC-3).

FIG 2 Examples of 5 classes of Green ROI-3 (GN-3, GAL-3, GAR-3, GAF-3, GC-3).

Green ROI-4 also has five classes, including negative samples, as shown in Fig. 3. The five classes from top to bottom are Green Negative (GN-4), Green Circular and Green Arrow Left (GCGAL-4), Green Circular and Green Arrow Right (GCGAR-4), Green Arrow Forward and Left (GAFL-4) and Green Arrow Forward and Right (GAFR-4). The Green Negative samples are obtained from traffic-light-free videos by using the color extraction method discussed in Section IV-A.

For red lights, there are two types of sample images based on their aspect ratios. The first type is called Red ROI-1, as shown in Fig. 4. It contains one red light in each image, and its aspect ratio is approximately 1:1. The other type is called Red ROI-3, as shown in Fig. 5. It contains the traffic light holder, which has one red light and two off lights, and its aspect ratio is approximately 1:3. Each type of sample

images has three classes: Red Negative (RN-1; RN-3), Red Arrow Left (RAL-1; RAL-3) and Red Circular (RC-1; RC-3). The Red Negative samples are obtained from traffic-light-free videos by using the color extraction method mentioned in Section IV-A. The red lights do not have ROI-4 data because the red light is on top, followed by an amber light and one or two green lights at the bottom. If the red light is on, the amber and green lights beneath must be off. These three lights are in the ROI-3 vertical setting, regardless of the status of the 4th light at the very bottom.

Table 1 shows the number of training samples of Green ROI-n and Red ROI-n, where n is 1, 3 or 4.

Features of a traffic light itself may not be as rich as those of other objects such as a human or a car. For example, a circular light is just a colored blob that looks similar to other objects of the same color. Therefore, it is difficult to distinguish the true traffic lights from other false candidates solely based on color analysis. The ROI-3 and ROI-4 samples are images of the holders, which provide additional information for detection and classification. The approach of combining all this information together is explained in Section IV-B2.

FIG 3 Examples of 5 classes of Green ROI-4 (GN-4, GCGAL-4, GCGAR-4, GAFL-4, GAFR-4).

FIG 4 Examples of 3 classes of Red ROI-1 (RN-1, RAL-1, RC-1).

FIG 5 Examples of 3 classes of Red ROI-3 (RN-3, RAL-3, RC-3).

Table 1. Number of training samples of Green ROI-n and Red ROI-n.

Class      n = 1    n = 3    n = 4
GN-n       13218    13218    13213
GAL-n       1485      835        -
GAR-n       1717      617        -
GAF-n       2489     1018        -
GC-n        3909     3662        -
GCGAL-n        -        -      369
GCGAR-n        -        -      281
GAFL-n         -        -      749
GAFR-n         -        -     1005
RN-n        7788     7619        -
RAL-n       1214     1235        -
RC-n        4768     5035        -
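OpenCV's SVM training selects its parameters internally via 10-fold cross-validation. The underlying data split can be sketched as follows (a generic numpy sketch of k-fold splitting, not OpenCV's implementation):

```python
import numpy as np


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index arrays for k-fold
    cross-validation: shuffle once, split into k near-equal folds,
    and let each fold serve as the validation set exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Each candidate parameter setting is trained on the k-1 training folds and scored on the held-out fold; the setting with the best average validation score is kept.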

B. Test Data

All test images are taken from the data set that we collected in the winter. The ground truths are manually labeled and are used for validating the results. In our proposed method, a tracking technique is used to further improve the performance. However, traffic lights can move out of the image or change states during the tracking process. Therefore, the test sequences need to cover many possible scenarios for all types of lights. Detailed information on the test sequences is shown in Table 2.

Table 2. Information of 23 test sequences (for each sequence: ID, number of frames, behavior of the lights, and types of lights). The sequences include lights present in all frames, lights present at the start that then move out of view, red lights that turn green, and sequences with no traffic lights; the light types cover green circular, green arrow left/right/forward, red circular and red arrow left.

IV. Proposed Method of Traffic Light Detection and Recognition

Fig. 6 shows the flowchart of our proposed method of traffic light detection and recognition, which consists of three stages.
First, color extraction and candidate selection are performed on the input image. Second, the selected candidates are processed by PCANet and SVM to determine whether they are traffic lights and, if so, of what type. Finally, tracking and forecasting techniques are applied to improve the performance and stabilize the final output.

FIG 6 Flowchart of the proposed method of traffic light detection and recognition: input image; color extraction and candidate selection; classification using PCANet and SVM; tracking and forecasting; output.

A. Locating Candidates Based on Color Extraction

To locate the traffic lights, color extraction is applied to find the regions of interest (ROIs), i.e., the candidates. The images are converted to the hue, saturation, and value (HSV) color space. Compared with the RGB color space, HSV is more robust against illumination variation and is more

suitable for segmentation [26]. The desired color is extracted from an image mainly based on the hue values, which results in a binary image. Suppose the HSV value of the i-th pixel in an image is

HSV_i = (h_i, s_i, v_i).    (1)

In order to extract green pixels, we set the color thresholds based on empirical data:

40 ≤ h_i ≤ 90    (2)
60 ≤ s_i ≤ 255    (3)
110 ≤ v_i ≤ 255    (4)

In order to extract red pixels, besides (3) and (4), one of the following conditions must

FIG 7 Color extraction, blob detection and closing operation.

FIG 8 A sample frame from our traffic light data set.
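The green-pixel rule in (2)-(4) can be applied directly as elementwise threshold tests, assuming an 8-bit HSV image with hue on the 0-179 scale used by OpenCV (a numpy sketch, not the authors' code):

```python
import numpy as np

# Empirical green thresholds from Eqs. (2)-(4); hue is assumed to be
# on the 0-179 scale of 8-bit HSV images (e.g. OpenCV's convention).
H_LO, H_HI = 40, 90
S_LO, S_HI = 60, 255
V_LO, V_HI = 110, 255


def extract_green(hsv):
    """hsv: (H, W, 3) uint8 array in HSV channel order.
    Returns a binary mask, 255 where all three thresholds hold."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mask = ((H_LO <= h) & (h <= H_HI) &
            (S_LO <= s) & (s <= S_HI) &
            (V_LO <= v) & (v <= V_HI))
    return mask.astype(np.uint8) * 255
```

The resulting binary image then feeds the blob detection and morphological closing steps shown in Fig. 7.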
