This repository hosts a pipeline for the detection, tracking and classification of traffic signs, developed for the IEEE Video & Image Processing Cup 2017. The dataset used in this project is the CURE-TSD dataset, which consists of video sequences of traffic signs under real and augmented conditions. The project placed 2nd runner-up in the IEEE Video and Image Processing Cup 2017.
Traffic sign recognition is a multi-class classification problem with unbalanced class frequencies. It is a real-time computer vision problem of high practical interest and has become an essential component of Driver Assistance Systems (DAS) and Unmanned Ground Vehicles (UGV).
The proposed architecture performs traffic sign detection on the CURE-TSD dataset.
The video sequences in the CURE-TSD dataset are grouped into two classes:
- Real data corresponds to processed versions of sequences acquired in the real world (49 real sequences).
- Unreal data corresponds to synthesized sequences generated in a virtual environment (49 unreal sequences).
- Train-test split: 70-30 (34 training videos, 15 test videos)
- 300 frames per sequence
- 12 types of effects
- 5 different challenge levels
- Total of 2,989 (49 × 12 × 5 + 49) real video sequences
- Total of 2,744 (49 × 11 × 5 + 49) synthesized video sequences
- Total number of frames: around 1.72 million
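The totals above follow from the per-sequence figures. The short check below reproduces them, assuming (as the "+ 49" term suggests) that each base sequence is included once without any challenge in addition to one copy per effect and challenge level; the 11 effects for the synthesized data are inferred from the 2,744 total.

```python
SEQUENCES = 49        # base sequences per data type (real or unreal)
LEVELS = 5            # challenge levels per effect
FRAMES_PER_SEQ = 300


def total_sequences(num_effects: int) -> int:
    """Challenge-free originals plus one copy per (effect, level) pair."""
    return SEQUENCES * num_effects * LEVELS + SEQUENCES


real = total_sequences(12)      # 12 effects for real data
unreal = total_sequences(11)    # 11 effects for synthesized data (inferred)
frames = (real + unreal) * FRAMES_PER_SEQ

print(real, unreal, frames)     # 2989 2744 1719900
```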
An overview of the Traffic Sign Recognition and Classification Pipeline is illustrated in the following figure:
The input to the system is a video sequence, from which a frame extractor extracts the individual frames. The frame extractor feeds the frames to the Challenge Classifier, which uses an RCNN to classify the challenge type of the video. The decision of the Challenge Classifier is used to select the appropriate sign-type classifier.
The extracted frames are also fed to the Bounding Box Detector Module, which uses an FRCNN to generate raw bounding boxes around regions of interest (ROIs). A Hybrid Tracker Module keeps track of the bounding boxes frame by frame.
The region proposals are fed to the Sign Type Classifier, which classifies each traffic sign into the relevant category or returns a negative output if the proposal does not contain a sign. This negative feedback is passed to the Hybrid Tracker Module.
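The data flow between the modules can be summarized in a short Python sketch. All component interfaces here are hypothetical stand-ins, and the sketch assumes that the negative outputs of the sign classifier are fed back to the tracker (the Sign Type Classifier section states that its decision is meant to help the tracker).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Pipeline:
    challenge_classifier: Callable[[Any], str]                          # frame -> challenge group
    detector: Callable[[Any], List[Box]]                                # frame -> raw FRCNN boxes
    tracker: Any                                                        # needs update(boxes), reject(boxes)
    sign_classifiers: Dict[str, Callable[[Any, Box], Optional[str]]]    # None means "not a sign"

    def process(self, frame: Any) -> List[Tuple[Box, str]]:
        # 1. The predicted challenge type selects which sign classifier to use.
        classify = self.sign_classifiers[self.challenge_classifier(frame)]
        # 2. Raw detections are merged with tracked boxes (covers FRCNN drop-outs).
        proposals = self.tracker.update(self.detector(frame))
        # 3. Classify each proposal; negatives are fed back to the tracker.
        signs, rejected = [], []
        for box in proposals:
            label = classify(frame, box)
            if label is None:
                rejected.append(box)
            else:
                signs.append((box, label))
        self.tracker.reject(rejected)
        return signs


# Usage with stand-in components (all hypothetical):
class EchoTracker:
    def __init__(self) -> None:
        self.rejected: List[Box] = []

    def update(self, boxes: List[Box]) -> List[Box]:
        return boxes  # a real tracker would also add its predicted boxes here

    def reject(self, boxes: List[Box]) -> None:
        self.rejected.extend(boxes)


pipe = Pipeline(
    challenge_classifier=lambda frame: "rain_noise",
    detector=lambda frame: [(0, 0, 10, 10), (20, 20, 30, 30)],
    tracker=EchoTracker(),
    sign_classifiers={"rain_noise": lambda frame, box: "Stop" if box[0] == 0 else None},
)
result = pipe.process(frame=None)
print(result)                 # [((0, 0, 10, 10), 'Stop')]
print(pipe.tracker.rejected)  # [(20, 20, 30, 30)]
```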
The frame extractor utilizes OpenCV to extract frames from the video sequence. These frames are utilized for training all the subsequent deep learning blocks of the system.
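A minimal frame-extraction sketch follows. `extract_frames` relies only on the `read()` contract of `cv2.VideoCapture` (returning an `(ok, frame)` pair), so it can be exercised with a stand-in object. `save_frames`, the `step` parameter and the output file naming are hypothetical, and OpenCV is imported only when frames are actually written to disk.

```python
import os
from typing import Any, Iterator, Tuple


def extract_frames(capture: Any, step: int = 1) -> Iterator[Tuple[int, Any]]:
    """Yield (index, frame) for every `step`-th frame of an opened capture."""
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:            # end of stream or read error
            break
        if index % step == 0:
            yield index, frame
        index += 1


def save_frames(video_path: str, out_dir: str, step: int = 1) -> int:
    """Write frames to `out_dir` as JPEG files; returns how many were written."""
    import cv2  # imported lazily so extract_frames works without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    try:
        for idx, frame in extract_frames(cap, step):
            cv2.imwrite(os.path.join(out_dir, f"{idx:03d}.jpg"), frame)
            count += 1
    finally:
        cap.release()
    return count


class FakeCapture:
    """Stand-in with the same read() contract as cv2.VideoCapture."""

    def __init__(self, n: int) -> None:
        self._frames = list(range(n))

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


picked = [idx for idx, _ in extract_frames(FakeCapture(10), step=3)]
print(picked)  # [0, 3, 6, 9]
```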
Objectives:
- Classify the Challenge Type
- Aid Decision Making of Sign Classifier and Tracker Module
- Allow dynamic image processing to be applied to remove artifacts
Network Used: RCNN (Recurrent Convolutional Neural Network)
- 6 Recurrent Layers with Pooling & Dropout in between
Recurrent Layer Structure:
- 4 Convolutional Layers
- Activation Layers & Normalization Layers
- Outputs of the different convolutional layers are summed
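The recurrent layer structure is only outlined above. The sketch below shows one plausible reading of it, following the recurrent convolutional layer of Liang and Hu (2015): a feed-forward convolution of the input is summed with a recurrent convolution of the previous state at each unrolled step, so three steps use four convolutions in total. It is written in pure Python on small 2-D lists for readability; the kernels, the number of steps and the omission of the normalization layers are assumptions, not the project's trained network.

```python
from typing import List

Grid = List[List[float]]


def conv3x3(x: Grid, k: Grid) -> Grid:
    """'Same' 3x3 cross-correlation (as in CNN libraries) with zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + 1][dj + 1]
            out[i][j] = s
    return out


def add(a: Grid, b: Grid) -> Grid:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a: Grid) -> Grid:
    return [[max(0.0, v) for v in row] for row in a]


def recurrent_conv_layer(u: Grid, w_ff: Grid, w_rec: Grid, steps: int = 3) -> Grid:
    """x(0) = relu(w_ff * u);  x(t) = relu(w_ff * u + w_rec * x(t-1))."""
    ff = conv3x3(u, w_ff)          # feed-forward output, reused at every step
    x = relu(ff)
    for _ in range(steps):
        x = relu(add(ff, conv3x3(x, w_rec)))   # sum feed-forward and recurrent outputs
    return x


identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero = [[0.0] * 3 for _ in range(3)]
u = [[1.0, -2.0], [3.0, 4.0]]
print(recurrent_conv_layer(u, identity, zero))      # [[1.0, 0.0], [3.0, 4.0]]
print(recurrent_conv_layer(u, identity, identity))  # [[4.0, 0.0], [12.0, 16.0]]
```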
A few of the 13 classes (12 challenge types plus No Challenge) were combined to improve classification accuracy:
- No Challenge + Codec Error + Decolor
- Gaussian Blur + Lens Blur
- Rain + Noise
The challenge classifier was therefore trained on 9 classes instead of 13.
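The merge can be expressed as a label mapping applied before training. The label strings below are hypothetical; the 12 challenge types are those of CURE-TSD (decolorization, lens blur, codec error, darkening, dirty lens, exposure, Gaussian blur, noise, rain, shadow, snow, haze), and together with the challenge-free class they give 13 original labels.

```python
# Merge groups stated above; every other class keeps its own label.
MERGED = {
    "no_challenge": "nochallenge_codec_decolor",
    "codec_error": "nochallenge_codec_decolor",
    "decolorization": "nochallenge_codec_decolor",
    "gaussian_blur": "blur",
    "lens_blur": "blur",
    "rain": "rain_noise",
    "noise": "rain_noise",
}

ORIGINAL_CLASSES = [
    "no_challenge", "decolorization", "lens_blur", "codec_error", "darkening",
    "dirty_lens", "exposure", "gaussian_blur", "noise", "rain", "shadow",
    "snow", "haze",
]


def to_training_class(label: str) -> str:
    """Map an original challenge label to the merged training label."""
    return MERGED.get(label, label)


TRAINING_CLASSES = sorted({to_training_class(c) for c in ORIGINAL_CLASSES})
print(len(ORIGINAL_CLASSES), len(TRAINING_CLASSES))  # 13 9
```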
Objective:
- Locate Regions of Interest (ROIs)
- Provide Bounding Boxes around Possible Regions
- Provide Potential Locations for Tracking
The Bounding Box Detector uses an FRCNN (Faster Region-based Convolutional Neural Network, Faster R-CNN) to detect regions of interest. FRCNNs are used instead of conventional R-CNNs (Region-based CNNs) because they are much faster: the region proposals are produced by a network that shares convolutional features with the detector instead of by a separate, slow proposal algorithm. The FRCNN contains two modules, the Region Proposal Network (RPN) and the Fast R-CNN detector. The RPN outputs rectangular object proposals, each with an objectness score, which tell the detector module where to look. A ResNet-50 network serves as the backbone of both the RPN and the detector. The maximum overlap threshold used for the RPN is 0.7, and the RPN stride is 16.
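To make the 0.7 overlap threshold and the stride of 16 concrete, the sketch below places anchors on the feature-map grid (one centre every 16 image pixels) and labels them by their intersection-over-union (IoU) with ground-truth boxes. The anchor sizes and aspect ratios are hypothetical, and the 0.3 negative threshold is the default from the original Faster R-CNN paper rather than a value stated here; the paper's extra rule of marking the best-matching anchor of each ground-truth box as positive is omitted.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def anchors(feat_h: int, feat_w: int, stride: int = 16,
            sizes: Sequence[float] = (64, 128),
            ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Anchors centred on each feature-map cell, mapped back to image pixels."""
    out = []
    for r in range(feat_h):
        for c in range(feat_w):
            cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
            for s in sizes:
                for ratio in ratios:               # ratio = height / width
                    w, h = s / ratio ** 0.5, s * ratio ** 0.5
                    out.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return out


def label_anchors(anchor_list: List[Box], gt_boxes: List[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """1 = object, 0 = background, -1 = ignored during RPN training."""
    labels = []
    for a in anchor_list:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thresh else 0 if best < neg_thresh else -1)
    return labels


a = anchors(1, 1, sizes=(64,), ratios=(1.0,))[0]
print(a)                                          # (-24.0, -24.0, 40.0, 40.0)
print(label_anchors([a], [(-20, -24, 40, 40)]))   # [1]  (IoU = 0.9375)
```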
The following figures illustrate the working principle of the Bounding Box Detector:
- Only the top half of each frame is searched
- Logical assumption: traffic signs are placed in the upper field of view of the driver
- The cropped frame is divided into two halves: left and right
- The halves are passed separately to the FRCNN network
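The cropping and splitting steps can be sketched as below for a frame stored as a list of pixel rows (a NumPy image would use slicing such as `frame[:h // 2, :w // 2]` instead). The helper names are hypothetical. Each half keeps its offset so that boxes detected inside it can be mapped back to full-frame coordinates.

```python
from typing import List, Tuple


def split_search_regions(frame: list) -> List[Tuple[list, int, int]]:
    """Return [(region, x_offset, y_offset)] for the left and right halves
    of the top half of `frame` (a list of pixel rows)."""
    h, w = len(frame), len(frame[0])
    top = frame[: h // 2]                  # keep only the upper field of view
    mid = w // 2
    left = [row[:mid] for row in top]
    right = [row[mid:] for row in top]
    return [(left, 0, 0), (right, mid, 0)]


def to_frame_coords(box, x_off: int, y_off: int):
    """Map a box detected inside a half back to full-frame pixels."""
    x1, y1, x2, y2 = box
    return (x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off)


frame = [[(r, c) for c in range(8)] for r in range(6)]   # 6 rows x 8 columns
(left, lx, ly), (right, rx, ry) = split_search_regions(frame)
print(len(left), len(left[0]), right[0][0])    # 3 4 (0, 4)
print(to_frame_coords((1, 1, 2, 2), rx, ry))   # (5, 1, 6, 2)
```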
Convolution:
- Creates Feature Maps
- Reveals underlying features that are not apparent in the raw pixels
The output of the final convolution is passed to the Region Proposal Network (RPN):
- A 3x3 sliding window moves across the feature map, generating a feature array
- The feature array is fed to a pair of fully connected networks:
  - Box Regression Network: outputs the coordinates of the bounding boxes
  - Box Classification Network: decides whether the proposed boxes are ROIs
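In Faster R-CNN the box regression branch does not output absolute coordinates directly: it predicts offsets (dx, dy, dw, dh) relative to each anchor, which are decoded as below, while the classification branch's objectness score decides which proposals are kept. The 0.5 score threshold and the helper names are assumptions; a full RPN would also apply non-maximum suppression and keep only the top-scoring proposals.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def decode(anchor: Box, deltas: Deltas) -> Box:
    """Apply the standard Faster R-CNN box parameterization to one anchor."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + wa / 2, y1 + ha / 2
    dx, dy, dw, dh = deltas
    cx, cy = cxa + dx * wa, cya + dy * ha            # shift the centre
    w, h = wa * math.exp(dw), ha * math.exp(dh)      # scale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def propose(anchor_list: List[Box], deltas: List[Deltas], logits: List[float],
            thresh: float = 0.5) -> List[Box]:
    """Keep decoded boxes whose objectness probability reaches `thresh`."""
    return [decode(a, d) for a, d, z in zip(anchor_list, deltas, logits)
            if 1.0 / (1.0 + math.exp(-z)) >= thresh]


print(decode((0, 0, 10, 10), (0.5, 0.0, 0.0, 0.0)))   # (5.0, 0.0, 15.0, 10.0)
```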
Objective:
- Contingency Plan for FRCNN failure
- Keeps track of potential Regions of Interest
Two Separate Tracker modules are used to improve Robustness:
- Lucas-Kanade Tracker
- Kalman Filtering
- The tracker system compensates when the FRCNN drops boxes
- The green box is the tracker-predicted position of the box
- The prediction is based on:
  - Optical flow (pixel motion)
  - Kalman filtering
A Kalman filter is used to track traffic signs in the system. The Kalman filter recursively predicts the state of a linear dynamic process and then corrects that prediction using each new measurement.
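The sketch below is a minimal constant-velocity Kalman filter for one coordinate (for example the x or y centre of a sign's bounding box), written in pure Python. The constant-velocity model is an assumption, consistent with the future-work note about accounting for varying acceleration; the noise values are illustrative, and the process noise is simplified to q times the identity. Running one filter per axis assumes the two axes are independent. OpenCV's `cv2.KalmanFilter` class is a matrix-based alternative.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""
    pos: float
    vel: float = 0.0
    p00: float = 100.0   # covariance P (symmetric), initial uncertainty (assumed)
    p01: float = 0.0
    p11: float = 100.0
    q: float = 1.0       # process noise (assumed)
    r: float = 10.0      # measurement noise (assumed)

    def predict(self, dt: float = 1.0) -> float:
        # x = F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P = F P F^T + q * I
        p00 = self.p00 + dt * (2 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def correct(self, z: float) -> float:
        # H = [1, 0]: only the position is measured
        s = self.p00 + self.r                   # innovation covariance
        k0, k1 = self.p00 / s, self.p01 / s     # Kalman gain
        y = z - self.pos                        # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P = (I - K H) P
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos


kf = AxisKalman(pos=100.0)
for t in range(1, 31):
    kf.predict()
    kf.correct(100.0 + 5.0 * t)   # sign centre moving 5 px per frame
coast = kf.predict()              # a frame where the FRCNN dropped the box
print(round(kf.vel, 2), round(coast, 1))
```

When the FRCNN misses a box, calling `predict()` without `correct()` lets the track coast on its estimated velocity.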
Some frames in the dataset contain multiple signs. Because the FRCNN detects all of them at once, there are multiple predictions and multiple measurements per frame. The challenge is to assign each measurement in the current frame to the correct prediction, which is based on the estimate from the previous frame. A nearest-neighbour approach is adopted to solve this problem: since the positions of traffic signs do not change abruptly from frame to frame, the Euclidean distance between every prediction and every measurement in the current frame is computed. If the distance between a prediction and a measurement is less than 50 pixels, the measurement is assumed to belong to that prediction, and the measurement-prediction pair is used to update the a priori estimate into an a posteriori estimate.
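The assignment step can be sketched as a nearest-neighbour matcher with the 50-pixel gate described above. Greedy one-to-one matching in order of increasing distance is an assumption, since the text does not say how conflicts between candidate pairs are resolved; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def associate(predictions: List[Point], measurements: List[Point],
              gate: float = 50.0) -> Dict[int, int]:
    """Return {prediction_index: measurement_index}.

    Pairs are considered in order of increasing Euclidean distance; a pair is
    accepted only if both sides are still unmatched and the distance is < gate.
    """
    pairs = sorted(
        (math.dist(p, m), i, j)
        for i, p in enumerate(predictions)
        for j, m in enumerate(measurements)
    )
    matched: Dict[int, int] = {}
    used = set()
    for d, i, j in pairs:
        if d >= gate:
            break                      # all remaining pairs are even farther apart
        if i not in matched and j not in used:
            matched[i] = j
            used.add(j)
    return matched


preds = [(100.0, 100.0), (300.0, 120.0)]
meas = [(305.0, 118.0), (104.0, 97.0), (600.0, 50.0)]
print(associate(preds, meas))   # {0: 1, 1: 0}
```

Unmatched predictions can be propagated with a prediction step only, and unmatched measurements can start new tracks.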
The Lucas-Kanade method assumes that the displacement of the image content between two nearby instants (frames) is small and approximately constant within a neighborhood of the point p under consideration. Thus, the optical flow equation can be assumed to hold for all pixels within a window centered at p.
This yields more equations than unknowns (one equation per pixel in the window, but only two flow components), so the system is over-determined. The Lucas-Kanade method obtains a compromise solution using the least-squares principle.
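For a window of pixels q_1, ..., q_n, each pixel contributes one equation I_x(q_i) u + I_y(q_i) v = -I_t(q_i) in the two unknowns (u, v). Stacking them gives A [u v]^T = b, and the least-squares solution solves the 2x2 normal equations (A^T A) [u v]^T = A^T b. The pure-Python sketch below does this for one window, given precomputed spatial and temporal derivatives (computing the derivatives themselves is omitted).

```python
from typing import Sequence, Tuple


def lucas_kanade_window(ix: Sequence[float], iy: Sequence[float],
                        it: Sequence[float]) -> Tuple[float, float]:
    """Least-squares optical flow (u, v) for one window."""
    sxx = sum(x * x for x in ix)
    syy = sum(y * y for y in iy)
    sxy = sum(x * y for x, y in zip(ix, iy))
    sxt = sum(x * t for x, t in zip(ix, it))
    syt = sum(y * t for y, t in zip(iy, it))
    det = sxx * syy - sxy * sxy          # determinant of A^T A
    if abs(det) < 1e-9:
        raise ValueError("A^T A is singular: the window lacks texture (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic window whose true flow is (u, v) = (1.5, -0.5): It = -(Ix*u + Iy*v)
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
it = [-(x * 1.5 + y * -0.5) for x, y in zip(ix, iy)]
print(lucas_kanade_window(ix, iy, it))
```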
The Harris corner detector is used to detect corners in each image, and the Lucas-Kanade method tracks those points, yielding an optical flow vector for each of them. The FRCNN provides the positions of the signs, i.e. the regions of interest (ROIs), in the image. For each ROI, the nearest Harris corners are found, and their optical flow vectors are used to estimate the new position of the ROI in the next frame.
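The ROI update can be sketched as follows. In practice the corners and their flow would likely come from OpenCV, for example `cv2.goodFeaturesToTrack(..., useHarrisDetector=True)` followed by `cv2.calcOpticalFlowPyrLK`, with each flow vector taken as the new minus the old point position; here they are passed in as plain lists so that the logic runs without OpenCV. Averaging the flow of the `k` nearest corners (k = 3) is an assumption, since the text does not state how many corners are used or how their vectors are combined.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def shift_roi(box: Box, corners: List[Point], flows: List[Point], k: int = 3) -> Box:
    """Move `box` by the mean flow of the k tracked corners nearest to its centre.

    corners[i] is a corner in the current frame and flows[i] its (dx, dy)
    optical-flow vector to the next frame.
    """
    if not corners:
        return box  # nothing tracked: keep the previous position
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    nearest = sorted(range(len(corners)),
                     key=lambda i: math.dist(corners[i], (cx, cy)))[:k]
    dx = sum(flows[i][0] for i in nearest) / len(nearest)
    dy = sum(flows[i][1] for i in nearest) / len(nearest)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


corners = [(5.0, 5.0), (6.0, 5.0), (4.0, 5.0), (100.0, 100.0)]
flows = [(2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (-50.0, -50.0)]
print(shift_roi((0.0, 0.0, 10.0, 10.0), corners, flows))   # (2.0, 0.0, 12.0, 10.0)
```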
Objective:
- Determine whether a region contains a sign or not
- Classify the incoming bounding boxes
- Be able to deal with undesired output from the FRCNN
Network: CNN (Convolutional Neural Net)
Classes: 14 Sign Types + 9 Extra Classes
Traffic Signs:
- Speed Limit
- Goods Vehicles
- No Overtaking
- No Stopping
- No Parking
- Stop
- Bicycle
- Hump
- No Left
- No Right
- Priority To
- No entry
- Yield
- Parking
- FRCNN has a tendency to detect unwanted elements as potential ROI (doors, windows, rims etc.)
- Certain road signs and sign-like objects exist that are not labeled in the ground truth
- Finally, the CNN needs to decide whether a region is an ROI or not, to help the tracker make its decision.
Extra Classes:
- Tree Leaves
- Miscellaneous Road Signs
- Vertical Pole-like Structures
- Car Parts and windows
- Car Tires
- House windows
- Texture Fills
- Horizontal Pole-like structures
- Diagonal Pole-like structures
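The classifier's 23 outputs (14 sign types plus 9 extra classes) can be turned into the sign/no-sign decision the tracker needs, as sketched below. The output order (sign classes first, then the extra classes) and the `interpret` helper are hypothetical; any extra-class prediction is treated as a negative output.

```python
from typing import List, Tuple

SIGN_CLASSES = [
    "Speed Limit", "Goods Vehicles", "No Overtaking", "No Stopping", "No Parking",
    "Stop", "Bicycle", "Hump", "No Left", "No Right", "Priority To", "No entry",
    "Yield", "Parking",
]
EXTRA_CLASSES = [
    "Tree Leaves", "Miscellaneous Road Signs", "Vertical Pole-like Structures",
    "Car Parts and windows", "Car Tires", "House windows", "Texture Fills",
    "Horizontal Pole-like structures", "Diagonal Pole-like structures",
]
CLASSES = SIGN_CLASSES + EXTRA_CLASSES   # assumed output index order


def interpret(scores: List[float]) -> Tuple[str, bool]:
    """Return (label, is_sign) for one proposal; is_sign is the tracker feedback."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best], best < len(SIGN_CLASSES)


print(len(CLASSES))   # 23
```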
- Implement pre-processing to reduce challenge artifacts
- FRCNN (Bounding Box Detector):
  - Use a single FRCNN instead of splitting the frame, to speed up the program
  - Use more anchor box sizes
  - Use a larger training sample
  - Tune the FRCNN parameters
  - Hard negative mining: retrain the model with more negative samples to reinforce it
- Tracker System:
  - Implement an advanced Kalman filter that accounts for varying acceleration
  - Tune parameters (life expectancy, box decay rate) to filter out bad boxes
- Sign Classifier:
  - Add more sign classes to account for the sign-like objects that were detected