
# MobileNet v1 + SkipNet

The purpose of this project is to create a model capable of real-time semantic segmentation for autonomous driving tasks. The model focuses mainly on the drivable surface of the driving scene and is trained on two different datasets (Cityscapes, BDD100k).

This repository covers inference and evaluation: metric measurement, demo videos (segmentation, LiDAR), and shell scripts to run the model (training code is not included).

Demo videos (3D point cloud and lane segmentation):

https://drive.google.com/drive/folders/1VrDMp4ggWMrSRs_SPlASP4Z8Hwmq_ubn?usp=sharing

## Table of Contents

- [The Goal](#the-goal)
- [Installation](#installation)
- [Usage](#usage)
- [Evaluate Metrics](#evaluate-metrics)
- [3D Point Cloud](#3d-point-cloud)
- [Results](#results)
- [Pre-trained Weights](#pre-trained-weights)
- [Acknowledgements](#acknowledgements)

## The Goal

Most research on semantic segmentation focuses on increasing model accuracy, while computationally efficient solutions are hard to come by, and even then, significant compromises have to be made.

Since the goal is basic autonomous driving, the model focuses on adequate accuracy for the road and its boundaries (this can be expanded later), so that the drivable surface or the individual lanes can be estimated.

To achieve this goal, I implemented a custom architecture with separate feature extraction and decoder modules. The inspiration for this work comes from this paper.
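The feature extractor's efficiency comes from MobileNet's depthwise separable convolutions, which split a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mixer. A quick parameter-count comparison, as a generic illustration rather than code from this repository:

```python
# Parameter counts for a single conv layer, bias terms omitted.
def standard_conv_params(k, c_in, c_out):
    # One k x k kernel per (input channel, output channel) pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise: one k x k kernel per input channel.
    # Pointwise: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 67840
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a typical 3x3 layer this is roughly an 8-9x reduction in parameters (and multiply-adds), which is what makes real-time inference feasible.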

Optimization is done with TensorRT to achieve the fastest possible inference speed, so the model can be tested on autonomous vehicles. Since the repository uses TensorFlow, the conversion relies on TensorFlow's TensorRT integration.

## Installation

```shell
pip3 install -r requirements.txt
```

## Usage

The model can be run with or without TensorRT on individual images, video streams, captured video, or as a ROS node. Shell scripts are provided to launch inference on the car; download the pre-trained weights first (see the table below) and the evaluation metrics will be shown at startup.
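As a sketch of what such a launcher could look like, here is a hypothetical CLI skeleton covering the run modes listed above. The script name, flags, and weight filename are all assumptions for illustration, not this repository's actual interface:

```python
import argparse

# Hypothetical CLI skeleton -- the actual scripts and flags in this
# repository may differ; this only illustrates the intended run modes.
def build_parser():
    p = argparse.ArgumentParser(description="Run road/lane segmentation")
    p.add_argument("--source", choices=["image", "video", "camera", "ros"],
                   default="image", help="input type")
    p.add_argument("--input", help="path to the image or video file")
    p.add_argument("--weights", required=True,
                   help="path to the downloaded pre-trained weights")
    p.add_argument("--tensorrt", action="store_true",
                   help="run the TensorRT-optimized model")
    return p

args = build_parser().parse_args(
    ["--source", "video", "--input", "demo.mp4",
     "--weights", "bdd100k_288x512.h5", "--tensorrt"])
print(args.source, args.tensorrt)
```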

## Evaluate Metrics
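Segmentation quality is typically measured with per-class intersection-over-union (IoU) and its mean over classes (mIoU). A minimal sketch on a toy label map (the label values and maps below are made up for illustration):

```python
import numpy as np

# Toy 3x3 label maps: 0 = background, 1 = drivable surface.
pred = np.array([[0, 1, 1],
                 [0, 1, 1],
                 [0, 0, 1]])
gt   = np.array([[0, 1, 1],
                 [0, 0, 1],
                 [0, 0, 1]])

def iou(pred, gt, cls):
    # |prediction AND ground truth| / |prediction OR ground truth|
    inter = np.logical_and(pred == cls, gt == cls).sum()
    union = np.logical_or(pred == cls, gt == cls).sum()
    return inter / union

road_iou = iou(pred, gt, 1)   # 4 overlapping pixels / 5 in the union = 0.8
miou = np.mean([iou(pred, gt, c) for c in (0, 1)])
print(road_iou, miou)
```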

## 3D Point Cloud

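Combining the LiDAR point cloud with the segmentation mask requires projecting the 3D points into the image plane, after which each point can be labeled with the class of the pixel it lands on. A minimal pinhole-projection sketch; the intrinsics and points below are made up, not this project's calibration:

```python
import numpy as np

# Made-up pinhole intrinsics (fx, fy, cx, cy) -- not this project's calibration.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 180.0],
              [  0.0,   0.0,   1.0]])

# LiDAR points already in the camera frame (x right, y down, z forward).
pts = np.array([[ 2.0, 1.0, 10.0],
                [-1.0, 0.5,  5.0]])

# Homogeneous pixel coordinates, then divide by depth.
proj = (K @ pts.T).T
uv = proj[:, :2] / proj[:, 2:3]
print(uv)  # [[420. 230.] [220. 230.]]
```

In practice a LiDAR-to-camera extrinsic transform is applied first, and points behind the camera (z <= 0) or outside the image bounds are discarded before the lookup.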

## Results

Demo videos and sample images are available via the Google Drive link above.

## Pre-trained Weights

The model has been trained on the Berkeley Deep Drive and the Cityscapes dataset. Using the former weights results in lane segmentation, while the latter achieves drivable surface segmentation without differentiating the lanes. The weights trained on different input sizes can be found in the table below.

| Dataset | Size (H x W) |
| :--- | :---: |
| Cityscapes | 288 x 512 (Google Drive), 376 x 672 (Google Drive) |
| BDD100k | 288 x 512 (Google Drive), 376 x 672 (Google Drive) |

## Acknowledgements

"RTSeg: Real-time Semantic Segmentation Comparative Study" by Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand

"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

"Rethinking the Inception Architecture for Computer Vision" by Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

"Fully Convolutional Networks for Semantic Segmentation" by Jonathan Long, Evan Shelhamer, Trevor Darrell

"Revisiting Distributed Synchronous SGD" by Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz