This repository contains the code written during the Summer Internship at ARTPARK, IISc Bangalore. The project builds the perception module for a self-driving car. Tasks such as object detection, multi-object tracking, and agent trajectory prediction are implemented.
Perception is a central problem for any autonomous agent, be it a human, a robot, or a self-driving vehicle. This module enables smoother and more reliable control of the car through the path-planning module of the autonomous agent, and it can also aid in pose estimation. For our project, we have included the following sub-modules in the perception stack:
- Multi-object detection using the YOLOv5 algorithm
- Multi-object tracking using the Deep Sort algorithm
- Trajectory prediction using the PECNet algorithm
The code has been tested on Windows 10 and Windows 11 with Python 3.8, PyTorch 1.9.0, and CUDA 11.1 on an NVIDIA GeForce RTX 3060 GPU.
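Before running the models, it is worth verifying that the Torch/CUDA stack matches the versions above. A minimal check (the expected outputs in the comments assume the setup listed above):

```python
import torch

# Sanity-check the PyTorch / CUDA installation.
print(torch.__version__)          # expected: 1.9.0+cu111
print(torch.cuda.is_available())  # expected: True on a CUDA-enabled machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3060
```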
The models were evaluated on the Lyft Level 5 dataset and the KITTI dataset. The results obtained are as follows:
| Original Video | YOLOv5 Predictions |
| --- | --- |
| ![]() | ![]() |
| Original Video | YOLOv5 Predictions |
| --- | --- |
| ![]() | ![]() |
The FPS obtained for object detection is as follows:
| GPU | Lyft Level 5 Avg FPS | Lyft Level 5 Min FPS | KITTI Avg FPS | KITTI Min FPS |
| --- | --- | --- | --- | --- |
| NVIDIA GeForce RTX 3060 mobile GPU | 22.19 | 14.33 | 25.86 | 16.13 |
| NVIDIA Tesla T4 GPU | 30.15 | 21.88 | 31.35 | 19.27 |
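For reference, per-frame FPS numbers like these can be reproduced by timing the inference loop. Below is a minimal sketch, where `run_inference` is a hypothetical placeholder for the detector call (it is not part of this repository):

```python
import time
import cv2

def measure_fps(video_path, run_inference):
    """Report average and minimum FPS of `run_inference` over a video.

    `run_inference` is a placeholder for any per-frame detector call.
    """
    cap = cv2.VideoCapture(video_path)
    fps_values = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        start = time.perf_counter()
        run_inference(frame)  # e.g. one YOLOv5 forward pass
        fps_values.append(1.0 / (time.perf_counter() - start))
    cap.release()
    if fps_values:
        print(f"Avg FPS: {sum(fps_values) / len(fps_values):.2f}")
        print(f"Min FPS: {min(fps_values):.2f}")
```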
| Original Video | Deep Sort Tracking |
| --- | --- |
| ![]() | ![]() |
| Original Video | Deep Sort Tracking |
| --- | --- |
| ![]() | ![]() |
The FPS obtained for object tracking is as follows:
| GPU | Lyft Level 5 Avg FPS | Lyft Level 5 Min FPS | KITTI Avg FPS | KITTI Min FPS |
| --- | --- | --- | --- | --- |
| NVIDIA GeForce RTX 3060 mobile GPU | 12.96 | 8.72 | 13.14 | 7.21 |
| NVIDIA Tesla T4 GPU | 14.19 | 11.69 | 14.04 | 9.36 |
The FPS obtained for the trajectory prediction result is 40.
- Clone the repository

  ```bash
  git clone --recurse-submodules https://github.com/TheShiningVampire/PERCEPTION_ARTPARK
  ```

- Install the required dependencies

  ```bash
  cd PERCEPTION_ARTPARK
  pip install -r requirements.txt
  ```
- Run the object detection model

  ```bash
  cd Object_detection\yolov5
  python detect.py --source ... --show-vid  # Show live inference as a video
  ```
- The `--source` argument can be any of the following (the sketch after this list shows how such sources are opened):
  - Video: `--source file.mp4`
  - Webcam: `--source 0`
  - RTSP stream: `--source rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa`
  - HTTP stream: `--source http://wmccpinetop.axiscam.net/mjpg/video.mjpg`
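All of these source types can be read through OpenCV's `VideoCapture`, which accepts a device index as well as file paths and stream URLs. A minimal sketch of that behaviour (an illustration, not the repository's own loader):

```python
import cv2

def open_source(source: str):
    # A webcam index arrives from the CLI as the string "0";
    # file paths and rtsp:// or http:// URLs pass through unchanged.
    capture = cv2.VideoCapture(int(source) if source.isdigit() else source)
    if not capture.isOpened():
        raise IOError(f"Could not open source: {source}")
    return capture
```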
- Run the object tracking model

  ```bash
  cd Object_tracking\Yolov5_DeepSort_Pytorch
  python track.py --source ... --show-vid  # Show live inference as a video
  ```
- The `--source` argument can be any of the following:
  - Video: `--source file.mp4`
  - Webcam: `--source 0`
  - RTSP stream: `--source rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa`
  - HTTP stream: `--source http://wmccpinetop.axiscam.net/mjpg/video.mjpg`
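Deep Sort follows the tracking-by-detection paradigm: the YOLOv5 detections in each frame are associated with existing tracks using motion (a Kalman filter) and appearance features. The sketch below illustrates only the association step, substituting a plain IoU cost and the Hungarian algorithm for Deep Sort's full motion/appearance metric:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Match track boxes to detection boxes by maximising total IoU."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]
```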
There is a clear trade-off between model inference speed and accuracy. To meet your inference speed/accuracy needs, you can select any YOLOv5 family model for automatic download:
```bash
python track.py --source 0 --yolo_weights yolov5s.pt --img 640    # smallest YOLOv5 family model
python track.py --source 0 --yolo_weights yolov5x6.pt --img 1280  # largest YOLOv5 family model
```
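For example, the YOLOv5 variants can be pulled from the official Ultralytics hub to compare their sizes before committing to one. A sketch (the first call downloads each model, so internet access is required):

```python
import torch

# Compare parameter counts across YOLOv5 family models.
for name in ("yolov5s", "yolov5m", "yolov5l", "yolov5x"):
    model = torch.hub.load("ultralytics/yolov5", name, pretrained=True)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```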
- mikel-brostrom (the object tracking code is modified from the original code by mikel-brostrom)
- HarshayuGirase (the PECNet code is modified from the original code by HarshayuGirase)
All the results obtained can be found in the Google Drive folder.