SceneSeg

Self-driving cars are usually trained to detect specific object types, such as cars, pedestrians and buses. Such approaches are prone to failure when a self-driving car encounters an unusual object it hasn't seen before, e.g. a rickshaw, or a strange presentation of a known object, e.g. a cyclist that has fallen over. In these scenarios, the car fails to detect critical objects, leading to dangerous driving outcomes.

To address this, we have developed SceneSeg, a neural network that segments all important foreground objects, irrespective of what those objects are. SceneSeg implicitly learns the visual features of foreground objects such as cars, buses, vans, pedestrians, cyclists, animals, rickshaws, trucks and other similar objects, even though it has not been explicitly trained to detect these object types. SceneSeg can also detect objects that are outside of its training data, such as tyres rolling down a highway or a loose trailer, as well as objects in unusual presentations that it hasn't seen during training.

SceneSeg performs robustly across challenging weather and lighting conditions, including heavy rain, snow and low-light driving, and works out of the box on roads across the world without any parameter tuning. It provides self-driving cars with a core safety layer, helping to address the 'long-tail' edge cases which plague object-level detectors. SceneSeg is part of the AutoSeg Foundation Model, which forms the core of the vision pipeline of the Autoware Privately Owned Vehicle Autonomous Highway Pilot System.

[GIF: SceneSeg demo] [GIF: SceneSeg demo in rain]

During training, SceneSeg estimates three semantic classes:

  • Foreground Objects
  • Background Elements
  • Drivable Road Surface

However, during inference, we only use the outputs from the Foreground Objects class.
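
As a minimal sketch of how this foreground-only usage might look in code (the class ordering, tensor shapes and threshold below are illustrative assumptions, not the repository's actual API):

```python
import torch

# Hypothetical channel ordering - the actual class indices are defined
# by the training code in the Models folder.
FOREGROUND, BACKGROUND, ROAD = 0, 1, 2

def foreground_mask(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Reduce a 3-class prediction to a binary foreground mask.

    logits: network output of shape (N, 3, H, W).
    Returns a boolean tensor of shape (N, H, W).
    """
    probs = torch.softmax(logits, dim=1)      # per-pixel class probabilities
    return probs[:, FOREGROUND] > threshold   # keep only the Foreground Objects channel
```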

Watch the explainer video

Please click the link to play the video - Video link

Demo, Training, Inference, Visualization

Please see the Models folder to access the pre-trained network weights for SceneSeg as well as scripts for network training, inference and visualization of network predictions.

Performance Results

SceneSeg was trained on a diverse dataset comprising multiple open-source datasets, including ACDC, MUSES, IDDAW, Mapillary Vistas and Comma10K. These datasets provide challenging training data covering a wide range of countries, road types, lighting conditions and weather conditions. The BDD100K dataset was not used during training and served as a broad and diverse test set.

Mean Intersection Over Union (mIoU) scores are provided for both validation and test data. Validation results are reported for each dataset making up the validation set, alongside results for the combined validation set in the Cross Dataset column. Per-class mIoU scores are provided, alongside the mIoU averaged across classes, as well as an Overall mIoU score, which measures the mIoU between the full multi-class prediction and the multi-class ground truth.
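
For reference, the per-class IoU and class-averaged mIoU reported below can be computed along these lines (a standard formulation, not necessarily the repository's exact evaluation code):

```python
import numpy as np

def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int = 3) -> np.ndarray:
    """Per-class Intersection over Union for integer label maps of equal shape."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

# Class Average mIoU is then the mean of the per-class scores:
# class_average = np.nanmean(iou_per_class(pred, gt))
```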

Validation Set Performance - mIoU Scores

|                       | Cross Dataset | Mapillary | MUSES | ACDC | IDDAW | Comma10K |
|-----------------------|---------------|-----------|-------|------|-------|----------|
| Overall               | 90.7          | 91.1      | 83.7  | 89.3 | 87.2  | 92.5     |
| Background Objects    | 93.5          | 93.7      | 89.1  | 93.2 | 90.0  | 95.1     |
| Foreground Objects    | 58.2          | 60.9      | 35.7  | 46.9 | 58.6  | 58.9     |
| Drivable Road Surface | 84.2          | 85.7      | 70.8  | 74.4 | 81.8  | 86.3     |
| Class Average         | 78.6          | 80.1      | 65.2  | 71.5 | 76.8  | 80.1     |

Test Set Performance - mIoU Scores

|                       | BDD100K |
|-----------------------|---------|
| Overall               | 91.5    |
| Background Objects    | 94.3    |
| Foreground Objects    | 69.8    |
| Drivable Road Surface | 71.3    |
| Class Average         | 78.5    |

Inference Speed

Inference speed tests were performed on a laptop equipped with an NVIDIA RTX 3060 Mobile GPU and an AMD Ryzen 7 5800H CPU.

FP32 Precision

At FP32 precision, SceneSeg achieved an inference speed of 18.1 frames per second.

FP16 Precision

At FP16 precision, SceneSeg achieved an inference speed of 26.7 frames per second.
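
The following is a minimal sketch of the kind of throughput measurement behind these numbers, assuming a PyTorch model; the batch size, input resolution and iteration counts are placeholders, not the repository's actual benchmark settings:

```python
import time
import torch

def benchmark(model: torch.nn.Module, fp16: bool = True, iters: int = 100) -> float:
    """Measure frames per second for a segmentation model on the GPU."""
    device = torch.device("cuda")
    model = model.to(device).eval()
    x = torch.randn(1, 3, 320, 640, device=device)  # placeholder input size
    if fp16:
        model, x = model.half(), x.half()           # cast weights and input to FP16
    with torch.no_grad():
        for _ in range(10):                         # warm-up iterations
            model(x)
        torch.cuda.synchronize()                    # wait for pending GPU work
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return iters / (time.time() - start)            # frames per second
```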