This repository introduces the Apollo Vision Network, a new 3D detection and occupancy prediction network for autonomous driving.
- We feed multi-camera images to the backbone network to obtain features from the different camera views.
- A Transformer encoder then generates the BEV features.
- The encoder layer contains grid-shaped BEV queries, temporal self-attention, and spatial cross-attention.
- In spatial cross-attention, each BEV query only interacts with the image features in its regions of interest.
- In temporal self-attention, each BEV query interacts with two features: the BEV queries at the current timestamp and the BEV features at the previous timestamp.
- Taking the BEV features as input, the multi-task heads predict the perception results.
- The 3D detection head predicts 3D bounding boxes and class probabilities, as in BEVFormer.
- The occupancy head first upsamples the BEV features to the original resolution and then uses a linear layer to predict the occupancy probability of each voxel (see the sketch after this list).
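For readers who want the data flow in code, below is a minimal PyTorch sketch of one encoder layer and the occupancy head. It is not the repository's implementation: ordinary `nn.MultiheadAttention` stands in for the deformable attention that restricts each BEV query to its region of interest, and all dimensions (`dim`, `num_z`, `num_classes`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BEVEncoderLayer(nn.Module):
    """One encoder layer over grid-shaped BEV queries.

    Temporal self-attention lets each BEV query attend to the current
    queries and the previous frame's BEV features; spatial cross-attention
    lets it look up multi-camera image features. Ordinary multi-head
    attention stands in for the deformable, ROI-restricted attention
    used in BEVFormer-style models.
    """
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, bev_q, prev_bev, img_feats):
        # bev_q: (B, 50*50, C) BEV queries; prev_bev: (B, 50*50, C);
        # img_feats: (B, N_views * H * W, C) flattened camera features.
        kv = torch.cat([bev_q, prev_bev], dim=1)              # temporal keys/values
        bev_q = bev_q + self.temporal_attn(bev_q, kv, kv)[0]
        bev_q = bev_q + self.spatial_attn(bev_q, img_feats, img_feats)[0]
        return bev_q + self.ffn(bev_q)

class OccHead(nn.Module):
    """Upsample the low-resolution BEV grid (50x50) to 200x200, then apply
    a linear layer to predict per-voxel class logits. Channel and class
    counts are illustrative assumptions."""
    def __init__(self, dim=256, num_z=16, num_classes=17):
        super().__init__()
        self.num_z, self.num_classes = num_z, num_classes
        self.classifier = nn.Linear(dim, num_z * num_classes)

    def forward(self, bev):                                   # (B, C, 50, 50)
        bev = F.interpolate(bev, size=(200, 200),
                            mode="bilinear", align_corners=False)
        logits = self.classifier(bev.permute(0, 2, 3, 1))     # (B, 200, 200, Z*K)
        return logits.unflatten(-1, (self.num_z, self.num_classes))
```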
Our Apollo Vision Net introduces the following improvements, which significantly boost the performance of 3D detection and occupancy prediction.
- Image backbone: Replacing ResNet-50 with DLA-34 pre-trained on depth estimation data (Toyota DDAD15M) reduces model complexity while improving performance.
- Image neck: Replacing the single-scale FPN with a SecondFPN neck improves the performance of the model.
- Detection head: Using GroupDETR instead of DETR significantly improves object detection performance without increasing time consumption.
- Occ head: Using low-resolution BEV queries (50 × 50) in the Transformer encoder and then upsampling to high resolution (200 × 200) in the occ head significantly improves inference speed.
- Occ loss: Increasing the weight of the occupancy focal loss from 1.0 to 10.0 and introducing an affinity loss and a Lovász-softmax loss significantly improves the occupancy mIoU (a sketch of the combined loss follows this list).
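To illustrate how these terms combine, here is a hedged sketch of the occupancy loss. The focal-loss form and `gamma` value are assumptions, and `lovasz_fn` / `affinity_fn` are hypothetical stand-ins passed in as callables, since this README does not pin down their implementations or weights.

```python
import torch
import torch.nn.functional as F

def occ_focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss per voxel.

    logits: (N, K) class scores; target: (N,) class indices.
    gamma=2.0 is a common default, assumed here.
    """
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def occ_total_loss(logits, target, lovasz_fn, affinity_fn,
                   w_focal=10.0, w_lovasz=1.0, w_aff=1.0):
    """Weighted combination described above: the focal term's weight is
    raised from 1.0 to 10.0. lovasz_fn and affinity_fn stand in for the
    Lovász-softmax and affinity losses; their exact forms (and weights)
    are not specified in this README."""
    probs = F.softmax(logits, dim=1)
    return (w_focal * occ_focal_loss(logits, target)
            + w_lovasz * lovasz_fn(probs, target)
            + w_aff * affinity_fn(probs, target))
```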
Methods | 3D detection mAP | Occupancy mIoU
---|---|---
bevformer-tiny (ECCV 2022) | 25.2% | -
bevformer-tiny + occ head | - | 19.48%
Apollo-vision-Net (Ours) | 31.94% ($\uparrow$ 6.74%) | 21.87% ($\uparrow$ 2.39%)