---
layout: home
permalink: index.html

# Please update this with your repository name and title
repository-name: eYY-4yp-project-template
title: S2TPVFormer Landing Page
---

# S2TPVFormer: Improving 3D Semantic Occupancy Prediction using Spatiotemporal Transformers

> [Sathira Silva](https://sathiiii.github.io/)\*, [Savindu Wannigama](https://savinduwannigama.github.io/)\*, [Prof. Roshan Ragel](https://scholar.google.com/citations?user=UTYj8usAAAAJ&hl=en) $\dagger$, [Gihan Jayatilaka](https://scholar.google.com/citations?user=ZsJpIO8AAAAJ&hl=en) $\ddagger$
>
> \* Equal contribution &nbsp; $\dagger$ Project Supervisor &nbsp; $\ddagger$ Project Co-supervisor

## Introduction

Temporal reasoning is as important as spatial reasoning in a cognitive perception system. In human perception, temporal information is crucial for identifying occluded objects and determining the motion state of entities. A system proficient in spatiotemporal reasoning makes inferences with high temporal coherence. While prior work on 3D object detection emphasizes the significance of temporal fusion, earlier attempts at 3D Semantic Occupancy Prediction (3D SOP) largely overlooked temporal information, and the current state of the art in 3D SOP seldom exploits temporal cues. This is evident in [TPVFormer](https://github.com/wzzheng/tpvformer)’s SOP visualizations, where adjacent prediction frames lack temporal coherence because each relies solely on the current time step for its semantic predictions.

This work introduces S2TPVFormer, a variant of [TPVFormer](https://github.com/wzzheng/tpvformer) that uses a spatiotemporal transformer architecture, inspired by [BEVFormer](https://github.com/fundamentalvision/BEVFormer), for dense and temporally coherent 3D semantic occupancy prediction. Leveraging the TPV (Tri-Perspective View) representation, the model’s spatiotemporal encoder generates temporally rich embeddings that foster coherent predictions. The study proposes a novel **Temporal Cross-View Hybrid Attention (TCVHA)** mechanism, which enables the exchange of spatiotemporal information across the three views. To illustrate the efficacy of incorporating temporal information and the potential of the new attention mechanism, the research explores three distinct temporal fusion paradigms.
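To make the temporal fusion idea concrete, here is a minimal PyTorch sketch that fuses one TPV plane’s current embedding with its embedding from the previous time step via cross-attention. It is an illustrative, simplified stand-in for TCVHA: the class name, shapes, and hyperparameters (`TemporalPlaneFusion`, `dim`, `heads`) are assumptions for exposition, not the released implementation.

```python
# Illustrative sketch only: names, shapes, and hyperparameters are
# assumptions for exposition, not the released S2TPVFormer code.
import torch
import torch.nn as nn

class TemporalPlaneFusion(nn.Module):
    """Fuse one TPV plane's current embedding with its previous-step
    embedding via cross-attention (queries: current frame, keys/values:
    previous frame) -- a simplified temporal stand-in for TCVHA."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (B, H*W, C) flattened plane features; prev is assumed
        # to be ego-motion aligned to the current frame beforehand.
        fused, _ = self.attn(query=curr, key=prev, value=prev)
        return self.norm(curr + fused)  # residual keeps current-frame detail

# Toy usage: a 100x100 top (HW) plane with 256 channels.
fusion = TemporalPlaneFusion()
curr = torch.randn(1, 100 * 100, 256)
prev = torch.randn(1, 100 * 100, 256)
out = fusion(curr, prev)  # -> (1, 10000, 256)
```

In the full model the attention also exchanges information across the three TPV planes; this sketch isolates only the temporal part.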

### Overview of our Contributions

To summarize, this work makes the following contributions:
- We pioneer the use of the TPV representation for embedding spatiotemporal information in 3D scenes, within vision-centric SOP and the broader 3D perception literature (the sketch after this list recaps how the TPV planes compose voxel features).
- We introduce a novel temporal fusion workflow for the TPV representation, analyzing how cross-view hybrid attention (CVHA) facilitates the sharing of spatiotemporal information across the three planes.
- Evaluated on the [nuScenes](https://www.nuscenes.org/nuscenes/) validation set with [TPVFormer](https://github.com/wzzheng/tpvformer)’s sparse pseudo-voxel ground truth, the lower-parameter variant of our model achieves a significant **3.1%** mIoU improvement in 3D SOP over TPVFormer.
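For context, here is a minimal sketch of the TPV idea from [TPVFormer](https://github.com/wzzheng/tpvformer) that the contributions above build on: a voxel’s feature is composed by summing its projections onto the three orthogonal planes. The grid sizes and variable names below are illustrative assumptions, not the actual configuration.

```python
# Sketch of TPV-plane -> voxel feature composition (after TPVFormer);
# grid sizes and variable names are illustrative assumptions.
import torch

B, C = 1, 256             # batch size, feature channels
H, W, Z = 100, 100, 8     # occupancy grid resolution

tpv_hw = torch.randn(B, C, H, W)  # top plane
tpv_zh = torch.randn(B, C, Z, H)  # side plane
tpv_wz = torch.randn(B, C, W, Z)  # front plane

# The feature of voxel (h, w, z) is the sum of its three plane
# projections: tpv_hw[h, w] + tpv_zh[z, h] + tpv_wz[w, z].
voxel_feats = (
    tpv_hw[:, :, :, :, None]                        # (B, C, H, W, 1)
    + tpv_zh.permute(0, 1, 3, 2)[:, :, :, None, :]  # (B, C, H, 1, Z)
    + tpv_wz[:, :, None, :, :]                      # (B, C, 1, W, Z)
)                                                   # -> (B, C, H, W, Z)
print(voxel_feats.shape)  # torch.Size([1, 256, 100, 100, 8])
```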

### Team

- {{ supervisor.name }} - [{{ supervisor.email }}](mailto:{{ supervisor.email }})
{% endfor %}

---

### \[⭐Bookmarks\] Related Articles, Blogs
- [CVPR2023-3D-Occupancy-Prediction](https://github.com/CVPR2023-3D-Occupancy-Prediction/CVPR2023-3D-Occupancy-Prediction)
- [Master the overall construction process of MMDetection](https://zhuanlan.zhihu.com/p/341954021)
- [Awesome-Occupancy-Prediction-Multi-Cameras](https://github.com/chaytonmin/Awesome-Surrounding-Semantic-Occupancy-Prediction)

### Timeline

- **[Apr 27th, 2023]** Literature review started (reading the papers [Attention is All you Need](https://arxiv.org/abs/1706.03762), [NEAT](https://arxiv.org/abs/2109.04456) and [TCP](https://arxiv.org/abs/2206.08129)).
- [WANDB.AI](https://wandb.ai/fyp-3d-occ)
- [Experiment Results](https://docs.google.com/spreadsheets/d/1i-JYNyIdzsA-OY0sdwwf0jy-vq9h5CuZkAKE_B4S8iU/edit#gid=610804817)

### Important Links

- [Zoom Meetings](https://learn.zoom.us/j/61977437413?pwd=SEdRSHNsQUQ0OGcwakxVNkhlOGt4Zz09)