From 109d64d5b6877ebc5a4e956f793e6cebd4bb839c Mon Sep 17 00:00:00 2001
From: Sathira Silva
Date: Sat, 16 Dec 2023 09:35:01 +0530
Subject: [PATCH] Update README.md

---
 docs/README.md | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index cd33e21..faceba6 100755
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,14 +2,28 @@
 layout: home
 permalink: index.html
 
-# Please update this with your repository name and title
 repository-name: eYY-4yp-project-template
-title:
+title: S2TPVFormer Landing Page
 ---
 
-# 3D Occpuancy Prediction for End-to-End Autonomous Driving
+# S2TPVFormer: Improving 3D Semantic Occupancy Prediction using Spatiotemporal Transformers
 
-In the field of autonomous driving, accurate representation of the 3D space surrounding the vehicle is crucial for various tasks such as prediction, planning, and motion control. While lidar-based approaches have been used for precise 3D object detection, they have limitations in terms of cost and sensitivity to adverse weather conditions. As a result, achieving reliable and accurate 3D perception using only one or multiple RGB images has become the **holy grail** for autonomous driving. Additionally, temporal reasoning, along with spatial reasoning, plays a vital role in autonomous driving models. Therefore, our project aims to combine the strengths of various approaches to **develop a vision-based End-to-End autonomous driving model** that can compete with existing architectures. However, our **primary goal during the FYP is to _focus on 3D occupancy prediction that can enhance downstream driving and vision tasks_**.
+> [Sathira Silva](https://sathiiii.github.io/)\*, [Savindu Wannigama](https://savinduwannigama.github.io/)\*, [Prof. Roshan Ragel](https://scholar.google.com/citations?user=UTYj8usAAAAJ&hl=en) $\dagger$, [Gihan Jayatilaka](https://scholar.google.com/citations?user=ZsJpIO8AAAAJ&hl=en) $\ddagger$
+
+\* Equal contribution $\dagger$ Project Supervisor $\ddagger$ Project Co-supervisor
+
+## Introduction
+
+Temporal reasoning is as important as spatial reasoning in a cognitive perception system. In human perception, temporal information is crucial for identifying occluded objects and determining the motion state of entities. A system proficient in spatiotemporal reasoning excels at making inferences with high temporal coherence. While previous works emphasize the significance of temporal fusion in 3D object detection, earlier attempts at 3D Semantic Occupancy Prediction (3D SOP) often overlooked the value of incorporating temporal information, and the current state of the art in the 3D SOP literature seldom exploits temporal cues. This is evident in [TPVFormer](https://github.com/wzzheng/tpvformer)’s SOP visualizations, where adjacent prediction frames lack temporal coherence because they rely solely on the current time step for semantic predictions.
+
+This work introduces S2TPVFormer, a variant of [TPVFormer](https://github.com/wzzheng/tpvformer) that uses a spatiotemporal transformer architecture, inspired by [BEVFormer](https://github.com/fundamentalvision/BEVFormer), for dense and temporally coherent 3D semantic occupancy prediction. Leveraging the TPV (Tri-Perspective View) representation, the model’s spatiotemporal encoder generates temporally rich embeddings, fostering coherent predictions. The study proposes a novel **Temporal Cross-View Hybrid Attention** mechanism that enables the exchange of spatiotemporal information across the different views. To illustrate the efficacy of incorporating temporal information and the potential of the new attention mechanism, the research explores three distinct temporal fusion paradigms.
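+
+As a rough illustration of this mechanism (the actual S2TPVFormer encoder builds on deformable attention, as in BEVFormer), the hypothetical PyTorch sketch below lets each TPV plane’s queries attend jointly to all three current planes (cross-view) and to the previous time step’s TPV features (temporal). The module, argument, and shape choices are illustrative assumptions, not the repository’s API.
+
+```python
+import torch
+import torch.nn as nn
+
+class TemporalCrossViewHybridAttention(nn.Module):
+    """Toy stand-in: queries from each TPV plane attend jointly to all
+    three current planes (cross-view) and to the previous step's TPV
+    features (temporal). Vanilla multi-head attention replaces the
+    deformable attention used by the real model, purely for brevity."""
+
+    def __init__(self, dim: int = 64, heads: int = 4):
+        super().__init__()
+        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
+        self.norm = nn.LayerNorm(dim)
+
+    def forward(self, tpv_now, tpv_prev):
+        # tpv_now / tpv_prev: lists of the three flattened plane features
+        # (HW, DH, WD), each of shape (B, N_plane, C).
+        context = torch.cat(tpv_now + tpv_prev, dim=1)   # hybrid key/value set
+        out_planes = []
+        for q in tpv_now:
+            out, _ = self.attn(q, context, context)      # spatiotemporal lookup
+            out_planes.append(self.norm(q + out))        # residual + norm
+        return out_planes
+
+# Tiny smoke test with a 20x20x4 TPV grid, one plane per pair of axes.
+B, C = 2, 64
+sizes = (20 * 20, 4 * 20, 20 * 4)                        # HW, DH, WD plane sizes
+tpv_now = [torch.randn(B, n, C) for n in sizes]
+tpv_prev = [torch.randn(B, n, C) for n in sizes]
+layer = TemporalCrossViewHybridAttention(dim=C, heads=4)
+print([o.shape for o in layer(tpv_now, tpv_prev)])
+```
+
+The design point the sketch tries to capture is that the hybrid key/value set mixes current cross-view features with historical ones, so temporal information can reach every plane in a single attention pass.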
+
+### Overview of our Contributions
+
+To summarize, this work makes the following contributions:
+- We pioneer the use of the TPV representation for embedding spatiotemporal information about 3D scenes, both within vision-centric SOP and in the broader 3D perception literature.
+- We introduce a novel temporal fusion workflow for the TPV representation, analyzing how Cross-View Hybrid Attention (CVHA) facilitates the sharing of spatiotemporal information across the three planes.
+- Our lower-parameter model achieves a significant **3.1%** mIoU improvement over TPVFormer for 3D SOP when evaluated on the [nuScenes](https://www.nuscenes.org/nuscenes/) validation set using [TPVFormer](https://github.com/wzzheng/tpvformer)’s sparse pseudo-voxel ground truth.
 
 ### Team
 
@@ -23,6 +37,7 @@ In the field of autonomous driving, accurate representation of the 3D space surr
 - {{ supervisor.name }} - [{{ supervisor.email }}](mailto:{{ supervisor.email }})
 {% endfor %}
+---
 
 ### \[⭐Bookmarks\] Related Articles, Blogs
 
 - [CVPR2023-3D-Occupancy-Prediction](https://github.com/CVPR2023-3D-Occupancy-Prediction/CVPR2023-3D-Occupancy-Prediction)
@@ -40,7 +55,6 @@ In the field of autonomous driving, accurate representation of the 3D space surr
 - [Master the overall construction process of MMDetection](https://zhuanlan.zhihu.com/p/341954021)
 - [Awesome-Occupancy-Prediction-Multi-Cameras](https://github.com/chaytonmin/Awesome-Surrounding-Semantic-Occupancy-Prediction)
-
 
 ### Timeline
 
 - **[Apr 27th, 2023]** Literature review started (reading the papers [Attention is All you Need](https://arxiv.org/abs/1706.03762), [NEAT](https://arxiv.org/abs/2109.04456) and [TCP](https://arxiv.org/abs/2206.08129)).
@@ -66,11 +80,6 @@ In the field of autonomous driving, accurate representation of the 3D space surr
 - [WANDB.AI](https://wandb.ai/fyp-3d-occ)
 - [Experiment Results](https://docs.google.com/spreadsheets/d/1i-JYNyIdzsA-OY0sdwwf0jy-vq9h5CuZkAKE_B4S8iU/edit#gid=610804817)
-
-
 ### Important Links
 
 - [Zoom Meetings](https://learn.zoom.us/j/61977437413?pwd=SEdRSHNsQUQ0OGcwakxVNkhlOGt4Zz09)