This repository contains the official implementation of the framework proposed in the following paper, accepted at ICRA 2023:
From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds
Authors: Huan-ang Gao, Beiwen Tian, Pengfei Li, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yurong Chen and Hongbin Zha
Institute for AI Industry Research (AIR), Tsinghua University
Room layout estimation is a long-standing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. As such, we address the semi-supervised setting of this task based on the idea of model exponential moving averaging (EMA). But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based on metrics tailored for layout quads. Besides, we propose a new online pseudo-label harvesting algorithm that decomposes the distribution of a hybrid distance measure between quads and PC into two components. This technique does not need manual threshold selection and intuitively encourages quads to align with reliable layout points. Surprisingly, this framework also works for the fully-supervised setting, achieving a new SOTA on the ScanNet benchmark. Last but not least, we also push the semi-supervised setting to the realistic omni-supervised setting, demonstrating significantly improved performance on a newly annotated ARKitScenes test set. Our code, data and models are released in this repository.
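Two ingredients of the abstract can be illustrated in isolation. The sketch below is not the repository's code. It shows (1) a Mean Teacher style EMA update, written over a plain dict of scalar parameters (ema_update is a hypothetical helper and alpha=0.99 an assumed decay), and (2) a threshold-free split of per-quad distance values. The paper's hybrid quad-to-PC distance and its exact decomposition are not reproduced here; we simply assume a two-component 1-D Gaussian mixture fitted by EM, and keep the quads whose posterior favors the low-distance component.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.99) -> Dict[str, float]:
    """Mean Teacher style update: teacher <- alpha * teacher + (1 - alpha) * student.

    Hypothetical helper over scalar parameters; real code would update tensors in place.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


def _gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(xs: Sequence[float],
                      iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 2-component 1-D Gaussian mixture with EM (needs >= 2 values).

    Returns (weights, means, variances).
    """
    s = sorted(xs)
    half = len(s) // 2
    # Initialise the two means from the lower and upper halves of the data
    means = [sum(s[:half]) / half, sum(s[half:]) / (len(s) - half)]
    overall = sum(s) / len(s)
    var0 = sum((x - overall) ** 2 for x in s) / len(s) + 1e-6
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value
        resp = []
        for x in xs:
            p = [weights[k] * _gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against both densities underflowing
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
    return weights, means, variances


def select_reliable(distances: Sequence[float]) -> List[int]:
    """Indices whose posterior favors the lower-mean (closer-to-points) component."""
    weights, means, variances = fit_two_gaussians(distances)
    low = 0 if means[0] <= means[1] else 1
    keep = []
    for i, d in enumerate(distances):
        p = [weights[k] * _gaussian_pdf(d, means[k], variances[k]) for k in range(2)]
        if p[low] >= p[1 - low]:
            keep.append(i)
    return keep
```

For example, with distances [0.1, 0.12, 0.09, 0.11, 0.1, 0.9, 1.0, 0.95, 1.1, 0.85], select_reliable keeps the first five indices without any hand-picked cut-off; the mixture itself decides where the boundary lies.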
Our code requires python=3.6 and CUDA>=10.1. We recommend using conda to create a new environment and install the required packages as follows:
conda create -n omni-pq python=3.6
conda activate omni-pq
Then install PyTorch and the CUDA toolkit with:
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 \
-c pytorch -c conda-forge
Next, install the remaining required packages:
pip install -r requirements.txt
After installing these prerequisites, build PointNet++ locally:
cd pointnet2
python3 setup.py install
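A quick check of the stated requirements (Python 3.6, CUDA>=10.1) can save a failed build. The sketch below is not part of the repository: check_environment and parse_version are hypothetical helpers, and the CUDA version is read from torch.version.cuda, which reports the CUDA version PyTorch was built with and may differ from the system toolkit used to compile PointNet++.

```python
import sys
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """'3.6.13' -> (3, 6); only major and minor versions are compared."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_environment(python_version: str, cuda_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems (empty if requirements are met)."""
    problems = []
    if parse_version(python_version) != (3, 6):
        problems.append("expected Python 3.6, found " + python_version)
    if cuda_version is None:
        problems.append("PyTorch reports no CUDA build")
    elif parse_version(cuda_version) < (10, 1):
        problems.append("expected CUDA >= 10.1, found " + cuda_version)
    return problems


if __name__ == "__main__":
    py_version = "{}.{}.{}".format(*sys.version_info[:3])
    try:
        import torch
        cuda_version = torch.version.cuda  # None for CPU-only PyTorch builds
    except ImportError:
        cuda_version = None
    for problem in check_environment(py_version, cuda_version):
        print("WARNING:", problem)
```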
For the ScanNet dataset, please follow these instructions:
- Download the ScanNet v2 data from here. Move or link the scans folder so that under scans there are folders with names such as scene0001_01.
- Extract point clouds and annotations (semantic segmentation, instance segmentation, etc.) by running python3 batch_load_scannet_data.py, which will create a folder named scannet_train_detection_data here.
- Download the plane annotations for the ScanNet v2 dataset from here and extract the scannet_planes folder to the same directory as in the previous step.
- In the scannet directory, run python3 compute_normal_for_pc.py to pre-compute the normal of each point in the point cloud.
After this, you will have a scannet folder laid out as described in docs/scannet_directory.txt.
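A small script can confirm that the ScanNet folder looks the way the steps above describe. This sketch assumes, from those steps, that scans, scannet_train_detection_data and scannet_planes all sit directly inside the scannet directory; docs/scannet_directory.txt remains the authoritative reference, and check_scannet_layout is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import List

# Assumed top-level entries, taken from the preparation steps above
EXPECTED_DIRS = ["scans", "scannet_train_detection_data", "scannet_planes"]
# Scene folders are named like scene0001_01
SCENE_PATTERN = re.compile(r"^scene\d{4}_\d{2}$")


def check_scannet_layout(root) -> List[str]:
    """Return a list of problems found under the scannet directory `root`."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append("missing directory: " + name)
    scans = root / "scans"
    if scans.is_dir() and not any(
        p.is_dir() and SCENE_PATTERN.match(p.name) for p in scans.iterdir()
    ):
        problems.append("no sceneXXXX_XX folders under scans")
    return problems
```

Running check_scannet_layout("scannet") from the repository root should return an empty list once all steps have completed.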
For the ARKitScenes dataset, please follow these instructions:
- Download the ARKitScenes dataset from here. We only need the 3dod dataset from the aforementioned repository. Then extract the ARKitScenes folder to the same directory as in the previous step.
- Follow the instructions here to prepare the whole-scene data offline.
- Step into ARKitScenes/dataset and run python3 compute_normal_for_pc.py to pre-compute the normal of each point in the point cloud.
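Both datasets use compute_normal_for_pc.py to pre-compute per-point normals. We have not reproduced what that script does internally; as an illustration of the general technique, the sketch below estimates each normal by principal component analysis of the point's k nearest neighbours (the normal is the eigenvector with the smallest eigenvalue) and flips it to face an assumed viewpoint. estimate_normals, k=16 and the default viewpoint are hypothetical choices.

```python
import numpy as np


def estimate_normals(points, k=16, viewpoint=(0.0, 0.0, 0.0)):
    """Return an (N, 3) array of unit normals for an (N, 3) point array.

    PCA over each point's k nearest neighbours. The brute-force neighbour
    search costs O(N^2) time and memory, so this is for small inputs only.
    """
    pts = np.asarray(points, dtype=np.float64)
    n = pts.shape[0]
    k = min(k, n)
    # Pairwise squared distances, shape (N, N)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    nbr_idx = np.argsort(d2, axis=1)[:, :k]  # includes the point itself
    normals = np.empty_like(pts)
    for i in range(n):
        nbrs = pts[nbr_idx[i]]
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered / k
        _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]        # direction of least variance
    # PCA normals have an arbitrary sign: flip them to face the viewpoint
    to_view = np.asarray(viewpoint, dtype=np.float64) - pts
    flip = (normals * to_view).sum(axis=1) < 0
    normals[flip] *= -1.0
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)
```

For full scans, the neighbour search would normally use a spatial index such as scipy.spatial.cKDTree instead of the dense distance matrix.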
We provide a script, train.sh, for a quick start. Run the following command to train the model:
bash train.sh --checkpoint_path=pretrained_model/T10-base.pth --rate 0.10
Here, we first train the original PQ-Transformer model with 10% of the labeled data and save the checkpoint as T10-base.pth. Then, for our semi-supervised training, we pass the path of this checkpoint with --checkpoint_path so that training resumes from it, and use --rate to control the fraction of labeled data (0.10 means 10%).
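How the repository actually selects the labeled scenes for a given --rate is not described here. Purely as an illustration of the idea, the hypothetical split_labeled helper below sorts the scene IDs, shuffles them with a fixed seed, and keeps round(rate * N) of them as labeled; the rest are used as unlabeled data.

```python
import random
from typing import List, Sequence, Tuple


def split_labeled(scene_ids: Sequence[str], rate: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split scene IDs into (labeled, unlabeled) lists. Hypothetical helper."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    ids = sorted(scene_ids)  # sort first so the split does not depend on input order
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    n_labeled = max(1, round(rate * len(ids)))
    return sorted(ids[:n_labeled]), sorted(ids[n_labeled:])
```

With 100 scenes, split_labeled(ids, 0.10) yields 10 labeled and 90 unlabeled scenes, and the same seed always reproduces the same split.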
We provide T10-base.pth and T100-base.pth for a quick start. You can download them from the Model Zoo links given below.
To train models on the ARKitScenes dataset, simply add the --arkit flag to the command line.
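The contents of train.sh are not shown here. Assuming (hypothetically) that it forwards its options to a Python entry point parsed with argparse, the three options used in this README could be declared as below. The help texts and defaults are our own guesses. Under this assumption, the --checkpoint_path=VALUE and --checkpoint_path VALUE spellings used in the train and eval commands would be interchangeable, since argparse accepts both.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of the options mentioned in this README."""
    parser = argparse.ArgumentParser(description="Sketch of Omni-PQ training options")
    parser.add_argument("--checkpoint_path", type=str, default=None,
                        help="checkpoint to resume from, e.g. pretrained_model/T10-base.pth")
    parser.add_argument("--rate", type=float, default=1.0,
                        help="fraction of labeled training data (0.10 = 10%%)")
    parser.add_argument("--arkit", action="store_true",
                        help="use ARKitScenes instead of ScanNet")
    return parser


args = build_parser().parse_args(
    ["--checkpoint_path=pretrained_model/T10-base.pth", "--rate", "0.10", "--arkit"]
)
```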
For evaluation, we also provide a script, eval.sh. Run the following command to evaluate a model:
bash eval.sh --checkpoint_path pretrained_model/T10.pth
Here you only need to specify the checkpoint path of the model you want to evaluate.
To evaluate models on the ARKitScenes dataset, likewise add the --arkit flag to the command line.
We provide checkpoints for the bold-styled models in the following tables.
ScanNet (columns give the percentage of labeled training data):
Method | 5% | 10% | 20% | 30% | 40% | 100% |
---|---|---|---|---|---|---|
PQ-Transformer | 22.43 | 29.26 | 39.60 | 46.02 | 48.08 | 56.64 |
Ours | 29.08 | 36.85 | 48.68 | 54.35 | 56.92 | 60.75 |
ARKitScenes (newly annotated test set):
Method | Recall (%) | Precision (%) | F1-score (%) |
---|---|---|---|
PQ-Transformer | 6.72 | 25.81 | 10.66 |
Ours | 23.00 | 29.50 | 25.85 |
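The F1-score column of the ARKitScenes table is the harmonic mean of the precision and recall columns, F1 = 2PR / (P + R). The short check below recomputes it from the reported numbers (f1_score is a hypothetical helper name).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (recall %, precision %) copied from the ARKitScenes table above
reported = {"PQ-Transformer": (6.72, 25.81), "Ours": (23.00, 29.50)}
for method, (recall, precision) in reported.items():
    print(f"{method}: F1 = {f1_score(precision, recall):.2f}%")
```

This prints 10.66% for PQ-Transformer and 25.85% for Ours, matching the table.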
Note that in our paper we report the median performance of each experimental setting over three runs, whereas the checkpoints provided here were all trained with the same random seed, 0.
You can download these models from [ Google Drive | Tsinghua Cloud Storage ] and place them in the pretrained_model directory (create it if it does not exist).
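Setting up the checkpoint directory can also be scripted. The hypothetical prepare_checkpoint_dir helper below creates pretrained_model if needed and reports which expected files are still missing. By default it looks only for T10-base.pth and T100-base.pth, the two quick-start checkpoints named earlier, because the full list of file names in the download folders is not given here.

```python
from pathlib import Path
from typing import Iterable, List


def prepare_checkpoint_dir(root: str = ".",
                           names: Iterable[str] = ("T10-base.pth", "T100-base.pth")) -> List[str]:
    """Create <root>/pretrained_model if needed; return expected files not yet present."""
    ckpt_dir = Path(root) / "pretrained_model"
    ckpt_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return [name for name in names if not (ckpt_dir / name).is_file()]
```

Calling prepare_checkpoint_dir() from the repository root returns an empty list once both quick-start checkpoints are in place.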
If you find this work useful for your research, please cite our paper:
@article{gao2023semi,
title={From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds},
author={Gao, Huan-ang and Tian, Beiwen and Li, Pengfei and Chen, Xiaoxue and Zhao, Hao and Zhou, Guyue and Chen, Yurong and Zha, Hongbin},
journal={arXiv preprint arXiv:2301.13865},
year={2023}
}
Our codebase is built on PQ-Transformer, a 3D point cloud transformer for joint object detection and layout estimation. We also give credit to Mean Teacher and SESS.