[ArXiv] [OpenReview] [Poster] [Slides] [Code]
This is the official implementation of our work presented at CoRL 2024, titled "Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors."
We propose an object-level mapping framework that fuses multi-view observations with a pre-trained diffusion shape prior model. It generalizes to objects from multiple categories and to multiple observation modalities without any fine-tuning.
The gradient fields from two sources, a diffusion prior originally trained for generation and multi-view non-linear observation constraints, are effectively fused into a single shape and pose optimization formulation for mapping.
Our method does not require training or fine-tuning any neural networks, so it demands relatively little GPU memory.
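For intuition, below is a purely conceptual sketch, not the actual implementation, of how one fused update over a shape latent z and an object pose T could look; render_and_compare, diffusion_score, the weights, and the plain gradient step are illustrative placeholders rather than this repository's API:

import torch

def fused_step(z, T, observations, prior, w_obs=1.0, w_prior=0.1, lr=1e-2):
    # One illustrative optimization step combining the two gradient sources.
    z = z.detach().requires_grad_(True)
    T = T.detach().requires_grad_(True)

    # Observation term: differentiable rendering compared against the
    # multi-view RGB-D observations (render_and_compare is a placeholder).
    loss_obs = render_and_compare(z, T, observations)
    grad_z_obs, grad_T = torch.autograd.grad(loss_obs, (z, T))

    # Prior term: a denoising/score direction from the diffusion model that
    # pulls the shape latent toward the learned manifold (placeholder).
    grad_z_prior = -diffusion_score(prior, z)

    with torch.no_grad():
        z = z - lr * (w_obs * grad_z_obs + w_prior * grad_z_prior)
        T = T - lr * w_obs * grad_T
    return z, T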
Our code runs on an Ubuntu system with an Nvidia GPU and CUDA support. A single GPU with at least 10GB of memory is enough. We have tested the implementation on the following GPUs: 1080 Ti, V100, and A100.
Create a conda environment:
conda create -n gom python==3.10
conda activate gom
Install PyTorch. Below is an example for PyTorch 1.12 and CUDA 11.3. Please refer to this link for instructions based on your hardware.
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
Clone the repository:
git clone [email protected]:TRAILab/GeneralObjectMapping.git
Install the remaining Python packages. (Note: Install pytorch3d only for CO3D.)
pip install -r requirements.txt
We use Shap-E as the pre-trained multi-category shape prior model. For convenience, we have included a Shap-E repository in ./src/shap-e, which contains utility code to assist with our project. Please ensure that you use it in accordance with Shap-E's license.
Install shap_e in your conda environment:
pip install src/shap-e
Shap-E will automatically download the pre-trained weights during its initial run.
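If you want to trigger the download and check the installation manually, a minimal snippet along the lines of the official Shap-E examples is:

import torch
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# The first call downloads the checkpoints into Shap-E's cache directory.
xm = load_model('transmitter', device=device)    # latent -> 3D decoder
model = load_model('text300M', device=device)    # text-conditional prior
diffusion = diffusion_from_config(load_config('diffusion'))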
Generate prior meshes for the categories we will use. These meshes will be used to initialize pose estimation, via ICP alignment with the depth data. The categories are defined in configs/categories.txt, and the generated meshes will be saved to ./shap_e_prior_meshes.
python generate_prior_meshes.py
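generate_prior_meshes.py is the source of truth; as a rough, self-contained sketch, per-category mesh generation with the Shap-E text-to-3D API could look like the following (the category prompts, guidance scale, sampling settings, and output file names are assumptions, not necessarily what the script uses):

import os
import torch
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.diffusion.sample import sample_latents
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

os.makedirs('shap_e_prior_meshes', exist_ok=True)
with open('configs/categories.txt') as f:
    categories = [line.strip() for line in f if line.strip()]

for category in categories:
    # Sample one text-conditioned latent per category; the sampler settings
    # follow the Shap-E text-to-3D example.
    latents = sample_latents(
        batch_size=1, model=model, diffusion=diffusion, guidance_scale=15.0,
        model_kwargs=dict(texts=[category]), progress=True, clip_denoised=True,
        use_fp16=True, use_karras=True, karras_steps=64,
        sigma_min=1e-3, sigma_max=160, s_churn=0,
    )
    mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
    with open(f'shap_e_prior_meshes/{category}.obj', 'w') as fout:
        mesh.write_obj(fout)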
We provide examples using the CO3D and ScanNet datasets.
The CO3D dataset includes sequences from 10 categories: Toy Truck, Bench, Donut, Broccoli, Toy Train, Apple, Teddy Bear, Hydrant, Book, and Toaster.
Download the single-sequence subset of the CO3D dataset for the demos by following these instructions:
git clone [email protected]:facebookresearch/co3d.git
cd co3d
python ./co3d/download_dataset.py --download_folder DOWNLOAD_FOLDER --single_sequence_subset
Please also install pytorch3d for the CO3D data loader.
pip install git+https://github.com/facebookresearch/[email protected]
To run demos with a specific category, specify --dataset_category as teddybear, donut, hydrant, broccoli, toaster, toytruck, bench, toytrain, apple, or book:
python test_co3d.py --config configs/GOM_CO3D.json \
--sequence_dir data/co3d \
--save_root ./output/co3d/ \
--dataset_category teddybear
This script first estimates an initial pose by running Iterative Closest Point (ICP) matching against an average shape generated from Shap-E. It then optimizes both shape and pose using the diffusion prior and differentiable rendering against images from 10 views.
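For reference, this kind of ICP initialization can be illustrated with Open3D; the sketch below assumes a point-to-point ICP between a point cloud sampled from the prior mesh and the back-projected depth points, and the repository's own initialization code may differ:

import numpy as np
import open3d as o3d

def init_pose_with_icp(prior_mesh_path, depth_points, threshold=0.05):
    # Align a point cloud sampled from the category prior mesh to the
    # observed depth points; returns a 4x4 initial object pose.
    mesh = o3d.io.read_triangle_mesh(prior_mesh_path)
    source = mesh.sample_points_uniformly(number_of_points=5000)

    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(depth_points)  # (N, 3) array

    result = o3d.pipelines.registration.registration_icp(
        source, target, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation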
We provide more detailed guidance on the outputs and hyperparameters of the experiments in a document.
The ScanNet dataset contains indoor environments featuring large furniture items.
We provide a mini ScanNet set for a quick demo. Download the ScanNet mini dataset following here and unzip it to data/scannet_mini.
The script will run the pose and shape estimation algorithm on a chair instance from 10 views.
python test_scannet.py --config configs/GOM_ScanNet.json \
--sequence_dir data/scannet_mini/ \
--save_root ./output/scannet/
Please download the full ScanNet dataset, along with the Scan2CAD object annotations and ShapeNet models, from here.
Download the association file here, and put it under data/scannet/data/Scan2CAD/indices_to_scannet. It is used to extract the GT masks provided by ScanNet using the instance IDs from Scan2CAD.
The final data structure is:
- data/
  - scannet/
    - data/
      - posed_images/
      - Scan2CAD/
        - indices_to_scannet/
      - scans/
      - ShapeNetCore.v2/
      - scannetv2_val.txt
- src/
The ScanNet dataset includes thousands of scenes and objects. We create splits for frames, instances, and scenes based on object visibility, ensuring that each split includes at least 10 frames and an observation ratio of 0.2. Run the provided script to generate the split file:
python src/tools/dataset_process/generate_scannet_splits.py \
--category chair \
--dataset_root ./data/scannet/ \
--save_dir ./output/splits/
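The script above is authoritative; conceptually, the selection criterion amounts to something like the sketch below, where the instance and frame field names are hypothetical:

MIN_FRAMES = 10
MIN_OBS_RATIO = 0.2

def select_instances(instances):
    # Keep instances that have at least MIN_FRAMES frames whose object
    # observation ratio reaches MIN_OBS_RATIO (field names are hypothetical).
    selected = []
    for inst in instances:
        frames = [f for f in inst['frames'] if f['observation_ratio'] >= MIN_OBS_RATIO]
        if len(frames) >= MIN_FRAMES:
            selected.append({'scene': inst['scene'],
                             'instance_id': inst['id'],
                             'frames': [f['frame_id'] for f in frames]})
    return selected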
We offer a merge tool to combine the splits for all categories into a single file (a rough sketch of this merge step follows the command below). Once that's done, you can run experiments on the full dataset using:
python test_scannet.py --config configs/GOM_ScanNet.json \
--dataset_subset_package PATH/TO/SPLITS \
--sequence_dir data/scannet/ \
--save_root ./output/scannet/
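The merge tool in the repository defines the actual split file format; purely as an illustration, and assuming JSON split files, combining the per-category files could look like:

import glob
import json

merged = []
for path in sorted(glob.glob('./output/splits/*.json')):  # per-category split files
    with open(path) as f:
        merged.extend(json.load(f))

with open('./output/splits/all_categories.json', 'w') as f:
    json.dump(merged, f, indent=2)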
For more detailed information on the experiment outputs and hyperparameters, please refer to the document.
Feel free to contact the authors or raise any issues if you have further questions.
Please cite our paper if you use this codebase:
@inproceedings{liaotoward,
  title={Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors},
  author={Liao, Ziwei and Xu, Binbin and Waslander, Steven L},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024}
}
If you're interested in learning more, feel free to explore our previous related work on object-level mapping:
- Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors, ICRA'24
- Multi-View 3D Object Reconstruction and Uncertainty Modelling With Neural Shape Prior, WACV'24
- SO-SLAM: Semantic Object SLAM with Scale Proportional and Symmetrical Texture Constraints, RA-L and ICRA'22.
Our repository is released under the MIT License. Shap-E is also released under the MIT License. The ICP initialization code is referenced from our previous codebase, UncertainShapePose.