
[NeurIPS 2024] OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images.

Imperial College London

Ye Mao, Junpeng Jing, Krystian Mikolajczyk

[Paper] [Project Website]

[News] [23/06/2024] OpenDlign pre-trained models and datasets have been released. 🔥🔥🔥

[News] [25/04/2024] The OpenDlign paper has been released on arXiv. 🔥🔥🔥

Official implementation of OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images

Figure: Top: Comparison of OpenDlign with traditional open-world 3D learning models. Depth-based (a) and point-based (b) methods employ additional depth or point encoders for pre-training to align with CAD-rendered images. Conversely, OpenDlign (c) fine-tunes only the image encoder, aligning with vividly colored and textured depth-aligned images for enhanced 3D representation. Bottom: Visual comparison between multi-view CAD-rendered and corresponding depth-aligned images in OpenDlign.

Figure: Overview of OpenDlign. OpenDlign converts point clouds into multi-view depth maps using a contour-aware projection, which then helps generate depth-aligned RGB images with diverse textures, geometrically and semantically aligned with the maps. A transformer block, residually connected to the CLIP image encoder, is fine-tuned to align depth maps with depth-aligned images for robust 3D representation.
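The residually connected transformer block can be pictured with a minimal PyTorch sketch. This is an illustration of the idea, not the official module: the class name, embedding size, and the single-token treatment of the pooled CLIP feature are all assumptions.

import torch
import torch.nn as nn

class ResidualAlignBlock(nn.Module):
    # Hypothetical module: one trainable transformer block whose output
    # is added back onto the frozen CLIP image embedding.
    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, embed_dim) pooled CLIP image features.
        # For simplicity we treat each pooled feature as a length-1 sequence;
        # the real model may operate over patch tokens instead.
        x = clip_feats.unsqueeze(1)            # (batch, 1, embed_dim)
        refined = self.block(x).squeeze(1)     # (batch, embed_dim)
        # Residual connection: only this block is trained, so the frozen
        # CLIP representation is preserved as a baseline.
        return clip_feats + refined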

Project Summary

OpenDlign is a multimodal framework for learning open-world 3D representations. It leverages depth-aligned images generated from point cloud-projected depth maps. Unlike CAD-rendered images, our generated images provide rich, realistic color and texture diversity while preserving geometric and semantic consistency with the depth maps. Our experiments demonstrate OpenDlign's superior performance in zero-shot and few-shot classification, 3D object detection, and cross-modal retrieval, especially with real-scanned 3D objects.
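For intuition, here is a simplified sketch of projecting a point cloud into multi-view depth maps with a plain z-buffer. The paper's contour-aware projection adds refinements not reproduced here; the view count, resolution, and normalization assumptions are illustrative only.

import numpy as np

def depth_maps(points: np.ndarray, num_views: int = 4, res: int = 224):
    # points: (N, 3) array, assumed normalized to roughly [-1, 1]^3.
    maps = []
    for k in range(num_views):
        theta = 2 * np.pi * k / num_views
        # Rotate the cloud about the vertical (y) axis for each view.
        rot = np.array([[np.cos(theta), 0, np.sin(theta)],
                        [0, 1, 0],
                        [-np.sin(theta), 0, np.cos(theta)]])
        p = points @ rot.T
        # Map x, y in [-1, 1] to pixel coordinates.
        u = np.clip(((p[:, 0] + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
        v = np.clip(((p[:, 1] + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
        depth = np.full((res, res), np.inf)
        # Keep the nearest point per pixel (z-buffer).
        for ui, vi, zi in zip(u, v, p[:, 2]):
            if zi < depth[vi, ui]:
                depth[vi, ui] = zi
        depth[np.isinf(depth)] = 0.0
        maps.append(depth)
    return maps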

Installation

We pre-train OpenDlign on a single NVIDIA A100 GPU. The code is tested with CUDA 11.3 and PyTorch 1.11.0.

conda create -n OpenDlign python=3.8
conda activate OpenDlign
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
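After installing, a quick sanity check confirms the expected PyTorch and CUDA versions are visible before training:

import torch

print(torch.__version__)          # expect 1.11.0
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # expect True on a GPU machine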

Datasets

The processed evaluation data (i.e., ModelNet40, ScanObjectNN, and OmniObject3D) can be found here.

Pretrained Models

The pre-trained OpenDlign models, integrated with various CLIP variants (e.g., ViT-H-14, ViT-L-14, ViT-B-16, ViT-B-32), are available here.
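A hedged sketch of combining a CLIP backbone with a downloaded checkpoint is shown below. The open_clip calls are standard; the checkpoint filename and state-dict layout are assumptions, not this repo's actual format.

import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="openai"
)
# Hypothetical filename; use the path of the checkpoint you downloaded.
ckpt = torch.load("opendlign_vit_b_16.pt", map_location="cpu")
# The fine-tuned weights may live under a nested key; adjust as needed.
state_dict = ckpt.get("state_dict", ckpt)
model.load_state_dict(state_dict, strict=False)
model.eval()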

Inference

Update the root path of your downloaded evaluation dataset before running the following command:

bash scripts/zero_shot.sh
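Conceptually, zero-shot evaluation follows the usual CLIP recipe over multi-view depth inputs: encode every view, compare against text prompt embeddings, and aggregate per-view logits. The sketch below uses a plain open_clip backbone and simple averaging over views; the prompt template and aggregation rule are assumptions, not necessarily what zero_shot.sh does.

import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")

class_names = ["airplane", "chair", "lamp"]           # example labels
text = tokenizer([f"a depth map of a {c}" for c in class_names])

views = torch.randn(4, 3, 224, 224)                   # stand-in for 4 preprocessed views

with torch.no_grad():
    img_feats = model.encode_image(views)
    txt_feats = model.encode_text(text)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    logits = img_feats @ txt_feats.T                  # (views, classes)
    pred = logits.mean(dim=0).argmax().item()         # average over views

print(class_names[pred])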

Training

Update the root path of your downloaded training dataset before running the following command:

bash scripts/model_training.sh
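The alignment objective described in the overview can be sketched as a symmetric, CLIP-style contrastive loss that pulls each depth-map embedding toward the embedding of its paired depth-aligned image. This is a simplification of the actual training script; the temperature and loss symmetrization are assumptions.

import torch
import torch.nn.functional as F

def alignment_loss(depth_emb: torch.Tensor, image_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    # depth_emb, image_emb: (B, D) embeddings of matched pairs.
    depth_emb = F.normalize(depth_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = depth_emb @ image_emb.T / temperature    # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matched pairs on the diagonal
    # Symmetric InfoNCE over both matching directions, as in CLIP.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2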

Citation

If you find our code helpful, please cite our paper:

@article{mao2024opendlign,
  title={OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images},
  author={Mao, Ye and Jing, Junpeng and Mikolajczyk, Krystian},
  journal={arXiv preprint arXiv:2404.16538},
  year={2024}
}
