Ye Mao, Junpeng Jing, Krystian Mikolajczyk
[Paper] [Project Website]
[News] [23/06/2024] OpenDlign pre-trained models and datasets have been released. 🔥🔥🔥
[News] [25/04/2024] The OpenDlign paper is released on arXiv. 🔥🔥🔥
Official implementation of OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images
Top: Comparison of OpenDlign with traditional open-world 3D learning models. Depth-based (a) and point-based (b) methods employ additional depth or point encoders for pre-training to align with CAD-rendered images. Conversely, OpenDlign (c) fine-tunes only the image encoder, aligning with vividly colored and textured depth-aligned images for enhanced 3D representation. Bottom: Visual comparison between multi-view CAD-rendered and corresponding depth-aligned images in OpenDlign.
Overview of OpenDlign. OpenDlign converts point clouds into multi-view depth maps using a contour-aware projection, which then helps generate depth-aligned RGB images with diverse textures, geometrically and semantically aligned with the maps. A transformer block, residually connected to the CLIP image encoder, is fine-tuned to align depth maps with depth-aligned images for robust 3D representation.
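The only trainable component is the transformer block attached residually to the frozen CLIP image encoder. Below is a minimal sketch of this design, not the released implementation: the `DepthAlignBlock` name, the embedding size, and the single-layer setup are illustrative assumptions.

```python
import torch
import torch.nn as nn
import open_clip


class DepthAlignBlock(nn.Module):
    """Trainable transformer block added residually on top of the frozen
    CLIP image encoder; only this block is fine-tuned (name is illustrative)."""

    def __init__(self, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # Residual connection: frozen CLIP features + learned refinement.
        return clip_feats + self.block(clip_feats.unsqueeze(1)).squeeze(1)


model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
for p in model.parameters():
    p.requires_grad_(False)  # the CLIP backbone stays frozen

align_block = DepthAlignBlock(embed_dim=512)  # the only trainable weights
```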
OpenDlign is a multimodal framework for learning open-world 3D representations. It leverages depth-aligned images generated from depth maps projected from point clouds. Unlike CAD-rendered images, these generated images offer rich, realistic color and texture diversity while preserving geometric and semantic consistency with the depth maps. Our experiments demonstrate OpenDlign's superior performance in zero-shot and few-shot classification, 3D object detection, and cross-modal retrieval, especially on real-scanned 3D objects.
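For intuition, here is a hedged sketch of zero-shot classification over multi-view depth maps with a plain CLIP backbone. The prompt template, view count, class list, and logit aggregation are simplified placeholders, not the paper's exact recipe:

```python
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

classnames = ["airplane", "chair", "lamp"]  # e.g. ModelNet40 categories
text = tokenizer([f"a depth map of a {c}" for c in classnames])

with torch.no_grad():
    text_feats = model.encode_text(text)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

    # depth_views: (V, 3, 224, 224) multi-view depth maps of one object;
    # a random tensor stands in for the contour-aware projection output.
    depth_views = torch.rand(10, 3, 224, 224)
    view_feats = model.encode_image(depth_views)
    view_feats = view_feats / view_feats.norm(dim=-1, keepdim=True)

    # Average the per-view similarity logits, then pick the best class.
    logits = (100.0 * view_feats @ text_feats.T).softmax(dim=-1)
    pred = logits.mean(dim=0).argmax().item()

print("predicted class:", classnames[pred])
```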
We pre-train OpenDlign on a single NVIDIA A100 GPU. The code is tested with CUDA 11.3 and PyTorch 1.11.0:
```bash
conda create -n OpenDlign python=3.8
conda activate OpenDlign
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
The processed evaluation data (ModelNet40, ScanObjectNN, and OmniObject3D) can be found here.
The pre-trained OpenDlign models, integrated with various CLIP variants (ViT-H-14, ViT-L-14, ViT-B-16, ViT-B-32), are available here.
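A hypothetical loading snippet follows; the checkpoint file names and the `state_dict` key are assumptions about how the released weights are packaged, so adapt them to the actual downloads:

```python
import torch

# Hypothetical checkpoint paths; adjust to the actual released files.
CKPTS = {
    "ViT-B-32": "checkpoints/opendlign_vit_b_32.pt",
    "ViT-B-16": "checkpoints/opendlign_vit_b_16.pt",
    "ViT-L-14": "checkpoints/opendlign_vit_l_14.pt",
    "ViT-H-14": "checkpoints/opendlign_vit_h_14.pt",
}

state = torch.load(CKPTS["ViT-B-32"], map_location="cpu")
weights = state.get("state_dict", state)  # tolerate either packaging
print(f"loaded {len(weights)} tensors for ViT-B-32")
```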
Update the root path of your downloaded evaluation dataset before running the following command:
```bash
bash scripts/zero_shot.sh
```
Update the root path of your downloaded training dataset before running the following command:
```bash
bash scripts/model_training.sh
```
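For reference, the core alignment objective is a CLIP-style symmetric contrastive loss between depth-map features and depth-aligned image features. The sketch below is a simplified stand-in for the training script, with illustrative names and random placeholder features; the real script additionally handles multi-view batching, optimizers, and schedulers:

```python
import torch
import torch.nn.functional as F


def clip_style_loss(depth_feats: torch.Tensor,
                    image_feats: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    depth_feats = F.normalize(depth_feats, dim=-1)
    image_feats = F.normalize(image_feats, dim=-1)
    logits = depth_feats @ image_feats.T / temperature
    targets = torch.arange(len(logits), device=logits.device)
    # Symmetric cross-entropy over both matching directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2


# Example usage with random placeholder features.
depth_feats = torch.randn(32, 512, requires_grad=True)
image_feats = torch.randn(32, 512, requires_grad=True)
loss = clip_style_loss(depth_feats, image_feats)
loss.backward()  # in practice, only the residual block's weights update
```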
If you find our code helpful, please cite our paper:
```bibtex
@article{mao2024opendlign,
  title={OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images},
  author={Mao, Ye and Jing, Junpeng and Mikolajczyk, Krystian},
  journal={arXiv preprint arXiv:2404.16538},
  year={2024}
}
```