[CVPR'24 Highlight] HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
👉I plan to enter the job market in Summer/Fall 2025. If you have an opening, feel free to email!👈
[ Project Page ] [ Paper ] [ SupMat ] [ ArXiv ] [ Video ] [ HOLD Account ] [ ECCV'24 HOLD+ARCTIC Challenge ]
Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges
🚀 Register a HOLD account here for news such as code release, downloads, and future updates!
- 2024.07.04: Join our ECCV competition: two-hand + rigid-object reconstruction using HOLD on ARCTIC!
- 2024.07.04: HOLD beta is released!
- 2024.04.04: HOLD is awarded a CVPR highlight!
- 2024.02.27: HOLD is accepted to CVPR'24! Working on code release!
This is a repository for HOLD, a method that jointly reconstructs hands and objects from monocular videos without assuming a pre-scanned object template.
HOLD can reconstruct 3D geometries of novel objects and hands:
- Template-free bimanual hand-object reconstruction
- Textureless object interaction with hands
- Multi-object interaction with hands
- Instructions to download in-the-wild videos from HOLD as well as preprocessed data
- Scripts to preprocess and train on custom videos
- A volumetric rendering framework to reconstruct dynamic hand-object interaction
- A generalized codebase for single and two hand interaction with objects
- A viewer to interact with the prediction
- Code to evaluate and compare with HOLD on HO3D
- Tips on good reconstruction
- Clean the code further
- Support ARCTIC for the two-hand + rigid-object setting
- Setup environment and downloads: see docs/setup.md
- Training, evaluation, and visualization on preprocessed sequences: see docs/usage.md
- Preprocess custom sequences: see docs/custom.md
- Data documentation (checkpoints, dataset, log folder): see docs/data_doc.md
- Instructions for using HOLD on ARCTIC: see docs/arctic.md
Get a copy of the code:
git clone https://github.com/zc-alexfan/hold.git
cd hold; git submodule update --init --recursive
- Set up environments
  - Follow the instructions here: docs/setup.md.
  - You may skip external dependencies for now.
- Train on a preprocessed sequence
  - Start with one of our preprocessed in-the-wild sequences, such as hold_bottle1_itw.
  - Familiarize yourself with the usage guidelines in docs/usage.md for this preprocessed sequence.
  - This will enable you to train, render HOLD, and experiment with our interactive viewer.
  - At this stage, you can also explore the HOLD code in the ./code directory.
- Set up external dependencies and process custom videos
  - After understanding the initial tools, set up the "external dependencies" as outlined in docs/setup.md.
  - Preprocess the images from the hold_bottle1_itw sequence by following the instructions in docs/custom.md.
  - Train on this sequence to learn how to build a custom dataset.
  - You can capture your own custom video and reconstruct it in 3D at this point.
  - Most preprocessing artifact files are documented in docs/data_doc.md, which you can use as a reference.
- Two-hand setting: Bimanual category-agnostic reconstruction
  - At this point, you can preprocess and train on a custom single-hand sequence.
  - Now you can take on the bimanual category-agnostic reconstruction challenge!
  - Follow the instructions in docs/arctic.md to reconstruct two-hand manipulation of ARCTIC sequences.
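The clone step above relies on `git submodule update --init --recursive` to fetch nested repositories. As a quick sanity check before moving on to setup, here is a minimal sketch that counts submodules git still reports as uninitialized. The helper name `count_uninitialized` is hypothetical and not part of the HOLD repository. It only relies on the fact that `git submodule status` prefixes uninitialized entries with `-`.

```shell
# Hypothetical helper, not part of HOLD: reads `git submodule status` output on
# stdin and prints how many entries are still uninitialized (lines starting with "-").
count_uninitialized() {
  grep -c '^-' || true   # grep -c prints 0 but exits non-zero when nothing matches
}

# Usage, from the repository root:
#   git submodule status --recursive | count_uninitialized
# A result of 0 suggests every submodule was checked out.
```

If the count is non-zero, re-running `git submodule update --init --recursive` from the repository root is the usual remedy.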
@inproceedings{fan2024hold,
title={{HOLD}: Category-agnostic 3d reconstruction of interacting hands and objects from video},
author={Fan, Zicong and Parelli, Maria and Kadoglou, Maria Eleni and Kocabas, Muhammed and Chen, Xu and Black, Michael J and Hilliges, Otmar},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={494--504},
year={2024}
}
✨CVPR 2023: ARCTIC is a dataset that includes accurate body/hand/object poses and multi-view RGB videos for articulated object manipulation. See our project page for details.
For technical questions, please create an issue. For other questions, please contact the first author.
The authors would like to thank: Benjamin Pellkofer for IT/web support; Chen Guo, Egor Zakharov, Yao Feng, Artur Grigorev for insightful discussion; Yufei Ye for DiffHOI code release.
Our code benefits a lot from Vid2Avatar, aitviewer, VolSDF, NeRF++ and SNARF. If you find our work useful, consider checking out their work.