This repository contains the code for "SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data", accepted at ICCV 2023.
Authors: Mohammad Zohaib, Alessio Del Bue
Abstract:
This paper proposes a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated. Our proposed model adheres to the following principles: i) keypoints inference is fully unsupervised (no annotation given), ii) keypoints position error should be low and resilient to PCD perturbations (robustness), iii) keypoints should not change their indexes for the intra-class objects (semantic coherence), iv) keypoints should be close to or proximal to PCD surface (compactness). We achieve these desiderata by proposing a new self-supervised training strategy for keypoints estimation that does not assume any a priori knowledge of the object class, and a model architecture with coupled auxiliary losses that promotes the desired keypoints properties. We compare the keypoints estimated by the proposed approach with those of the state-of-the-art unsupervised approaches.
The experiments show that our approach outperforms by estimating keypoints with improved coverage (
We highly recommend running this project in a conda environment. The code was tested with Python 3.6.12, Torch 1.10.1, and Torchvision 0.11.2. We recommend using Python 3.6+ and installing all the dependencies provided in "sc3k_environment.yml". Create the conda environment as follows:
git clone https://github.com/IIT-PAVIS/SC3K.git
cd SC3K
conda env create -f sc3k_environment.yml
conda activate sc3k
Set the training parameters using the configuration file "config/config.yaml". For example:
split: train # train the network
task: generic # transform the PCDs to generic poses before training
batch_size: 26 # batch size
class_name: airplane # category to train
Run training with python train.py. Follow the training progress in train/class_name/. The best weights will be saved in the same folder with the name Best_class_name_10kp.pth.
Set the test parameters in the configuration file "config/config.yaml" as follows:
split: test # test the network
task: generic # generic to test on random poses, canonical to test on canonical PCDs
batch_size: 1 # batch size
class_name: airplane # category to test
save_results: True # if you want to save qualitative results, otherwise keep it False
data:
  best_model_path: '**please set the path of the best weights**'
Set "best_model_path" to train/class_name/Best_class_name_10kp.pth and run inference with python test.py. Follow the progress in test/class_name/*.
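The checkpoint naming pattern described above can be sketched as a small helper, which may be handy when switching between categories. This is a minimal sketch and not part of the repository: the function name best_weights_path is hypothetical, and it assumes that "10kp" in the file name stands for the number of keypoints.

```python
from pathlib import Path


def best_weights_path(class_name: str, n_keypoints: int = 10, root: str = "train") -> Path:
    """Build the checkpoint path following the pattern train/class_name/Best_class_name_10kp.pth.

    Assumption: the "10kp" suffix encodes the number of keypoints.
    """
    return Path(root) / class_name / f"Best_{class_name}_{n_keypoints}kp.pth"


if __name__ == "__main__":
    path = best_weights_path("airplane")
    print(path.as_posix())
    # Only report a missing file; the value still has to be copied into config/config.yaml by hand.
    if not path.is_file():
        print("Checkpoint not found; run training first or adjust the path.")
```

The printed path is the value to place in "best_model_path" for the chosen category.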
We use the KeypointNet dataset that can be downloaded from the link. We use the following structure:
dataset/
|_ annotation/airplane.json (all.json that is available in the KeypointNet dataset can also be used)
|_ pcds
|__ 02691156
|__ *.pcd (containing the PCDs in canonical pose)
...
|__ id of the second class
.
.
|__ id of the last class
|_ poses
|__ 02691156
|__ *.npz (randomly generated 24 poses to transform the input PCDs for training)
...
|__ id of the second class
.
.
|__ id of the last class
|_ splits
|__ train.txt
|__ val.txt
|__ test.txt
Please download the data and arrange it in the above structure. Poses containing the intrinsic and extrinsic matrices of the cameras can be generated by following any of the methods presented in PointView-GCN or Occupancy Networks. We provide a sample set in the folder "dataset" to get started.
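Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch using only the Python standard library; the function name missing_dataset_entries is hypothetical and not part of the repository, and the check only looks at folder and file names, not at file contents.

```python
from pathlib import Path
from typing import List


def missing_dataset_entries(root: str, class_id: str = "02691156") -> List[str]:
    """Return a list of layout problems; an empty list means the expected entries exist."""
    base = Path(root)
    problems = []

    # Top-level folders listed in the structure above.
    for folder in ("annotation", "pcds", "poses", "splits"):
        if not (base / folder).is_dir():
            problems.append(f"missing folder: {folder}/")

    # Split files.
    for name in ("train.txt", "val.txt", "test.txt"):
        if not (base / "splits" / name).is_file():
            problems.append(f"missing split file: splits/{name}")

    # Per-class point clouds (*.pcd) and poses (*.npz).
    for folder, pattern in (("pcds", "*.pcd"), ("poses", "*.npz")):
        class_dir = base / folder / class_id
        if not (class_dir.is_dir() and any(class_dir.glob(pattern))):
            problems.append(f"no {pattern} files in {folder}/{class_id}/")

    return problems


if __name__ == "__main__":
    for problem in missing_dataset_entries("dataset"):
        print(problem)
```

The default class_id is the airplane folder shown in the structure; pass another class id to check a different category.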
To save the qualitative results (visualizations), run test.py with save_results: True in the "config/config.yaml" file. The output files are written to "test/class_id/*_visualizations". The results should look like:
If you use this project for your research, please cite as:
@inproceedings{zohaib2023sc3k,
title={SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data},
author={Zohaib, Mohammad and Del Bue, Alessio},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={22509--22519},
year={2023}
}
- 3D Key-Points Estimation from Single-View RGB Images
- CDHN: Cross-Domain Hallucination Network for 3D Keypoints Estimation
We would like to acknowledge Milind Gajanan Padalkar, Matteo Taiana and Pietro Morerio for fruitful discussions, and Seyed Saber Mohammadi and Maryam Saleem for their support during the experimental phase. This work has been supported by the projects “RAISE-Robotics and AI for Socio-economic Empowerment” and “European Union-NextGenerationEU”.