This is the official PyTorch implementation of MoGenTS. For technical details, please refer to:
MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling
Weihao Yuan, Weichao Shen, Yisheng HE, Xiaodong Gu, Zilong Dong, Liefeng Bo, Qixing Huang
NeurIPS 2024
[Project Page] | [Paper]
If you find this code useful in your research, please cite:
@inproceedings{yuan2024mogents,
  title={MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling},
  author={Weihao Yuan and Weichao Shen and Yisheng HE and Yuan Dong and Xiaodong Gu and Zilong Dong and Liefeng Bo and Qixing Huang},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024},
}
- Install Conda Environment
conda env create -f environment.yml
conda activate momask
pip install git+https://github.com/openai/CLIP.git
- Or install with pip
conda create -n mogents python=3.8
conda activate mogents
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
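After either route, a quick sanity check (a minimal sketch; the printed version and device strings depend on your machine) confirms that PyTorch imports and a CUDA device is visible:

# Quick environment check -- verifies that PyTorch imports and a CUDA device is visible.
import torch
print("torch:", torch.__version__)              # e.g. 2.2.0 when following the pip route above
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))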
Download the pretrained models and place them in ./logs/humanml3d/
| Model | FID |
|---|---|
| HumanML3D | 0.028 |
| KIT-ML | 0.135 |
- Follow the previous method to prepare the evaluation models and GloVe word embeddings, or directly download them from here and place them in ./checkpoints.
- HumanML3D - Follow the instructions in HumanML3D, then place the resulting dataset in ./dataset/HumanML3D.
- KIT-ML - Download from HumanML3D, then place the dataset in ./dataset/KIT-ML.
python demo_mogen.py --gpu_id 0 --ext exp1 --text_prompt "A person is walking on a circle." --checkpoints_dir logs --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans
Some parameter explanations:
- --repeat_times: number of replications for generation, default 1.
- --motion_length: specify the number of poses for generation.
Output explanation, under ./outputs/exp1/:
- numpy files: generated motions with shape (nframe, 22, 3), under subfolder ./joints.
- video files: stick-figure animations in mp4 format, under subfolder ./animation.
- bvh files: BVH files of the generated motions, under subfolder ./animation.
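A minimal sketch for inspecting a generated motion (the file name under ./outputs/exp1/joints/ is a placeholder; use whatever the demo actually writes there):

# Load one generated motion and inspect it -- illustrative only.
import numpy as np

joints = np.load("outputs/exp1/joints/sample.npy")   # placeholder name; list the folder for the real file
print(joints.shape)                                   # expected (nframe, 22, 3): 22 joints with xyz positions
root_xz = joints[:, 0, [0, 2]]                        # joint 0 is the root/pelvis in the HumanML3D skeleton
print("root displacement:", np.linalg.norm(root_xz[-1] - root_xz[0]))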
Then you can follow MoMask to retarget the generated motion to other 3D characters for visualization.
- Train the VQVAE (an illustrative sketch of the residual quantization behind --num_quantizers and --quantize_dropout_prob follows this list)
bash run_rvq.sh vq 0 humanml3d --batch_size 256 --num_quantizers 6 --max_epoch 50 --quantize_dropout_prob 0.2 --gamma 0.1 --code_dim2d 1024 --nb_code2d 256
- Train the Mask Transformer
bash run_mtrans.sh mtrans 4 humanml3d --vq_name vq --batch_size 384 --max_epoch 2000 --attnj --attnt
- Train the Residual Transformer
bash run_rtrans.sh rtrans 2 humanml3d --batch_size 64 --vq_name vq --cond_drop_prob 0.01 --share_weight --max_epoch 2000 --attnj --attnt
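The --num_quantizers and --quantize_dropout_prob flags above come from the residual vector quantization (RVQ) used in the VQ-VAE, and --nb_code2d / --code_dim2d presumably set the codebook size and code dimension. The toy sketch below only illustrates the general RVQ idea (each quantizer encodes the residual left by the previous one, and quantize-dropout randomly truncates the stack during training); it is not the repo's implementation, and all names in it are made up.

# Toy residual vector quantization (RVQ) -- illustrative only, not the repo's implementation.
import random
import torch

def rvq_encode(z, codebooks, dropout_prob=0.2, training=True):
    """Quantize z (batch, dim) with a stack of codebooks, each of shape (num_codes, dim)."""
    num_q = len(codebooks)
    if training and random.random() < dropout_prob:
        num_q = random.randint(1, num_q)          # quantize-dropout: randomly truncate the stack
    residual = z
    quantized = torch.zeros_like(z)
    indices = []
    for cb in codebooks[:num_q]:
        dist = torch.cdist(residual, cb)          # (batch, num_codes) distances to every code
        idx = dist.argmin(dim=-1)                 # nearest code for each sample
        code = cb[idx]
        quantized = quantized + code              # running sum approximates z
        residual = residual - code                # the next quantizer models what is left over
        indices.append(idx)
    return quantized, indices

# e.g. 6 quantizers of 256 codes in a 1024-d space, mirroring --num_quantizers 6 / --nb_code2d 256 / --code_dim2d 1024
codebooks = [torch.randn(256, 1024) for _ in range(6)]
quantized, indices = rvq_encode(torch.randn(8, 1024), codebooks)
print(quantized.shape, [i.shape for i in indices])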
- Evaluate the VQVAE
python eval_vq.py --gpu_id 0 --name pretrain_vq --dataset_name humanml3d --ext eval --which_epoch net_best_fid.tar
- Evaluate the Mask Transformer
python eval_mask.py --dataset_name humanml3d --mtrans_name pretrain_mtrans --gpu_id 0 --cond_scale 4 --time_steps 10 --ext eval --which_epoch fid
- Evaluate Mask + Residual Transformer (an illustrative sketch of the --cond_scale / --time_steps decoding scheme follows this list)
HumanML3D:
python eval_res.py --gpu_id 0 --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans --cond_scale 4 --time_steps 10 --ext eval --which_ckpt net_best_fid.tar --which_epoch fid --traverse_res
KIT-ML:
python eval_res.py --gpu_id 0 --dataset_name kit --mtrans_name pretrain_mtrans_kit --rtrans_name pretrain_rtrans_kit --cond_scale 4 --time_steps 10 --ext eval --which_ckpt net_best_fid.tar --which_epoch fid --traverse_res
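The --cond_scale and --time_steps flags match the MaskGIT-style generation used by MoMask-based pipelines: tokens are unmasked over a fixed number of iterations, and classifier-free guidance mixes conditional and unconditional predictions. The sketch below is a generic illustration under those assumptions, not the repo's decoder; every name in it is made up.

# Generic iterative unmasking with classifier-free guidance -- illustrative only.
import math
import torch

def decode_tokens(predict_logits, length, time_steps=10, cond_scale=4.0):
    """predict_logits(tokens, cond) -> (length, num_codes) logits; cond=False means unconditional."""
    tokens = torch.full((length,), -1, dtype=torch.long)              # -1 marks a masked position
    for step in range(time_steps):
        cond_logits = predict_logits(tokens, cond=True)
        uncond_logits = predict_logits(tokens, cond=False)
        # classifier-free guidance: push the conditional prediction away from the unconditional one
        logits = uncond_logits + cond_scale * (cond_logits - uncond_logits)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        # cosine schedule: how many positions should still be masked after this step
        still_masked = int(length * math.cos(math.pi / 2 * (step + 1) / time_steps))
        conf[tokens != -1] = float("inf")                             # already-decoded positions stay put
        keep = conf.argsort(descending=True)[: length - still_masked]
        newly = keep[tokens[keep] == -1]                              # fill only positions still masked
        tokens[newly] = pred[newly]
    return tokens

# Usage with a dummy predictor (random logits over 512 codes), just to show the call shape
dummy = lambda tokens, cond: torch.randn(tokens.shape[0], 512)
print(decode_tokens(dummy, length=49))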
We sincerely thank the authors of the excellent open-source works on which our code is based:
This code is distributed under the MIT license.
Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.