
stylistic-gesture

Official repository for the paper "Stylistic Co-Speech Gesture Generation: Modeling Personality and Communicative Styles in Virtual Agents". DOI

Preparing environment

  1. Clone this repo

  2. Enter the repo and create the Docker image using

docker build -t stylistic-gesture .

  3. Run the container using

nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES={GPU} --runtime=nvidia --userns=host --shm-size 64G -v {LOCAL_DIR}:{CONTAINER_DIR} -p {PORT} --name {CONTAINER_NAME} stylistic-gesture:latest /bin/bash

for example:

docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\stylistic-gesture:/workspace/stylistic-gesture -p '8888:8888' --name stylistic-gesture-container stylistic-gesture:latest /bin/bash
  4. Activate the CUDA environment:
source activate stylistic-env
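
(Optional) You can check that PyTorch sees the GPU inside the container; this is just a quick sanity check and not part of the pipeline:

python -c "import torch; print('CUDA available:', torch.cuda.is_available())"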

Data pre-processing

  1. Get the BRG-Unicamp dataset following the instructions from here and put it into ./dataset/

  2. Download the WavLM Base+ model and put it into the folder /wavlm/

  3. In the container, with the environment activated, enter the folder /workspace/stylistic-gesture and run

python -m data_loaders.gesture.scripts.ptbrgesture_prep

This will convert the BVH files to npy representations, downsample the wav files to 16 kHz and save them as npy arrays, and convert these arrays to WavLM representations. The VAD data must be processed separately due to a Python library incompatibility (see the optional step below).
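
For reference, the sketch below illustrates the audio side of this step: resampling a wav file to 16 kHz and extracting WavLM features. It is only an illustration, not the repository's actual code; the file names, the checkpoint path ./wavlm/WavLM-Base+.pt, the wavlm.WavLM module layout, and the use of librosa are assumptions, while the checkpoint-loading pattern follows the official WavLM repository.

# Illustrative sketch (not the repo's code): resample audio to 16 kHz and
# extract WavLM features. Paths and module layout are assumptions.
import numpy as np
import torch
import librosa
from wavlm.WavLM import WavLM, WavLMConfig  # assumed location of the official WavLM code

# 1) Load the audio resampled to 16 kHz and save it as a npy array
wav, _ = librosa.load("dataset/example.wav", sr=16000, mono=True)
np.save("dataset/example_16k.npy", wav)

# 2) Load the WavLM Base+ checkpoint (loading pattern from the official WavLM repo)
checkpoint = torch.load("./wavlm/WavLM-Base+.pt", map_location="cpu")
cfg = WavLMConfig(checkpoint["cfg"])
model = WavLM(cfg)
model.load_state_dict(checkpoint["model"])
model.eval()

# 3) Extract frame-level WavLM representations
with torch.no_grad():
    audio = torch.from_numpy(wav).float().unsqueeze(0)  # shape: (1, num_samples)
    features = model.extract_features(audio)[0]         # shape: (1, num_frames, feat_dim)
np.save("dataset/example_wavlm.npy", features.squeeze(0).numpy())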

  4. (Optional) Process VAD data

BRG-Unicamp already provides speech activity information (from speechbrain's VAD), but if you wish to process it yourself, repeat the steps of "Preparing environment" for the speechbrain environment: build the image using the Dockerfile inside speechbrain (docker build -t speechbrain .), run the container (docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash), and run:

python -m data_loaders.gesture.scripts.ptbrgesture_prep_vad
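
For orientation, the minimal sketch below shows how speech activity can be obtained with speechbrain's pretrained VAD. It is an illustration rather than the repository's script: the model source speechbrain/vad-crdnn-libriparty, the file paths, and the npy output format are assumptions.

# Illustrative sketch (not the repo's script): speech activity detection with
# speechbrain's pretrained CRDNN VAD. Paths and output format are assumptions.
import numpy as np
from speechbrain.pretrained import VAD

vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# get_speech_segments returns (start, end) times in seconds for detected speech
boundaries = vad.get_speech_segments("dataset/example_16k.wav")
np.save("dataset/example_vad.npy", boundaries.numpy())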

Train model

To train the model described in the paper, run the following command inside the repo:

python -m train.train_mdm --save_dir save/my_model_run --dataset ptbr --step 10  --use_vad True --use_wavlm True --use_style_enc True

Gesture Generation

Generate motion with the trained model by running the command below. If you wish to generate gestures with the pretrained model from the GENEA Challenge, use --model_path ./save/stylistic-gesture/model000600000.pt

python -m sample.ptbrgenerate --model_path ./save/my_model_run/model000XXXXXX.pt 

Render

In our perceptual evaluation, we used the rendering procedure from the official GENEA Challenge 2023 visualizations. Instructions are provided here.

Cite

If you wish to cite this repository or the paper, please use the DOI.
