A repository for generating stylized talking 3D faces and 2D videos.
This is the repository for paper *Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis, MM 2021*

![framework](framework.jpg)


**We are still cleaning up the code; the full version will be released soon.**

------

### Quick start

#### Installation
- `Python 3.6`
```
conda create -n python36 python=3.6
conda activate python36
```
- Install necessary packages through `pip install -r requirements.txt`
- Download the pretrained DeepSpeech model from this [link](https://github.com/mozilla/DeepSpeech/releases/download/v0.9.2/deepspeech-0.9.2-checkpoint.tar.gz), then extract the archive into the `./deepspeech` folder.
- Prepare the assets required by [Deep 3D Face Reconstruction](https://github.com/microsoft/Deep3DFaceReconstruction), following the same instructions as that repository:
  - Download the Basel Face Model. Due to the license agreement of the Basel Face Model, you have to download the BFM09 model after submitting an application on its [home page](https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads). After getting access to the BFM data, download "01_MorphableModel.mat" and put it into the `./deep_3drecon/BFM` subfolder.
  - Download the Expression Basis provided by [Guo et al](https://github.com/Juyong/3DFace). You can find a link named "CoarseData" in the first row of the Introduction part of their repository. Download and unzip Coarse_Dataset.zip, then put "Exp_Pca.bin" into the `./deep_3drecon/BFM` subfolder. The expression basis is constructed from [Facewarehouse](http://kunzhou.net/zjugaps/facewarehouse/) data and transferred to the BFM topology.
  - Download the pre-trained [reconstruction network](https://drive.google.com/file/d/176LCdUDxAj7T2awQ5knPMPawq5Q2RUWM/view), unzip it, and put "FaceReconModel.pb" into the `./deep_3drecon/network` subfolder.
- Download the pretrained [audio2motion model](https://cloud.tsinghua.edu.cn/f/acb6d482a26e4eb8b116/?dl=1) and put it into `./audio2motion/model`
- Download the pretrained [texture encoder](https://cloud.tsinghua.edu.cn/f/c60a3466016948c48951/?dl=1) and [renderer](https://cloud.tsinghua.edu.cn/f/106023055772444f8f15/?dl=1) and put them into `./render/model`
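
After these downloads, a quick way to confirm everything landed in the right place is a small check script (a convenience sketch of ours, not part of the repository):

```python
import os

# Files and folders the installation steps above should have produced.
REQUIRED = [
    "deepspeech",                               # extracted DeepSpeech checkpoint
    "deep_3drecon/BFM/01_MorphableModel.mat",   # BFM09 model
    "deep_3drecon/BFM/Exp_Pca.bin",             # expression basis
    "deep_3drecon/network/FaceReconModel.pb",   # reconstruction network
    "audio2motion/model",                       # pretrained audio2motion model
    "render/model",                             # texture encoder and renderer
]

missing = [path for path in REQUIRED if not os.path.exists(path)]
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All pretrained assets are in place.")
```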

#### Run
```
...
```

Our project organizes the files as follows:

```
...
```

#### Data process
The data process folder contains the preprocessing code for several datasets.

#### DeepSpeech

We leverage the [DeepSpeech](https://github.com/mozilla/DeepSpeech) project to extract audio-related features. Please download the pretrained DeepSpeech model from this [link](https://github.com/mozilla/DeepSpeech/releases/download/v0.9.2/deepspeech-0.9.2-checkpoint.tar.gz). In `deepspeech/evaluate.py`, we implement the function `get_prob`, which returns the latent DeepSpeech features for an input audio path. The latent features have 50 frames per second, so they must be aligned to the 25 fps videos in subsequent steps.
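
Since the feature rate (50 fps) is exactly twice the video frame rate (25 fps), one simple alignment is to average each consecutive pair of feature frames. This is an illustrative sketch under that assumption, not necessarily the exact alignment used in this repository:

```python
import numpy as np

def align_to_video_rate(ds_features: np.ndarray) -> np.ndarray:
    """Downsample 50 fps DeepSpeech features to 25 fps by averaging
    consecutive frame pairs. `ds_features` has shape (T, D) at 50 fps;
    the result has shape (T // 2, D) at 25 fps."""
    t = ds_features.shape[0] - ds_features.shape[0] % 2  # drop a trailing odd frame
    pairs = ds_features[:t].reshape(-1, 2, ds_features.shape[1])
    return pairs.mean(axis=1)
```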

#### Face Alignment

We modify [Face Alignment](https://github.com/1adrianb/face-alignment) for data preprocessing. Unlike the original project, we restrict face alignment to detect only the largest face in each frame to speed up processing.
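
The idea can be sketched with the public `face_alignment` API (a minimal illustration of ours, not the repository's actual code): detect all face boxes, keep the one with the largest area, and predict landmarks only for it.

```python
import face_alignment
import numpy as np

# LandmarksType._2D is the name used in face_alignment <= 1.3.x
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device="cpu")

def largest_face_landmarks(image: np.ndarray):
    """Return 68 landmarks for only the largest detected face, or None."""
    boxes = fa.face_detector.detect_from_image(image)  # each box: [x1, y1, x2, y2, score]
    if len(boxes) == 0:
        return None
    largest = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    preds = fa.get_landmarks(image, detected_faces=[largest])
    return preds[0] if preds else None
```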

#### Deep 3D Face Reconstruction

We modify [Deep 3D Face Reconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) for data preprocessing. We add a batch API, a UV-texture unwrapping API, and a UV-coordinate image generation API in `deep_3drecon/utils.py`.
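
For intuition, a UV-coordinate image stores, for every rendered pixel, the (u, v) texture coordinate of the mesh surface visible at that pixel; combining it with an unwrapped texture map reproduces a textured face. A minimal sketch of that lookup (our own illustration, not the API in `deep_3drecon/utils.py`):

```python
import numpy as np

def sample_texture(uv_image: np.ndarray, texture: np.ndarray) -> np.ndarray:
    """Given a per-pixel UV-coordinate image of shape (H, W, 2) with values
    in [0, 1] and a texture map of shape (Ht, Wt, 3), look up the texture
    color for every pixel (nearest-neighbor sampling)."""
    h_t, w_t = texture.shape[:2]
    u = np.clip((uv_image[..., 0] * (w_t - 1)).round().astype(int), 0, w_t - 1)
    v = np.clip((uv_image[..., 1] * (h_t - 1)).round().astype(int), 0, h_t - 1)
    return texture[v, u]
```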

#### Render

We implement our texture encoder and rendering model in the render folder. We also implement some other renderers, such as [Neural Voice Puppetry](https://arxiv.org/abs/1912.05566).

#### Audio to Motion

We implement our stylized audio-to-facial-motion model in the audio2motion folder.

------
### Data

#### Ted-HD data
We leverage `lmdb` to store the fragmented data. The data can be downloaded from this [link](https://cloud.tsinghua.edu.cn/d/b046cc3ee1774beb9511/); after downloading, run `cat xa* > data.mdb` to merge the split archive. You can obtain the train/test videos with the code below. We use the Ted-HD data to train the audio2motion model. We also provide the reconstructed 3D parameters and landmarks in the lmdb.
```python
import lmdb

def test():
    lmdb_path = "./lmdb"
    env = lmdb.open(lmdb_path, map_size=1099511627776, max_dbs=64)

    train_video = env.open_db("train_video".encode())
    # The original snippet is truncated at this point; as a sketch, assuming
    # each value is an encoded video clip, the clips can be dumped like this:
    with env.begin(write=False) as txn:
        with txn.cursor(db=train_video) as cursor:
            for i, (key, value) in enumerate(cursor):
                with open("train_{}.mp4".format(i), "wb") as f:
                    f.write(value)
```