Assume `GMPI_ROOT` represents the path to this repo:

```bash
cd /path/to/this/repo
export GMPI_ROOT=$PWD
```
We need MTCNN, Deep3DFaceRecon_pytorch, and DeepFace to complete the data processing and evaluation steps.
We provide conda environment YAML files for MTCNN and DeepFace:

- `mtcnn_env.yaml` for MTCNN;
- `deepface_env.yaml` for DeepFace.

```bash
conda env create -f mtcnn_env.yaml     # mtcnn_env
conda env create -f deepface_env.yaml  # deepface
```
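A quick, optional sanity check that both environments were created (assuming a standard conda setup):

```bash
conda env list | grep -E "mtcnn_env|deepface"
```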
Note: we made small modifications to the original Deep3DFaceRecon_pytorch repo; please use our modified version. Please follow the official instructions to complete the two major steps: setting up the virtual environment and downloading the pretrained models.

Assume the repo is located at `Deep3DFaceRecon_PATH`:

```bash
export Deep3DFaceRecon_PATH=/path/to/Deep3DFaceRecon_pytorch
```
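The pose-estimation commands below run Deep3DFaceRecon with `--name=pretrained --epoch=20`, so as a sanity check the corresponding pretrained checkpoint folder should already exist inside that repo:

```bash
# The estimate_pose_* scripts are invoked with --name=pretrained --epoch=20,
# so this folder should contain the downloaded epoch-20 checkpoint.
ls ${Deep3DFaceRecon_PATH}/checkpoints/pretrained/
```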
Download StyleGAN2's pretrained checkpoints:

```bash
mkdir -p ${GMPI_ROOT}/ckpts/stylegan2_pretrained/transfer-learning-source-nets/
cd ${GMPI_ROOT}/ckpts/stylegan2_pretrained

# FFHQ256
wget -P ./transfer-learning-source-nets https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/transfer-learning-source-nets/ffhq-res256-mirror-paper256-noaug.pkl
# FFHQ512
wget -P ./transfer-learning-source-nets https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/transfer-learning-source-nets/ffhq-res512-mirror-stylegan2-noaug.pkl
# FFHQ1024
wget -P ./transfer-learning-source-nets https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/transfer-learning-source-nets/ffhq-res1024-mirror-stylegan2-noaug.pkl
# AFHQCat
wget https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/afhqcat.pkl
# MetFaces
wget https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl
```
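After the downloads finish, you can quickly confirm that all five checkpoints are where the training code expects them:

```bash
ls -lh ${GMPI_ROOT}/ckpts/stylegan2_pretrained/*.pkl
ls -lh ${GMPI_ROOT}/ckpts/stylegan2_pretrained/transfer-learning-source-nets/*.pkl
```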
We assume all data is placed under `${GMPI_ROOT}/runtime_dataset`. We provide scripts in `data_preprocess` for the steps described below.
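Create the folder first if it does not exist yet:

```bash
mkdir -p ${GMPI_ROOT}/runtime_dataset
```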
- Please follow StyleGAN2's guidance to extract images from the raw TFRecords. After this step, you will obtain `ffhq256x256.zip` (~13G), `ffhq512x512.zip` (~52G), and `ffhq1024x1024.zip` (~206G). Place them under `${GMPI_ROOT}/runtime_dataset`.
- We utilize MTCNN to detect facial landmarks.
- We use Deep3DFaceRecon to estimate poses for FFHQ.
```bash
export RES=256

# landmark detection with MTCNN
conda activate mtcnn_env
python ${GMPI_ROOT}/data_preprocess/prepare_landmarks_ffhq.py --input_zipf ${GMPI_ROOT}/runtime_dataset/ffhq${RES}x${RES}.zip --save_dir ${GMPI_ROOT}/runtime_dataset/mtcnn_ffhq_${RES}

# run pose estimation with Deep3DFaceRecon
conda activate deep3d_pytorch
cd ${Deep3DFaceRecon_PATH}
python estimate_pose_ffhq.py --name=pretrained --epoch=20 --img_folder=${GMPI_ROOT}/runtime_dataset/dummy --gmpi_img_res ${RES} --gmpi_root ${GMPI_ROOT}

# move pose results to GMPI
mv ${Deep3DFaceRecon_PATH}/checkpoints/pretrained/results/ffhq${RES}x${RES}/epoch_20_000000 ${GMPI_ROOT}/runtime_dataset/ffhq${RES}_deep3dface_coeffs
mv ${GMPI_ROOT}/runtime_dataset/mtcnn_ffhq_${RES}/detections/fail_list.txt ${GMPI_ROOT}/runtime_dataset/ffhq${RES}_deep3dface_coeffs/
```
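The commands above process a single resolution. Repeating them with `RES=512` and `RES=1024` produces the `ffhq512_deep3dface_coeffs` and `ffhq1024_deep3dface_coeffs` folders shown in the directory layout below; a minimal sketch, assuming the zip files and both conda environments are already in place:

```bash
for RES in 512 1024; do
  # landmark detection with MTCNN
  conda activate mtcnn_env
  python ${GMPI_ROOT}/data_preprocess/prepare_landmarks_ffhq.py \
    --input_zipf ${GMPI_ROOT}/runtime_dataset/ffhq${RES}x${RES}.zip \
    --save_dir ${GMPI_ROOT}/runtime_dataset/mtcnn_ffhq_${RES}

  # pose estimation with Deep3DFaceRecon
  conda activate deep3d_pytorch
  (cd ${Deep3DFaceRecon_PATH} && \
    python estimate_pose_ffhq.py --name=pretrained --epoch=20 \
      --img_folder=${GMPI_ROOT}/runtime_dataset/dummy --gmpi_img_res ${RES} --gmpi_root ${GMPI_ROOT})

  # move pose results to GMPI
  mv ${Deep3DFaceRecon_PATH}/checkpoints/pretrained/results/ffhq${RES}x${RES}/epoch_20_000000 \
     ${GMPI_ROOT}/runtime_dataset/ffhq${RES}_deep3dface_coeffs
  mv ${GMPI_ROOT}/runtime_dataset/mtcnn_ffhq_${RES}/detections/fail_list.txt \
     ${GMPI_ROOT}/runtime_dataset/ffhq${RES}_deep3dface_coeffs/
done
```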
We use the same processed AFHQCat dataset as EG3D. We thank Eric Ryan Chan for providing the processed data. Please follow EG3D's instructions to obtain the AFHQCat dataset and rename the resulting folder to `afhq_v2_train_cat_512`.
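For example (the name of the downloaded folder depends on how you obtained the data, so treat it as a placeholder):

```bash
# rename the processed AFHQCat folder to the name expected by GMPI
mv ${GMPI_ROOT}/runtime_dataset/<downloaded_afhqcat_folder> ${GMPI_ROOT}/runtime_dataset/afhq_v2_train_cat_512
```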
- Please download the aligned-and-cropped version from the official MetFaces website.
- We utilize MTCNN to detect facial landmarks. Meanwhile, we augment the dataset by horizontally flipping the images.
- We use Deep3DFaceRecon to estimate poses for MetFaces.
```bash
# we assume the raw images are stored in a folder named metfaces1024x1024
mv ${GMPI_ROOT}/runtime_dataset/metfaces1024x1024 ${GMPI_ROOT}/runtime_dataset/metfaces1024x1024_xflip

conda activate mtcnn_env

# generate the horizontally-flipped dataset
python ${GMPI_ROOT}/data_preprocess/prepare_landmarks_metfaces.py --data_dir ${GMPI_ROOT}/runtime_dataset/metfaces1024x1024_xflip --save_dir ${GMPI_ROOT}/runtime_dataset/metfaces_detect --xflip 1

# detect landmarks with MTCNN
python ${GMPI_ROOT}/data_preprocess/prepare_landmarks_metfaces.py --data_dir ${GMPI_ROOT}/runtime_dataset/metfaces1024x1024_xflip --save_dir ${GMPI_ROOT}/runtime_dataset/metfaces_detect

# run pose estimation with Deep3DFaceRecon
conda activate deep3d_pytorch_fork
cd ${Deep3DFaceRecon_PATH}
python estimate_pose_metfaces.py --name=pretrained --epoch=20 --img_folder=${GMPI_ROOT}/runtime_dataset/dummy --gmpi_root ${GMPI_ROOT}

# move results back to GMPI
mkdir -p ${GMPI_ROOT}/runtime_dataset/metfaces_xflip_deep3dface_coeffs
mv ${Deep3DFaceRecon_PATH}/checkpoints/pretrained/results/metfaces1024x1024_xflip/epoch_20_000000 ${GMPI_ROOT}/runtime_dataset/metfaces_xflip_deep3dface_coeffs/coeffs
mv ${GMPI_ROOT}/runtime_dataset/metfaces_detect/detections/fail_list.txt ${GMPI_ROOT}/runtime_dataset/metfaces_xflip_deep3dface_coeffs
```
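After these steps, the MetFaces pose folder should contain the estimated coefficients and the MTCNN failure list:

```bash
ls ${GMPI_ROOT}/runtime_dataset/metfaces_xflip_deep3dface_coeffs
# expected: coeffs  fail_list.txt
```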
We provide processed poses for FFHQ and MetFaces on the release page and at this link.
If everything goes well, you should observe the following folder structure:

```
.
+-- ckpts
|   +-- stylegan2_pretrained               # folder
|   |   +-- afhqcat.pkl                    # file
|   |   +-- metfaces.pkl                   # file
|   |   +-- transfer-learning-source-nets  # folder
+-- runtime_dataset
|   +-- ffhq256x256.zip                    # file
|   +-- ffhq256_deep3dface_coeffs          # folder
|   +-- ffhq512x512.zip                    # file
|   +-- ffhq512_deep3dface_coeffs          # folder
|   +-- ffhq1024x1024.zip                  # file
|   +-- ffhq1024_deep3dface_coeffs         # folder
|
|   +-- afhq_v2_train_cat_512              # folder
|
|   +-- metfaces1024x1024_xflip            # folder
|   +-- metfaces_xflip_deep3dface_coeffs   # folder
```
Run the following command to start training GMPI. Results will be saved in `${GMPI_ROOT}/experiments`. We use 8 Tesla V100 GPUs in our experiments. We recommend 32GB of GPU memory if you want to train at a resolution of 1024x1024.
```bash
python launch.py \
  --run_dataset FFHQ1024 \
  --nproc_per_node 1 \
  --task-type gmpi \
  --run-type train \
  --master_port 8370
```
- `run_dataset` can be one of `["FFHQ256", "FFHQ512", "FFHQ1024", "AFHQCat", "MetFaces"]`.
- Set `nproc_per_node` to the number of GPUs you want to use (see the example below).
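For example, to train the 256x256 model on the 8-GPU setup mentioned above (the master port is arbitrary; any free port works):

```bash
python launch.py \
  --run_dataset FFHQ256 \
  --nproc_per_node 8 \
  --task-type gmpi \
  --run-type train \
  --master_port 8370
```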
This repo supports the following variants of the generator:

- Vanilla version, without alpha maps conditioned on depth or a learnable token: set `torgba_cond_on_pos_enc: "none"` and `torgba_cond_on_pos_enc_embed_func: "none"` in `configs/gmpi.yaml`;
- Alpha maps conditioned on normalized depth: set `torgba_cond_on_pos_enc: "normalize_add_z"` and `torgba_cond_on_pos_enc_embed_func: "modulated_lrelu"` in `configs/gmpi.yaml`;
- Alpha maps from learnable tokens: set `torgba_cond_on_pos_enc: "normalize_add_z"` and `torgba_cond_on_pos_enc_embed_func: "learnable_param"` in `configs/gmpi.yaml`;
- Alpha maps from the predicted depth map: set `torgba_cond_on_pos_enc: "depth2alpha"` and `torgba_cond_on_pos_enc_embed_func: "modulated_lrelu"` in `configs/gmpi.yaml`.

In the paper, we use the second variant: alpha maps conditioned on normalized depth.
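A quick way to confirm which variant your config selects, assuming the config lives at `${GMPI_ROOT}/configs/gmpi.yaml` (the keys may be nested, but grep finds them either way):

```bash
grep "torgba_cond_on_pos_enc" ${GMPI_ROOT}/configs/gmpi.yaml
# paper variant (alpha maps conditioned on normalized depth) should show:
#   torgba_cond_on_pos_enc: "normalize_add_z"
#   torgba_cond_on_pos_enc_embed_func: "modulated_lrelu"
```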
The command to evaluate the trained model is in `eval.sh`. We provide scripts to compute the following:
- FID/KID,
- Identity metrics,
- Depth metrics,
- Pose accuracy metrics.
In the paper, all results come from checkpoints at 5000 iterations. Run the following command to evaluate the model:
```bash
# Arguments to eval.sh, in order:
#   1) GMPI root directory
#   2) dataset: FFHQ256, FFHQ512, FFHQ1024, AFHQCat, or MetFaces
#   3) your experiment ID
#   4) path to Deep3DFaceRecon_pytorch
#   5) "nodebug" for a full run; set to "debug" to test that your paths for computing FID/KID are correct
bash ${GMPI_ROOT}/gmpi/eval/eval.sh \
  ${GMPI_ROOT} \
  FFHQ512 \
  exp_id \
  ${Deep3DFaceRecon_PATH} \
  nodebug
```
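For instance, it can be worth doing a quick `debug` pass first to verify the FID/KID paths before launching the full evaluation:

```bash
bash ${GMPI_ROOT}/gmpi/eval/eval.sh ${GMPI_ROOT} FFHQ512 exp_id ${Deep3DFaceRecon_PATH} debug
```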