This is the official implementation of MIFAE-Forensics for DeepFake detection. This repository provides:
- Visualization demo.
- Pre-training code.
- Fine-tuning code.
MIFAE-Forensics uses two pretext tasks: facial region guided masking in the spatial domain and high-frequency components masking in the frequency domain. The frequency-domain pipeline is shown next, followed by a code sketch:
Original image -> High-frequency components masking -> Network prediction -> Full reconstruction
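As a rough illustration of the frequency-domain task, the sketch below removes high-frequency components with a circular low-pass filter in the Fourier domain; the `radius` argument plays the role of `--mask_radius`. This is a minimal sketch under our own assumptions (circular filter shape, NumPy FFT), not the repository's exact implementation.

```python
# Minimal sketch of high-frequency components masking (assumed circular
# low-pass filter in the Fourier domain; not the exact repo implementation).
import numpy as np

def mask_high_frequency(image: np.ndarray, radius: int = 16) -> np.ndarray:
    """Zero out frequency components farther than `radius` from the spectrum center."""
    h, w = image.shape[:2]
    # Move the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    lowpass = (dist <= radius).astype(spectrum.dtype)
    if image.ndim == 3:           # broadcast the mask over color channels
        lowpass = lowpass[..., None]
    # Keep low frequencies only, then invert the transform.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * lowpass, axes=(0, 1)),
                            axes=(0, 1))
    return np.real(filtered)
```

The network then receives the low-pass image and is trained to predict the full reconstruction.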
- We first visualize the MAE with the facial region guided masking strategy from our paper (a masking sketch follows the pipeline below).
Original image -> Facial region guided masking -> Network prediction -> Full reconstruction
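To make the spatial task concrete, here is a hedged sketch of facial region guided masking over ViT patches: patches containing facial landmarks are masked first, and random patches top up the budget to the target ratio. The function name, the landmark source (any detector, e.g. dlib or face_alignment), and the top-up rule are our assumptions, not the repository's exact code.

```python
# Sketch of facial region guided masking: mask patches covering facial
# landmarks first, then random patches until mask_ratio is reached.
# Landmark detection itself is assumed to happen upstream.
import torch

def facial_region_guided_mask(landmarks: torch.Tensor, img_size: int = 224,
                              patch_size: int = 16,
                              mask_ratio: float = 0.75) -> torch.Tensor:
    """landmarks: (N, 2) pixel (x, y) coordinates; returns a bool mask per patch."""
    grid = img_size // patch_size              # 14 patches per side for ViT-B/16
    num_patches = grid * grid
    num_masked = int(num_patches * mask_ratio)
    # Map each landmark to the index of the patch that contains it.
    cols = (landmarks[:, 0].long() // patch_size).clamp(0, grid - 1)
    rows = (landmarks[:, 1].long() // patch_size).clamp(0, grid - 1)
    facial = torch.unique(rows * grid + cols)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[facial[:num_masked]] = True           # facial patches are masked first
    # Top up with random non-facial patches until the ratio is reached.
    remaining = torch.nonzero(~mask).flatten()
    extra = num_masked - int(mask.sum())
    if extra > 0:
        pick = remaining[torch.randperm(remaining.numel())[:extra]]
        mask[pick] = True
    return mask                                # True = patch is masked
```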
- We also visualize the vanilla MAE reconstruction without the facial region guided masking strategy for comparison (a random-masking sketch follows below).
Original image -> Random masking -> Network prediction -> Full reconstruction
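For reference, the vanilla baseline simply samples masked patches uniformly, with no facial guidance; this mirrors the per-sample random shuffling used in MAE (shown here as a simplified, single-image sketch):

```python
# Vanilla MAE-style random masking for comparison: a uniformly random
# subset of patches is masked, with no facial guidance.
import torch

def random_mask(num_patches: int = 196, mask_ratio: float = 0.75) -> torch.Tensor:
    num_masked = int(num_patches * mask_ratio)
    noise = torch.rand(num_patches)        # one random score per patch
    ids = torch.argsort(noise)             # random permutation of patch indices
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[ids[:num_masked]] = True          # True = patch is masked
    return mask
```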
To pre-train ViT-B/16 (recommended default) with multi-node distributed training, run the following on 8 nodes with 8 GPUs each:
```
python submitit_pretrain.py \
    --job_dir ${JOB_DIR} \
    --nodes 8 \
    --use_volta32 \
    --batch_size 64 \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --mask_ratio 0.75 \
    --mask_radius 16 \
    --epochs 800 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR}
```
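If you are not running under SLURM (which `submitit_pretrain.py` assumes), a comparable single-node launch might look like the following. Note that the `main_pretrain.py` entry point and the `torchrun` launcher are assumptions carried over from the upstream MAE codebase, not confirmed by this README:

```
# Hypothetical single-node launch, assuming the upstream MAE entry point.
torchrun --nproc_per_node=8 main_pretrain.py \
    --batch_size 64 \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --mask_ratio 0.75 \
    --mask_radius 16 \
    --epochs 800 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR}
```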
You can choose among three reconstruction strategies:
- args.recon_real (reconstruction of real faces only),
- args.recon_dual (positive reconstruction on real faces and negative reconstruction on fake faces; a loss sketch follows the command below),
- direct fine-tuning without reconstruction.
For example, to fine-tune with reconstruction on real faces only:

```
python partial_finetuning_with_reconstruction.py \
    --finetune "" \
    --decoder "" \
    --recon_real
```
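To make `--recon_dual` concrete, here is a minimal sketch of the intended objective: reconstruction error is minimized on real faces and pushed up (negated) on fake faces. The function name, the MSE choice, and the equal weighting are our assumptions, not the repository's exact loss.

```python
# Sketch of a dual reconstruction objective (assumed form): reward good
# reconstruction of real faces, penalize good reconstruction of fakes.
import torch
import torch.nn.functional as F

def dual_reconstruction_loss(pred: torch.Tensor, target: torch.Tensor,
                             is_real: torch.Tensor) -> torch.Tensor:
    """pred/target: (B, ...) reconstructions and originals; is_real: (B,) bool."""
    per_sample = F.mse_loss(pred, target, reduction="none").flatten(1).mean(dim=1)
    sign = torch.where(is_real,
                       torch.ones_like(per_sample),
                       -torch.ones_like(per_sample))
    return (sign * per_sample).mean()      # positive on real, negative on fake
```

In practice the negative (fake) term would likely need clamping or down-weighting so it cannot grow without bound; that detail is omitted here.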
This repository is built on [MAE](https://github.com/facebookresearch/mae).