Foivos Paraperas Papantoniou¹, Alexandros Lattas¹, Stylianos Moschoglou¹,
Jiankang Deng¹, Bernhard Kainz¹,², Stefanos Zafeiriou¹
¹Imperial College London, UK
²FAU Erlangen-Nürnberg, Germany
This is the official implementation of Arc2Face, an ID-conditioned face model:
✅ that generates high-quality images of any subject given only its ArcFace embedding, within a few seconds
✅ that is trained on the large-scale WebFace42M dataset and offers superior ID similarity compared to existing models
✅ that is built on top of Stable Diffusion and can be extended to different input modalities, e.g., with ControlNet
- [2024/08/16] 🔥 Accepted to ECCV24 as an oral!
- [2024/08/06] 🔥 ComfyUI support available at caleboleary/ComfyUI-Arc2Face!
- [2024/04/12] 🔥 We add LCM-LoRA support for even faster inference (check the details below).
- [2024/04/11] 🔥 We release the training dataset on HuggingFace Datasets.
- [2024/03/31] 🔥 We release our demo for pose control using Arc2Face + ControlNet (see instructions below).
- [2024/03/28] 🔥 We release our Gradio demo on HuggingFace Spaces (thanks to the HF team for their free GPU support)!
- [2024/03/14] 🔥 We release Arc2Face.
conda create -n arc2face python=3.10
conda activate arc2face
# Install requirements
pip install -r requirements.txt
- The models can be downloaded manually from HuggingFace or with Python:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arc2face/diffusion_pytorch_model.safetensors", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="encoder/pytorch_model.bin", local_dir="./models")
- For face detection and ID-embedding extraction, manually download the antelopev2 package (direct link) and place the checkpoints under models/antelopev2.
- We use an ArcFace recognition model trained on WebFace42M. Download arcface.onnx from HuggingFace and put it in models/antelopev2, or download it with Python:
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="arcface.onnx", local_dir="./models/antelopev2")
- Then delete glintr100.onnx (the default backbone from insightface).
The final structure of the models folder should be:
.
└── models
    ├── antelopev2
    ├── arc2face
    └── encoder
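A quick way to confirm the layout before loading anything is a small stdlib-only script. This is a sketch that checks only the files named in the download steps above (the other files inside the antelopev2 package are not listed, since their names are not given here) and flags a leftover glintr100.onnx; check_models_layout, EXPECTED_FILES and LEFTOVER_FILES are hypothetical names, not part of this repo.

```python
from pathlib import Path

# Files placed under ./models by the download steps above
# (paths are relative to the models folder).
EXPECTED_FILES = [
    "arc2face/config.json",
    "arc2face/diffusion_pytorch_model.safetensors",
    "encoder/config.json",
    "encoder/pytorch_model.bin",
    "antelopev2/arcface.onnx",
]

# Default insightface backbone that should have been deleted.
LEFTOVER_FILES = ["antelopev2/glintr100.onnx"]


def check_models_layout(root="models"):
    """Return (missing, leftovers): relative paths that are absent or should be removed."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    leftovers = [p for p in LEFTOVER_FILES if (root / p).exists()]
    return missing, leftovers


if __name__ == "__main__":
    missing, leftovers = check_models_layout("models")
    for p in missing:
        print(f"missing: models/{p}")
    for p in leftovers:
        print(f"please delete: models/{p}")
    if not missing and not leftovers:
        print("models folder looks complete")
```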
Load the pipeline using diffusers:
from diffusers import (
StableDiffusionPipeline,
UNet2DConditionModel,
DPMSolverMultistepScheduler,
)
from arc2face import CLIPTextModelWrapper, project_face_embs
import torch
from insightface.app import FaceAnalysis
from PIL import Image
import numpy as np
# Arc2Face is built upon SD1.5
# The repo below can be used instead of the now deprecated 'runwayml/stable-diffusion-v1-5'
base_model = 'stable-diffusion-v1-5/stable-diffusion-v1-5'
encoder = CLIPTextModelWrapper.from_pretrained(
'models', subfolder="encoder", torch_dtype=torch.float16
)
unet = UNet2DConditionModel.from_pretrained(
'models', subfolder="arc2face", torch_dtype=torch.float16
)
pipeline = StableDiffusionPipeline.from_pretrained(
base_model,
text_encoder=encoder,
unet=unet,
torch_dtype=torch.float16,
safety_checker=None
)
You can use any SD-compatible scheduler and number of steps, just like with Stable Diffusion. By default, we use DPMSolverMultistepScheduler with 25 steps, which produces very good results in just a few seconds.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
Pick an image and extract the ID-embedding:
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
img = np.array(Image.open('assets/examples/joacquin.png').convert('RGB'))[:, :, ::-1]  # RGB -> BGR, as expected by insightface
faces = app.get(img)
if len(faces) == 0:
    raise ValueError('No face detected in the input image')
face = max(faces, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))  # select largest face (if more than one detected)
id_emb = torch.tensor(face['embedding'], dtype=torch.float16)[None].cuda()
id_emb = id_emb / torch.norm(id_emb, dim=1, keepdim=True)  # normalize embedding
id_emb = project_face_embs(pipeline, id_emb)  # pass through the encoder
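The face-selection and normalization logic above can be tried offline with NumPy. This is a minimal sketch in which the toy detections are hypothetical stand-ins for insightface results (which expose the 'bbox' and 'embedding' keys used above); select_largest_face and l2_normalize are illustrative helpers, not functions from this repo.

```python
import numpy as np


def bbox_area(bbox):
    """Area of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return (x2 - x1) * (y2 - y1)


def select_largest_face(faces):
    """Return the detection with the largest bounding box, or raise if there is none."""
    if not faces:
        raise ValueError("no face detected")
    return max(faces, key=lambda f: bbox_area(f["bbox"]))


def l2_normalize(emb):
    """Scale each row of an (N, D) array to unit Euclidean norm."""
    emb = np.asarray(emb, dtype=np.float32)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


# Toy detections (hypothetical values); ArcFace embeddings are 512-dimensional.
faces = [
    {"bbox": [10, 10, 50, 60], "embedding": np.full(512, 2.0)},  # 40 x 50 box
    {"bbox": [0, 0, 100, 120], "embedding": np.full(512, 3.0)},  # 100 x 120 box
]
face = select_largest_face(faces)
id_emb = l2_normalize(face["embedding"][None])  # add a batch axis: shape (1, 512)
print(face["bbox"], id_emb.shape, float(np.linalg.norm(id_emb)))
```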
Generate images:
num_images = 4
images = pipeline(prompt_embeds=id_emb, num_inference_steps=25, guidance_scale=3.0, num_images_per_prompt=num_images).images
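The pipeline returns the samples as a list of PIL images in images. To view them side by side, a minimal NumPy sketch can tile them into a single grid; make_grid is a hypothetical helper, not part of diffusers or this repo.

```python
import numpy as np


def make_grid(images, cols):
    """Tile equally sized HxWxC arrays into one grid image, row by row.

    Cells left over when len(images) is not a multiple of cols stay black.
    """
    if not images:
        raise ValueError("need at least one image")
    h, w, c = images[0].shape
    rows = (len(images) + cols - 1) // cols  # ceiling division
    grid = np.zeros((rows * h, cols * w, c), dtype=images[0].dtype)
    for i, img in enumerate(images):
        if img.shape != (h, w, c):
            raise ValueError("all images must have the same shape")
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid
```

For example, grid = make_grid([np.asarray(im) for im in images], cols=2) followed by Image.fromarray(grid).save('grid.png') would write the four samples above as a 2×2 grid.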
LCM-LoRA allows you to reduce the sampling steps to as few as 2-4 for super-fast inference. Just plug in the pre-trained distillation adapter for SD v1.5 and switch to LCMScheduler:
from diffusers import LCMScheduler
pipeline.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
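For intuition on what plugging in the adapter means: a LoRA adapter stores a low-rank update for selected weight matrices of the base model. The sketch below shows the generic LoRA formulation, W' = W + (alpha / r) · B · A, in NumPy with random toy matrices; it illustrates the technique in general and is not the diffusers implementation or the actual LCM-LoRA weights.

```python
import numpy as np


def merge_lora(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A for a linear layer with W of shape (out, in).

    A has shape (r, in) and B has shape (out, r), where r is the LoRA rank.
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 6, 2
W = rng.standard_normal((out_dim, in_dim))  # frozen base weight
A = rng.standard_normal((rank, in_dim))     # low-rank "down" projection
B = rng.standard_normal((out_dim, rank))    # low-rank "up" projection
W_merged = merge_lora(W, A, B, alpha=rank)  # alpha == r, so the scale is 1

x = rng.standard_normal(in_dim)
# The merged weight gives the base output plus the low-rank branch output.
y_merged = W_merged @ x
y_branch = W @ x + B @ (A @ x)
print(np.allclose(y_merged, y_branch))
```

diffusers also provides pipeline.fuse_lora(), which bakes loaded LoRA weights into the base weights in this way; it is optional and not needed for the steps here.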
Then, you can sample with as few as 2 steps (and disable classifier-free guidance by setting guidance_scale to 1.0, as LCM is very sensitive to it and even small values lead to oversaturation):
images = pipeline(prompt_embeds=id_emb, num_inference_steps=2, guidance_scale=1.0, num_images_per_prompt=num_images).images
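Why a value of 1.0 disables guidance: under classifier-free guidance, the unconditional and conditional noise predictions are combined as eps_uncond + s * (eps_cond - eps_uncond), where s is guidance_scale. At s = 1 this reduces to the conditional prediction alone, and the diffusers Stable Diffusion pipeline skips the unconditional pass entirely when guidance_scale is not greater than 1. A NumPy sketch of the combination step with toy, latent-shaped arrays:

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 64, 64))  # toy noise predictions
eps_cond = rng.standard_normal((4, 64, 64))

for s in (1.0, 3.0):
    eps = cfg_combine(eps_uncond, eps_cond, s)
    # eps - eps_cond == (s - 1) * (eps_cond - eps_uncond): zero (up to rounding) at s = 1.
    print(s, float(np.abs(eps - eps_cond).max()))
```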
Note that LCM-LoRA accelerates sampling in exchange for a slight drop in quality.
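One way to check how much identity is preserved (for instance, when comparing LCM-LoRA samples with the 25-step DPMSolverMultistepScheduler results) is the cosine similarity between the ArcFace embedding of the input face and those of the generated faces. The sketch below assumes you have already extracted these embeddings with app.get as shown earlier; it is a generic metric, not necessarily the evaluation protocol used in the paper, and cosine_similarity is a hypothetical helper.

```python
import numpy as np


def cosine_similarity(gens, ref):
    """Cosine similarity between each row of gens (N, D) and a reference vector ref (D,)."""
    gens = np.asarray(gens, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    gens = gens / np.linalg.norm(gens, axis=-1, keepdims=True)
    ref = ref / np.linalg.norm(ref)
    return gens @ ref


# Toy 3-D vectors in place of 512-D ArcFace embeddings.
ref = np.array([1.0, 0.0, 0.0])
gens = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
print(cosine_similarity(gens, ref))
```

In practice, ref would be the 'embedding' field of the detection selected from the input image, and each row of gens the embedding of the largest face detected in np.array(im)[:, :, ::-1] for every im in images; higher values indicate better ID preservation.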
You can start a local demo for inference by running:
python gradio_demo/app.py
We provide a ControlNet model trained on top of Arc2Face for pose control. We use EMOCA for 3D pose extraction. To run our demo, follow the steps below:
Download the ControlNet checkpoint manually from HuggingFace or using python:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="controlnet/config.json", local_dir="./models")
hf_hub_download(repo_id="FoivosPar/Arc2Face", filename="controlnet/diffusion_pytorch_model.safetensors", local_dir="./models")
Initialize the EMOCA submodule:
git submodule update --init external/emoca
This is the trickiest part. You will need PyTorch3D to run EMOCA. As its installation may cause conflicts, we suggest following the process below:
- Create a new environment and install PyTorch3D with GPU support first (follow the official instructions).
- Install the Arc2Face + EMOCA requirements with:
pip install -r requirements_controlnet.txt
- Install the EMOCA code:
pip install -e external/emoca
- Finally, you need to download the EMOCA/FLAME assets. Run the following and follow the instructions in the terminal:
cd external/emoca/gdl_apps/EMOCA/demos
bash download_assets.sh
cd ../../../../..
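Since the PyTorch3D/EMOCA setup is the most fragile step, a quick stdlib check that the packages are importable from the new environment can save a failed demo launch. The module names are assumptions: 'pytorch3d' for PyTorch3D and 'gdl' for the EMOCA code installed with pip install -e external/emoca; adjust them if your installation differs. missing_packages is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Assumed module names for PyTorch, PyTorch3D and the EMOCA package.
    missing = missing_packages(["torch", "pytorch3d", "gdl"])
    print("missing:", missing or "none")
```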
You can start a local ControlNet demo by running:
python gradio_demo/app_controlnet.py
The test images used for comparisons in the paper (Synth-500, AgeDB) are available here. Please use them only for evaluation purposes and make sure to cite the corresponding sources when using them.
- Demo link by @camenduru.
- Pinokio implementation by @cocktailpeanut (runs locally on all OS - Windows, Mac, Linux).
- Thanks to the creators of Stable Diffusion and the HuggingFace diffusers team for the awesome work ❤️.
- Thanks to the WebFace42M creators for providing such a million-scale facial dataset ❤️.
- Thanks to the HuggingFace team for their generous support through the community GPU grant for our demo ❤️.
- We also acknowledge the invaluable support of the HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), which made the training of Arc2Face possible.
If you find Arc2Face useful for your research, please consider citing us:
@inproceedings{paraperas2024arc2face,
title={Arc2Face: A Foundation Model for ID-Consistent Human Faces},
author={Paraperas Papantoniou, Foivos and Lattas, Alexandros and Moschoglou, Stylianos and Deng, Jiankang and Kainz, Bernhard and Zafeiriou, Stefanos},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2024}
}