Unofficial implementation of StyleLipSync: Style-based Personalized Lip-sync Video Generation. Additional explanations follow below.
The most recent items are listed first.
I implement the same architecture as the StyleLipSync model, following the original paper (a minimal wiring sketch follows the list below):
- Generator with SaMF
- Encoders (face, ref, aud)
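
As a rough orientation, here is a minimal PyTorch sketch of how these pieces are wired together. Everything in it is illustrative: the encoder layers, dimensions, and the way the latents are combined are placeholders, and the SaMF fusion inside the generator is omitted entirely, so treat it as a wiring diagram rather than the paper's architecture.

```python
import torch
from torch import nn

class ConvEncoder(nn.Module):
    """Placeholder encoder; layer count and dims are illustrative only."""
    def __init__(self, in_ch, out_dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(256, out_dim)

    def forward(self, x):
        return self.head(self.body(x).flatten(1))

class StyleLipSyncSketch(nn.Module):
    """Wiring only: three encoders drive a pre-trained StyleGAN2 generator.
    The SaMF feature fusion inside the generator is not reproduced here."""
    def __init__(self, generator, style_dim=512):
        super().__init__()
        self.face_enc = ConvEncoder(3, style_dim)  # masked target face
        self.ref_enc = ConvEncoder(3, style_dim)   # reference frames (identity)
        self.aud_enc = ConvEncoder(1, style_dim)   # mel-spectrogram window
        self.generator = generator                 # rosinality StyleGAN2 Generator

    def forward(self, face, ref, aud):
        # Summation is a placeholder for the paper's actual latent mapping.
        w = self.face_enc(face) + self.ref_enc(ref) + self.aud_enc(aud)
        img, _ = self.generator([w], input_is_latent=True)
        return img
```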
I evaluate the results on 300 videos from HDTF in the cross-ID setting.
I use the pre-trained StyleGAN2 model from rosinality/stylegan2-pytorch, trained for 550k iterations on the FFHQ dataset.
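
Loading that checkpoint looks roughly like the following. The filename and the 256px resolution are assumptions on my part; `Generator` is the class from `model.py` in rosinality/stylegan2-pytorch, and its checkpoints store the EMA weights under the `g_ema` key.

```python
import torch
from model import Generator  # model.py from rosinality/stylegan2-pytorch

# Filename and 256px resolution are assumed; adjust to the checkpoint you use.
ckpt = torch.load("550000.pt", map_location="cpu")

g = Generator(size=256, style_dim=512, n_mlp=8)
g.load_state_dict(ckpt["g_ema"])  # EMA weights are the ones used for inference
g.eval()
```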
I use the same 3D face reconstruction model, but implement the mesh-extraction pipeline in a different way.
My pipeline consists of the following steps (a hedged sketch follows the list):
1. Estimate 3D parameters and vertices.
2. Extract the open/close mouth meshes by adjusting the expression parameters.
3. Normalize to a neutral pose.
4. Remove points where y >= 0.
5. Revert to the original estimated pose and project to 2D.
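
A minimal sketch of those steps is below. The `recon` object and every method on it (`estimate`, `open_mouth`, `close_mouth`, `vertices`, `to_neutral_pose`, `apply_pose`, `project_2d`) are hypothetical stand-ins for the 3D face reconstruction model's real API, and vertices are assumed to be NumPy-style `(N, 3)` arrays; only the ordering of the steps comes from the pipeline above.

```python
def extract_lip_meshes(recon, frame):
    """Sketch of the mesh pipeline. `recon` is a hypothetical wrapper
    around the 3D face reconstruction model; all of its methods are assumed."""
    params = recon.estimate(frame)  # step 1: 3D parameters for this frame

    meshes = []
    for adjust in (recon.open_mouth, recon.close_mouth):  # step 2
        p = params.copy()
        p.exp = adjust(p.exp)                 # edit only the expression coeffs
        v = recon.vertices(p)                 # (N, 3) posed vertices
        v = recon.to_neutral_pose(v, p.pose)  # step 3: normalize to neutral pose
        v = v[v[:, 1] < 0]                    # step 4: drop points with y >= 0
        v = recon.apply_pose(v, p.pose)       # step 5: restore estimated pose...
        meshes.append(recon.project_2d(v))    # ...and project to 2D pixels
    return meshes  # [open_mesh_2d, close_mesh_2d]
```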