
Grayscale SAEHD model and mode for training deepfakes. Notes, tests, experience, tools, study and explanations of the source code.


DeepFaceLab-SAEHDBW or Arnoldifier, ArnoldDFR

New grayscale SAEHDBW model for higher performance, then colorization of the result; integration with Wav2Lip etc.; and code review / documentation of the source files (future work)

image

Notes, experience, tools, deepfakes

Some of the points are goals, TBD.

News

Ambulgul II

1.6.2024: Premiere

image

https://youtu.be/KXC68MbczMg

A realistic fantasy inspired by The Lord of the Rings - on the road to the Millennial Paradise of the bright future. Satire, comedy, fantasy, animation. Produced with "Arnoldifier", the deepfake cinema system, and Twenkid FX Studio (the footage from "Star Symphony in Chepelare") - created in Plovdiv by Tosh / "The Sacred Computer". Uses footage from "The Lord of the Rings III", directed by Peter Jackson; OpenDalle11 for the title image; DaVinci Resolve (final editing). Featuring face models of Maria Gabriel and Grigor Sariyski. https://github.com/Twenkid/DeepFaceLab-SAEHDBW http://eim.twenkid.com http://artificial-mind.blogspot.com #bulgaria #eurozone #politics Script, sound, editing, actor in the roles, programmer, deepfake system: Tosh. ... 19.5.2024: Trailer of the satirical deepfake film "Bulgaria Enters the Eurozone", with two deepfake-voiced characters. The full film has about 4:xx minutes of action (footage from The Lord of the Rings...). https://youtu.be/xmsVBVaQPX0?si=zWlmMFwCWVzakfJG

image image image

image

The new film is mostly produced; some parts are still being finalized and the voice acting is yet to be recorded. The movie is another political satire, a follow-up, now with complete voice acting/dubbing for the entire action (no lip syncing) and much more dramatic. For now there are three characters, two of them with changed faces.

30.1.2024: New film produced with Arnoldifier: featuring 8 face models

image

image

https://youtu.be/VPj9L61R_Ak

It is 3 minutes long and has intro and outro parts featuring sequences from my animated fantasy experiment "Star Symphony in Chepelare" (2018), which was created with my own video editor Twenkid FX, using specially developed computer vision and image processing extension functions to generate the starfields and meteors over live-action video and to make it look drawn and painted. Note the little blobs floating in the reflections in the "Lord of the Rings" sequence: they blended with the rotating starfield.

The movie applies 8 models at 192x192, LIAE-UD-192-96-32-32-16-PRE-20-9-2022_SAEHDBW (the pretrained general face model was from 9-2022). There were minor updates in the code, e.g. parameters for the default blur of the face mask, and I thought of a few more major functions:

  1. Selective sharpening and noise generation during the merging itself, for more natural blending;
  2. Temporal smoothing of the masks to heal some of the shaky or wrong landmark detections (manual extraction sometimes reduces this, but not enough);
  3. Recognition and correction of masks that go outside the borders of the face of the replaced character;
  4. Selective painting/blending into the target using actual data from the dataset, not just the generated model output (an old idea). (Some of the image enhancements may be possible with the built-in enhancer/super-resolution function, but I don't use it in Arnoldifier; I think it would have to be converted to work with grayscale, or the grayscale image would have to be converted to color before enhancement and then converted back to grayscale - I need to delve into that part of the code and see how it works.);
  5. Utilities for working with multiple facial models, batch training with less human intervention in the middle, and editing of the final movie: which faces to draw with which masks when there are many in one frame, etc.;
  6. Simple utilities for merging within start/end frames (not all frames in the folder). This is the easiest of these; one simple solution is a start/end frame from a list to trigger/stop merging, plus a list of skipped frames - see the sketch after this list;
  7. Automatic or semi-automatic/assisted download and creation of datasets: search for relevant video sources, download and extraction, etc.;
  8. Continuation of the colorization mode, possibly doing without the pix2pix model, by "smart painting";
  9. Integration of a Wav2Lip workflow;
  10. Etc.
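
Item 6 could look roughly like the following minimal sketch (an illustration only, not code that exists in the repository; the frame-list file format and the function names are hypothetical):

    # Minimal sketch (hypothetical, not part of the current code base):
    # decide which frames to merge, given a start/end range and a list of skipped frames.

    def load_frame_list(path):
        # One frame number per line; empty lines and lines starting with '#' are ignored.
        with open(path, encoding="utf-8") as f:
            return {int(line.strip()) for line in f
                    if line.strip() and not line.startswith("#")}

    def should_merge(frame_no, start_frame, end_frame, skipped):
        # Merge only frames inside [start_frame, end_frame] that are not in the skip list.
        if frame_no < start_frame or frame_no > end_frame:
            return False
        return frame_no not in skipped

    # Usage example:
    # skipped = load_frame_list("skipped_frames.txt")
    # frames_to_merge = [n for n in range(total_frames) if should_merge(n, 120, 480, skipped)]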

More info on the workflow and the training process - maybe later, stay tuned...

11.1.2024: Playing with DFL again

  1. Pseudo hanging
  2. "Emergent behaviior"

1. Pseudo hanging (delay) of nn.tf_sess_config

I recalled one TF bug and its solution when running with CUDA on my still rolling 750 Ti.

       if nn.tf_sess is None:
            # Debug print added to locate the apparent hang (13-2-2023):
            # on the CUDA build the first tf.Session(...) creation can take a minute or more.
            print("if nn.tf_sess is None:")
            nn.tf_sess = tf.Session(config=nn.tf_sess_config)

It doesn't happen with the DX12 version, and it isn't actually a hang: the first run after something needs a minute or so before starting, then subsequent runs start immediately (some caching, presumably). So just wait a bit. Another bug, though, is that sometimes the CUDA version runs slower than it should: it started at ~530-570 ms per iteration, but then slowed down below the DX12 version (>800-1000 ms), or it starts that slow. After several restarts (at the end I pressed "Enter" for the parameter-change prompt but changed nothing, just went through it), it got fast again.

A few hours later: I had a theory - possibly part of the model is moved to system RAM and some kind of swapping begins, as it barely fits in GPU RAM, and once I dare to open more things in a browser etc., the iterations slow down. Unless it's something about specific optimization hurdles.

Starting. Press "Enter" to stop training and save model.
[00:09:45][#007322][0531ms][0.8211][0.7711]

Re the tf.Session bug:

https://www.google.com/search?client=opera&q=tf.Session+hanging&sourceid=opera&ie=UTF-8&oe=UTF-8

https://stackoverflow.com/questions/52680435/tensorflow-1-11-hangs-on-sess-tf-session

tensorflow/tensorflow#18652

2. "Emergent behavior"

I hadn't trained for a long time, and when I saw the black screens in the beginning I thought that some bug had slipped into the display part of the training process, but I let it run because the loss graph seemed to go slightly down. Then the faces appeared and the loss function jumped to a reasonable value. It was training of specific faces from a model pretrained on many faces - then I recalled that this is probably normal: there are initial "waves" of reshaping, it just was slower than I expected. I may check it out with other faces; the sharpness of the decrease of the loss function is strange, like the "emergent phenomena" in the LLMs.

image

27.12.2023: New episode with Wav2LipHD:

New Year Address of the Prime Minister of the Change, Kiril Petkov (political parody, satire; Wav2LipHD Super, DALL-E 3, a re-edit of fragments of the previous episodes, etc.)

Untitled2 mp4_snapshot_00 05 001

https://www.youtube.com/watch?v=AojlaVnjOJY

3.4.2023: A script for creating the dataset needed for training a colorizing model for the grayscale faces: https://github.com/Twenkid/DeepFaceLab-SAEHDBW/blob/main/DeepFaceLab_DirectX12/_internal/DeepFaceLab/colorize/pix2pix_dataset.py A few other modified files have to be added for the two-pass merging process (first pass: generating the grayscale faces and storing them to disk; second pass: no face generation, reading the stored faces, colorizing them with the pix2pix model and merging them with the color frames).
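
The two-pass flow could be sketched roughly like this (an illustration only, not the actual merger code; the file layout and the generate_face/colorize/merge calls are hypothetical placeholders):

    import os
    import cv2

    FACES_DIR = "stored_faces"  # hypothetical folder for the pass-1 output

    def pass1_generate(frames, bw_model):
        # Pass 1: run the grayscale SAEHDBW model and store every generated face to disk.
        os.makedirs(FACES_DIR, exist_ok=True)
        for i, frame in enumerate(frames):
            face_bw = bw_model.generate_face(frame)            # hypothetical call
            cv2.imwrite(os.path.join(FACES_DIR, "%06d.png" % i), face_bw)

    def pass2_colorize_and_merge(frames, pix2pix, merger):
        # Pass 2: no face generation - read the stored faces, colorize them with the
        # pix2pix model and merge them with the color frames.
        merged = []
        for i, frame in enumerate(frames):
            face_bw = cv2.imread(os.path.join(FACES_DIR, "%06d.png" % i), cv2.IMREAD_GRAYSCALE)
            face_rgb = pix2pix.colorize(face_bw)               # hypothetical call
            merged.append(merger.merge(frame, face_rgb))       # hypothetical call
        return merged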

2.4.2023: Episode 0: the pilot experimental version of Episode 1, with an earlier version of the model and the workflow, completed in May 2022 but not released until now. See it below, after the previously published episodes.

23.3.2023: One of the scripts for colorization is added: the one that trains a pix2pix model for grayscale-to-color faces. I didn't upload it when I made it because the workflow was a bit laborious, and I didn't have the energy to make it easier for users back then.

https://github.com/Twenkid/DeepFaceLab-SAEHDBW/tree/main/DeepFaceLab_DirectX12/_internal/DeepFaceLab/colorize

The other pieces and an example to be added later. See the colorized example below.

21.2.2023: Recently I trained a new model to Arnold (retrained the Kiril model) - an old idea to "puppet" the face myself, although the source wasn't specially "Arnold-related". It achieved a very low error. There was an unexplained hang of the CUDA build during the initialization phase (I had fixed one bug of that kind in May last year), which I had skipped fixing, using the DirectX version instead, up to today - after a bit of LRD training for two hours or so I managed to fix it by updating a few files. (I don't know whether that error is present in other setups; I haven't tested ArnoldDFR on other machines and nobody has reported it, although there are 10 stars and 3 forks so far.) Updated files: DeepFaceLab_SAEHDBW_22_5_2022\core\leras\nn.py, DeepFaceLab_SAEHDBW_22_5_2022\core\leras\device.py. Sample training file:

An issue in DeepfaceLab repo about this project (2 July 2022): iperov/DeepFaceLab#5535

Manual

https://github.com/Twenkid/DeepFaceLab-SAEHDBW/blob/main/Manual.md

Watch the series: "Arnold Schwarzenegger: The Governor of Bulgaria"

Premiere! Part I

image

In Bulgarian: Арнолд Шварценегер: Губернаторът на България

https://artificial-mind.blogspot.com/2022/06/arnold-schwarzenegger-governor-of.html

https://www.youtube.com/watch?v=n2JMGqxOYfA&feature=youtu.be

image

  • Part III: Combined with Wav2Lip - the mouth is synchronized; no finetuning; another pass of DFL to repair the artifacts (however there is some loss of contrast and sharpness) https://youtu.be/4F7PB7wBEXk

image

  • Part IV: Lip-syncing with finetuning repair and synthesized cloned voices (RealTimeVoiceCloning) over original script ("full directing") with Arnold, Lena Schwarzenegger and Stoltenberg https://youtu.be/X56QkNzkkVM

image

image

Arnold Schwarzenegger's Superfood: The Governor of Bulgaria - Part 5

  • Episode 0: the pilot experimental version of Episode 1, with an earlier version of the model and the workflow, completed in May 2022 but not released until now: https://youtu.be/2CMmd494Dqw image

"Produced with Arnoldifier, a modified Deepfacelab which I made to train grayscale models with 3x higher performance/quality. This is trained on a GF 750 Ti 2 GB." #deepfake #arnold #twenkid #fitness #фитнес #културизъм #bodybuilding #Arnoldifier #deepfacelab

Latest news:

  • 30.9.2022 - Trained LIAE-UD 192-96-32-32-16, 246 MB

I wanted to see whether 96 autoencoder dimensions would be enough, and it turned out they actually were. I pretrained on a modified DFL faceset to 500-some K iterations, then trained up to 256K Kiril-to-Arnold with uniform yaw - I hadn't used it so far and that was a mistake; now the profiles develop fine from the early stages. Due to another silly crash and a lack of backup though (LOL), I'm now training again from a backup of the pretrained model from about 300K. Let's see whether the 200K-2xxK additional pretraining iterations contribute; that time could be saved in the future.

  • Plans for improvements: a "profile-fixer" stage for bad borders of semi-profile and profile poses. It seems that the masks have to be convex, and for profiles they are always bad: they cut a big chunk in front of the forehead and the nose, which results in contours attached to the face. Sometimes the recognized face/mask is smaller than the target and a "ghost" of the original nose etc. appears.

  • 4.10.2022 - Completed Kiril-to-Arnold LIAE-UD 192-96-32-32-16

One new "discovery": the model was trained with "Uniform Yaw" turned on, which lead to more balance and faster convergence of the profile faces. The pretraining for ~300K was enough, then about 300-347K (with final tens K it. with LRD). I didn't trained CT (usually I did sot-m for some 10Ks it. with LRD). It is slow, on the CPU. I just merged with sot-m and it seems OK (not perfect, but that rather needs additional layer of processing which is future work). So the slow color transfer mode seems to be avoidable with acceptable quality, and note that the test dataset is diverse, it is not a single clip from an interview etc., especially Arnold's set.

Interestingly, the smallest model so far (246 MB) displays higher quality and more realistic output than some 345 MB models (df-ud-t 192-128-48-32-16), maybe due to the asymmetrical encoder-decoder. I noticed a little "lighting up" of the nose etc. in one of the sequences (the Parliament background); possibly the previous model (df-ud 256-128-32-32) was more stable, but I need to check it out and make a systematic comparison.

Note that DFL's default dimensions for a color DF model, at a resolution of just 128x128, are 256-64-64-22.

Below: no additional sharpening applied

image

image

image

image

  • Sharpening, whole image (23, IrfanView)

image image image

Good profiles, except for a glitch in the masks which sometimes leaves a trailing contour (the new face doesn't cover the original face, etc.): the mask is convex and doesn't follow the shape of the face. I have ideas for correction; future work.

image

image

image

...

image image image

image

  • Sharpening: 32, IrfanView - FullHD image. Note that the video source has the wrong focus: the background instead of the character.

image image

Lena: the model is not trained or finetuned on that face:

image image

  • Training, no sharpening

https://github.com/Twenkid/DeepFaceLab-SAEHDBW/blob/main/LIAE-UD-192-96-32-32-16.md

  • x.9.2022 - Training DF-UD 256x256 128-32-32-16, 258 MB after 170K it.

A discovery: it turned out that the model trains well even with a batch size of 4 and just 32-32 dimensions. The iteration time is comparable to training the 192x192 models. Note that the dataset is not the best regarding sharpness, especially on Kiril's side; for now I've been training with the 192x192 faces of Kiril, many of which were extracted from 640x360 videos or from 854x480 (resized from 640x360), also blurred and not sharpened/super-resolution enhanced. A future test will use a sharper dataset for finetuning.

The size of the model is well below the maximum I have fit on the 750 Ti so far (345 MB, for both 192x192 df-ud and df-udt models), so 288x288 or even 320x320? might be possible - something to try.

I don't know whether batch size 4 and just 32-32-16 dimensions will work as well at lower resolutions, where the features will be smaller: to be checked.

~517K, 15.9.2022

image

~439K, 14.9.2022

image

image image image image

0173000 Loss: [09:01:56][#172976][0745ms][0.8431][0.6401]

(The big values for the iteration time are due to the saving etc.; the fast ones are about 687-690 ms.
I don't push the CPU and GPU all the time, and now (14.9) it goes around 716-723 ms.)

[07:55:32][#295280][0701ms][0.5368][0.5851]
[08:05:08][#296115][0786ms][0.5318][0.5899]
[08:15:09][#296983][1349ms][0.5333][0.5828]
[08:25:08][#297849][0728ms][0.5277][0.5788]
[08:35:08][#298715][0765ms][0.5308][0.5818]
...
[16:55:09][#341542][0814ms][0.5063][0.5739]
[17:05:10][#342387][0830ms][0.5052][0.5729]
[17:15:09][#343227][0835ms][0.5107][0.5675]
[17:25:10][#344066][0825ms][0.5083][0.5663]
...
[21:55:10][#366704][0721ms][0.5008][0.5617]
[22:05:10][#367539][0714ms][0.4994][0.5653]
[22:15:10][#368374][0846ms][0.4997][0.5643]
...
[08:49:20][#381196][0957ms][0.4930][0.5595]
[08:58:29][#381927][0846ms][0.4921][0.5590]
[09:08:29][#382721][0929ms][0.4904][0.5592]
[09:18:29][#383520][0941ms][0.4837][0.5584]
...
[20:19:13][#435702][0743ms][0.4756][0.5525]
[20:28:45][#436491][0741ms][0.4776][0.5384]
[20:38:45][#437311][1672ms][0.4767][0.5538]
[20:48:46][#438128][1320ms][0.4766][0.5504]
[20:58:46][#438941][0727ms][0.4736][0.5382]
[21:08:45][#439747][0952ms][0.4713][0.5435]
...
[22:32:03][#515431][1127ms][0.4553][0.5283]
[22:41:06][#516131][0900ms][0.4537][0.5326]
[22:51:07][#516905][1147ms][0.4543][0.5371]
  • xx.8.2022 - Colorization of Arnold with the POC method using Pix2Pix (image-to-image) translation

The first attempts weren't good. For Stoltenberg it was reasonable to expect it to work, because his footage is in similar conditions; the Arnold dataset is extremely diverse, and apparently colorization models would have to be trained per sequence, which is too much.

I had ideas for another solution which would not use a NN, but would instead directly map color faces to the grayscale ones, taking the shape and the facial landmarks into account, but lately I haven't had the time to start implementing it.

  • xx.8.2022

  • Training a DF-UD 192x192 128-48-48-16 model, Kiril to Arnold (pretrained earlier), now after 36K iterations, without flip dst and with some changes in the dataset. I expected (hoped) the new model to train in fewer iterations than the previous one, which was DF-UDT 192x192, 128-48-32-16, because of the non-symmetric number of dimensions of the encoder and decoder, but for now it seems similar, and the Kiril ("dst") dataset is not exactly the same.

Note, 10.9.2022: The expectations were correct. After just about 200K-210K iterations it looked similar (LRD and sot-m for a few K at the end); the loss was a bit higher, by a few hundredths. Visually one sequence was significantly better - the one in front of the "parliament" background, where the area covering the nose used to blink; now it was stable.

  • 26.8.2022: I'm considering a more pronounceable and/or "unique" name/alias (names/aliases) for the project. For now:
  1. Arnold-DFL or Arnaud-DFL or
  2. Arnoldator or Arnaudator [ArnOdator - "Арнодейтъ"] or
  3. Arnaudatar or Arnoldatar (the same pronunciation)
  4. All of the above
  5. Arnoldify, Arnoldifier, ArnoldDF, ArnolDF? [Arnol-D-F]
  • ~ 10.8.2022: Experimental feature: a POC version of the colorization of the output from the grayscale models during merging, using an additionally trained dedicated pix2pix GAN; complete prototype and merging on 10.8.2022.

image

  • 19.8-20.8.2022: After investigating the properties of the colorized faces and debugging the merging, I successfully applied an idea for stabilizing the colorized output and for merging with precomputed faces (useful for other purposes as well, e.g. prerendered 3D models or synchronously performing faces). In the video example below the output is also sharpened after merging (whole frame) - it should eventually be per face only, or have some antialiasing.

See a merged and sharpened segment with Jens, whole frame: http://twenkid.com/v/32-stolten-color-20-8-20220.645217694677918.mp4

Only aligned faces:

The raw colorized face from the pix2pix model, without color stabilization, was flickering; it was not very bad, but still noticeable, especially in some moments. https://user-images.githubusercontent.com/23367640/185765054-c012ba01-8600-4b78-9a45-3f01270237e4.mp4

After color-gamma stabilization, that artifact was gone (only the aligned face, 146 KB): https://user-images.githubusercontent.com/23367640/185765072-bc8be151-3e7f-4758-8f5d-5d4a8f8255f9.mp4

The color-gamma stabilization is done by first probe-rendering all faces, computing their total pixel weight per frame and the average over all frames, then adjusting the gamma of each frame according to that average in order to flatten the fluctuations: if a face is too dark, it gets lighter, and vice versa. (Indeed, this phenomenon itself reveals some intrinsic properties of the pix2pix model.) Finally there is sharpening, and then merging is performed using the gamma-corrected faces.
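
A minimal sketch of that stabilization idea (an illustration only, not the actual merging code; it assumes the probe-rendered faces are 8-bit grayscale numpy arrays and uses mean intensity as the "pixel weight", which is an assumption based on the description above):

    import numpy as np

    def pixel_weight(face_gray):
        # "Total pixel weight" of one probe-rendered face: here simply its mean intensity (0..255).
        return float(np.mean(face_gray))

    def stabilize_gamma(faces):
        # Pass 1: measure every face; pass 2: gamma-correct each one toward the average over all frames.
        weights = [pixel_weight(f) for f in faces]
        target = float(np.mean(weights))
        out = []
        for face, w in zip(faces, weights):
            if 0.0 < w < 255.0 and 0.0 < target < 255.0:
                # Choose gamma so that (w/255) ** gamma ~= target/255:
                # a too-dark face gets gamma < 1 (lighter), a too-bright one gets gamma > 1 (darker).
                gamma = np.log(target / 255.0) / np.log(w / 255.0)
            else:
                gamma = 1.0
            corrected = np.power(face.astype(np.float32) / 255.0, gamma) * 255.0
            out.append(corrected.clip(0, 255).astype(np.uint8))
        return out

Sharpening and the actual merging would then use the gamma-corrected faces, as described above.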

More info, results and code - later.

  • Future work: Integration with Wav2Lip and Wav2Lip-HQ for automated lip-sync and repair of the output from the lip-sync libraries.

  • Future work: Possibly integration with RealTimeVoiceCloning or other TTS engines. [Note, 25.5.2023: I used RealTimeVoiceCloning as a desktop program; I have now found a Colab notebook for it, but it is probably outdated already. Find better systems.]

22.6.2022

Premiere! Part I

image

https://artificial-mind.blogspot.com/2022/06/arnold-schwarzenegger-governor-of.html

https://www.youtube.com/watch?v=n2JMGqxOYfA&feature=youtu.be

image

  • Part III: Combined with Wav2Lip - the mouth is synchronized; no finetuning; another pass of DFL to repair the artifacts (however there is some loss of contrast and sharpness) https://youtu.be/4F7PB7wBEXk

image

  • Part IV: Lip-syncing with finetuning repair and synthesized cloned voices (RealTimeVoiceCloning) over original script ("full directing") with Arnold, Lena Schwarzenegger and Stoltenberg https://youtu.be/X56QkNzkkVM

title-4-color-NATO-Arnold-reacts-to-Stoltenberg

image

image

image

image image

image

image

Technical details

  • The model was trained on a GeForce 750 Ti 2 GB on Windows 10, created with DFL-SAEHDBW, df-udt mf 192x192 128x48x32x16, mostly batch size 6. The size on disk at the start was about 345 MB. Training began with ~494K iterations of pre-training, with no color transfer (it wasn't adapted yet), on a customized version of the DFL faceset: I gradually removed "bad" samples with overlapping objects etc.; in the end it was about 14551 items instead of 15843*. I didn't turn off random warp, and no LRD was used. The pretrained model probably could have improved more, but I wanted to switch to the actual faces*.

  • Then, on the two-face training, after 509K iterations I turned on LRD (learning rate dropout) and SOT-M color transfer for some 10Ks of iterations - SOT is slow, but it improved the loss within a few iterations. There were some fine-tuning experiments for a few sequences which were introduced late (the BTA ones), or which had too-noticeable "flashes" in the face (the "parliament"-stamps wall), and also for the sequences from the interview which starts after the first part of the movie, when the music ends with the EU star flag (an attempt to improve the borders in semi-profile views). Only a few seconds of the latter were used, mostly because I noticed a better-matching sequence rather than due to much improved quality - that finetuning mostly added more contrast in the teeth and darker separation regions, but it changed the position of the eyes etc. and also the borders of the face.

image

  • Future work: In future pretrainings I may reduce the faceset further, split it into different types of faces, and remove more of the "bad" samples in order to pretrain the model faster. "Bad" samples are e.g. beards and moustaches, or old faces with deep wrinkles - none of them are reconstructed well anyway, at least with the dimensions I've tried so far.

  • Note that the modeled face had a natural "mask" on it, reaching about the upper end of the mf area, so when the model renders that region darker, it actually is not a mistake or bad color transfer, LOL.

Kiril to Arnold

image

image image

History

~ 22.4.2022 -- Minor interface changes (more keys for saving, save-preview periods and auto saving; later: possibly forcing generation of new previews instead of keeping the same ones for the whole training, etc.); reviewing the code.

~ 25.4.2022? --> Started working on SAEHDBW - Grayscale deepfake model; research, experiments, modifications of the channel dimensions, studying the NN model.

Goals of the project:

  • Allow training of more "respectable" resolution models even on 2 GB GPUs, GeForce 750 Ti in particular, and on integrated GPUs

  • Achieve several times? higher performance: either smaller models, higher resolution and/or more detailed models, although grayscale.

  • Study the code, if possible modify the architectures and optimize more: simplify/reduce the depth of some networks to check if they would achieve similar quality due to the single channel with improved performance.

  • ~ 27.4.2022 --> SAEHDBW - SUCCESS!

First correctly training version (the last fixed error was a bug in the masks, introduced by an untested numpy-related change). Initial mode: training from color input which is converted to grayscale during reading. Now the model can train on 224x224 and 192x192 images on a 2 GB GeForce 750 Ti. (The quality etc. is to be checked at various AE, encoder and decoder dims.)

Note: Initially I was working with the DirectX12 build (tensorflow-directml) because the CUDA version didn't run (that issue was resolved later, on 10.5.2022). The CPU is a Core i5 6500. The iteration time varies and also depends on the CPU power mode (Economical/Balanced and their details) and on the overclocking of the GPU. Initially I didn't use additional overclocking (boost clock up to 1176 MHz); at some point I started using MSI Afterburner, which allowed going above 1415 MHz core clock and above 2900 MHz RAM - and that was not the maximum, but it doesn't sustain it all the time due to the temperature and power limits set for safety, it may get unstable and interrupt, and the gain is small. When the batch size is bigger or there is heavy CPU processing, e.g. some color transfer modes such as SOT, the CPU load is higher.

liae-ud-r192-ae-96-48-48-12-bw_SAEHDBW x f, ~900ms (R192...-ae-64-48-48 - almost the same it time ~ 860 ms)

R224-AE64-48-48-12-BW_SAEHDBW x f liae-ud ~ 1500-1600ms

Training on the pretrain faceset converted to grayscale and resized to 384x384 (from 768x768). Checking how much detail would be captured with different dimensions.

2.5.2022:

DF-UDT-256-96-32-32-16_SAEHDBW --> batch 4: ~1200 ms (~1150 ms with slightly more overclock); batch 5: ~1500 ms (occasional OOM errors)

Some model sizes on disk and batch size (for 750 Ti/2 GB)

Model Sizes: MB

LIAE-UD

liae-ud-r192-ae-96-48-48-12-bw_SAEHDBW -- 362 MB  (12 is == 16 mask dim)

R192-AE80-48-48-16-LIAE-UD-SAEHD-BW-PRETR_SAEHDBW-- 315 M
R192-AE64-48-48-16-LIAE-UD-SAEHDBW-PRETR_SAEHDBW  -- 269 M

liae-ud-r96-64-64-22_SAEHDBW -- 313 M
R224-AE64-48-48-12-BW_SAEHDBW -- 297 M  (12 is == 16)

liae-ud-r96-24-24-12_SAEHDBW -- 45.6 M (12 is == 16)
liae-ud-r96-32-32-12-bw_SAEHDBW -- 96 M (12 is == 16)

LIAE-UDT-R128-96-32-32-16_SAEHDBW -- 209 M  B: 4,6,8 (B=8: it= 444-463 ms (530, Lower power mode) --> ~4K@4, 13K@6 --> 8), 4.5.2022 --> train at f (also do on mf)

LIAE-UDT-192-128-32-32-16-SAEHDBW_SAEHDBW_summary -- 270 MB
LIAE_UDT-192-96-32-48-16-SAEHDBW_SAEHDBW_summary -- 346 MB ==> would it be beneficial to have a lower-dimensional encoder than decoder (yet more parameters overall and thus more detail)?

LIAE-UDT 192-96-32-48-16 vs LIAE-UD 192-96-48-48-16 ?


"G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW... - 234 МБ

"G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_src_dst_opt.npy" - 273 МБ

*** DF-UD ***

dfud-r96-32-32-12-bw_SAEHDBW_summary.txt -- 104 M
DF-UDT-256-96-32-32-16_SAEHDBW -- 281 M B: 4, 5 (OOM in minutes sometimes)
DF-UDT-R96-64-24-24-16-SAEHDBW_SAEHDBW -- 50 MB , train @mf 

"G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW - 285 МБ

"G:\SAEHDBW\df-udt-192-128-48-32-16_SAEHDBW - 345 МБ

df-ud-192-128-48-48-16_SAEHDBW_summary - 345 МБ

(Check the quality of df-ud and df-udt with the same number of parameters - if there's enough patience to train them. Is 48-32 good enough, with a varying number of channels/dimensions for the encoder and decoder? Encoder > decoder ... Also: 128/32/32? Mapping the default 256/64/64 for color at 128 px.)

== What will the quality be? Will it be comparable? To what extent will the dimensions fall short?

df-udt-192-80-32-32-16_SAEHDBW -- 260 MB, batch 8 speed ~ or faster than  df-udt-192-128-48-32-16_SAEHDBW @  batch 6  #13-5-2022, ~21h;


Trying 't', searching for higher sharpness; various settings tried.

  • Only 1.45 GB are available. The monitor is connected to the integrated GPU, but the OS still reserves that amount, and sometimes, even at just 77% usage with the monitor on the integrated GPU, the model doesn't start training after trying a bigger batch. (A Windows 10 issue.)

  • 10-5-2022 - Debugged the CUDA build, so now I can use it. (I had used the DirectX12 build so far, because the CUDA one hung with no output.) The solution provided a 33% speed-up! iperov/DeepFaceLab#5515 "device_lib.list_local_devices() doesn't return in the CUDA build up to 2080 #5515"

  • 11.5.2022 - After a series of GPU-related crashes when trying to run big models at the edge of memory in the CUDA build, with sizes which previously trained in the DirectX12 version, it seems that the DX12 version (tensorflow-directml) uses less memory than CUDA. It is possible to train DF-UDT-256-96-32-32-16_SAEHDBW --> as recorded recently: batch 4: ~1200 ms (~1150 ms with slightly more overclock); batch 5: ~1500 ms (occasional OOM errors).

Sample pretraining and training experiments

  • SAEHDBW df-udt-mf-R192-128-48-32-16, batch 6, pretraining on a custom subset of the built-in faceset.pak, re-extracted* at 384x384 grayscale and with many images removed, finally about 14634. It still has a few "bad" samples: babies, a hand in the mouth, a musician with an instrument, hair covering part of an eye, etc. Portraits with glasses are kept, except extremely strange ones. Microphones, hands and other objects crossing the face are removed, except for a few and where the object only slightly touches the face; etc.

  • 12.5.2022 Note: They had better be resized instead of re-extracted, using a modified DFL script for resizing, but I hadn't reached that part of the code at the time.

See the process with more examples etc.: https://github.com/Twenkid/DeepFaceLab-SAEHDBW/blob/main/Pretraining-df-udt-mf-192-128-48-32-16.md

441K-

Eyelashes

image image image image

Teeth

image

  • SAEHDBW liae-ud-r96-32-32-16; no pretraining; mouth and eye priority

Iterations: about 220-230 ms for batch size 8 and about 320-330 ms for 12, after 370K, training from color images. This model is still progressing. The faceset is currently about 400 images for Biden and 2200 images for the other person (K.P.), where Biden's are of higher quality and sharper. Almost all of K.P.'s images are from videos and of low resolution. Initially I trained only on about 200 stills of Biden, then on about 200 frames from a lower-quality clip. The model has small dimensions, so very high quality can't be captured anyway.

image

  • Modified faceset - reduced to about 14600 images; removed many which I didn't "like", having occlusions (microphones, hands etc.); grayscale 384x384

  • SAEHDBW liae-udt-r128-96-32-32-16 Pretraining; mouth and eye priority

image image image image image

...

Continuing to work on the project.

Training with grayscale input (8-bit jpg, png) gives a significant improvement (about 2x) over training from the pre-training dataset's color images. Modify Extract to extract to grayscale: unpack the pretrain faceset (768x768 color) and extract to 384x384 BW.
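
A minimal sketch of that idea (an illustration only; the actual change is inside DFL's extraction/sample-loading code, and the function below is a hypothetical stand-in):

    import cv2

    def load_face_grayscale(path, size=384):
        # Load a face image as single-channel 8-bit grayscale and resize it,
        # e.g. 768x768 color pretrain faces -> 384x384 BW.
        img = cv2.imread(path, cv2.IMREAD_COLOR)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # 3 channels -> 1 channel
        gray = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
        return gray.reshape(size, size, 1)             # keep an explicit channel axis for the network

    # cv2.imwrite("face_384_bw.png", load_face_grayscale("face_768.png"))  # usage example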

19.4.2022

preview_period = 334
preview_period_colab = 200

In C:\DFL\DeepFaceLab_DirectX12\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:

    #override
    def should_save_preview_history(self):
        # Save a preview every preview_period iterations (preview_period_colab on Colab)
        # instead of the original resolution-dependent interval (kept below, commented out).
        return (not io.is_colab() and self.iter % preview_period == 0) or \
               (io.is_colab() and self.iter % preview_period_colab == 0)
        #    return (not io.is_colab() and self.iter % ( 10*(max(1,self.resolution // 64)) ) == 0) or \
        #           (io.is_colab() and self.iter % 100 == 0)

There is a similar function in ModelBase.py, but it is overridden in the separate models.

VS: copy with line numbers

17.4.2022

Reducing noise when training at home on a PC in the bedroom (at night):

  • Win 10, Power Options: Power saver mode, CPU limited to 40-50% - it stops humming cyclically (Core i5 6500 3.3 GHz).

DFL Colab - from the original iperov ... e... drive - edge -->

9.4.2022

Try also Live

  • If you download the GitHub repo and run main.py as on Colab, it will run only on the CPU. It must be built.
  • If using the CPU version and an OpenCV version newer than 4.1, edit:
C:\Deepface\core\imagelib\warp.py

Locate:

def gen_warp_params ...

    # Newer OpenCV versions are stricter about the type of the 'center' argument,
    # so cast the coordinates to int:
    random_transform_mat = cv2.getRotationMatrix2D((int(w // 2), int(w // 2)), rotation, scale)
    random_transform_mat[:, 2] += (tx*w, ty*w)

(The change is the (int(w // 2), int(w // 2)) center argument.)
  • Download the Builds and run/edit the .bat files

  • After installation, find trainer.py and change the autosave interval in the trainer thread to whatever you like:

    while True:
        try:
            start_time = time.time()

            save_interval_min = 241  # the default is 25 (minutes)

  • SAEHD 192
Running trainer.

Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete

[0] : sae-192 - latest
[1] : ksaehd
 : 0
0
Loading sae-192_SAEHD model...

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
  [0] : NVIDIA GeForce GTX 750 Ti
  [1] : Intel(R) HD Graphics 530

[1] Which GPU indexes to choose? : 0
0

Press enter in 2 seconds to override model settings.
[0] Autobackup every N hour ( 0..24 ?:help ) :
0
[y] Write preview history ( y/n ?:help ) :
y
[n] Choose image for the preview history ( y/n ) : y
[3001] Target iteration : 10001
10001
[n] Flip SRC faces randomly ( y/n ?:help ) :
n
[y] Flip DST faces randomly ( y/n ?:help ) :
y
[4] Batch_size ( ?:help ) :
4
[y] Masked training ( y/n ?:help ) :
y
[y] Eyes and mouth priority ( y/n ?:help ) :
y
[n] Uniform yaw distribution of samples ( y/n ?:help ) :
n
[n] Blur out mask ( y/n ?:help ) :
n
[n] Place models and optimizer on GPU ( y/n ?:help ) :
n
[n] Use AdaBelief optimizer? ( y/n ?:help ) :
n
[n] Use learning rate dropout ( n/y/cpu ?:help ) :
n
[y] Enable random warp of samples ( y/n ?:help ) :
y
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) :
0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) :
0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) :
0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) :
0.0
[rct] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) :
rct
[n] Enable gradient clipping ( y/n ?:help ) :
n
[y] Enable pretraining mode ( y/n ?:help ) :
y
Initializing models: 100%|###############################################################| 5/5 [00:01<00:00,  2.56it/s]
Loaded 15843 packed faces from C:\DFL\DeepFaceLab_DirectX12\_internal\pretrain_faces
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 239.71it/s]
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 238.36it/s]
Choose image for the preview history. [p] - next. [space] - switch preview type. [enter] - confirm.
=================== Model Summary ====================
==                                                  ==
==            Model name: sae-192_SAEHD             ==
==                                                  ==
==     Current iteration: 3001                      ==
==                                                  ==
==----------------- Model Options ------------------==
==                                                  ==
==            resolution: 192                       ==
==             face_type: wf                        ==
==     models_opt_on_gpu: False                     ==
==                 archi: liae-ud                   ==
==               ae_dims: 128                       ==
==                e_dims: 96                        ==
==                d_dims: 64                        ==
==           d_mask_dims: 16                        ==
==       masked_training: True                      ==
==       eyes_mouth_prio: True                      ==
==           uniform_yaw: True                      ==
==         blur_out_mask: False                     ==
==             adabelief: False                     ==
==            lr_dropout: n                         ==
==           random_warp: False                     ==
==      random_hsv_power: 0.0                       ==
==       true_face_power: 0.0                       ==
==      face_style_power: 0.0                       ==
==        bg_style_power: 0.0                       ==
==               ct_mode: rct                       ==
==              clipgrad: False                     ==
==              pretrain: True                      ==
==       autobackup_hour: 0                         ==
== write_preview_history: True                      ==
==           target_iter: 10001                     ==
==       random_src_flip: False                     ==
==       random_dst_flip: True                      ==
==            batch_size: 4                         ==
==             gan_power: 0.0                       ==
==        gan_patch_size: 16                        ==
==              gan_dims: 16                        ==
==                                                  ==
==------------------- Running On -------------------==
==                                                  ==
==          Device index: 0                         ==
==                  Name: NVIDIA GeForce GTX 750 Ti ==
==                  VRAM: 1.45GB                    ==
==                                                  ==
======================================================
Starting. Target iteration: 10001. Press "Enter" to stop training and save model.
[11:54:23][#003094][1766ms][2.5891][1.9999]

Pretraining

======

"G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_decoder.npy" "G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_inter_B.npy" "G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_encoder.npy" "G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_inter_AB.npy" "G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_data.dat" "G:\SAEHDBW\liae-udt-192-96-32-32-16-SAEHDBW_SAEHDBW_src_dst_opt.npy" 234 МБ

"G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_src_dst_opt.npy" "G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_encoder.npy" "G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_data.dat" "G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_decoder.npy" "G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_inter_AB.npy" "G:\SAEHDBW\liae-ud-192-128-32-32-16_SAEHDBW_inter_B.npy" 273 МБ

*** DF-UD ***

dfud-r96-32-32-12-bw_SAEHDBW_summary.txt -- 104 M DF-UDT-256-96-32-32-16_SAEHDBW -- 281 M B: 4, 5 (OOM in minutes sometimes) DF-UDT-R96-64-24-24-16-SAEHDBW_SAEHDBW -- 50 MB , train @mf

"G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_inter.npy" "G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_encoder.npy" "G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_data.dat" "G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_src_dst_opt.npy" "G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_decoder_dst.npy" "G:\SAEHDBW\df-udt-192-96-32-32-16_SAEHDBW_decoder_src.npy" 285 МБ

"G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_decoder_src.npy" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_encoder.npy" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_inter.npy" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_summary.txt" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_data.dat" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_src_dst_opt.npy" "G:\SAEHDBW\ df-udt-192-128-48-32-16_SAEHDBW_decoder_dst.npy" 345 МБ

df-ud-192-128-48-48-16_SAEHDBW_summary 345 МБ
