-
Two questions, very much related:
I am doing this through Colab and will be glad to share the notebook once done. I put a notebook together that does training (diffutrainer) and then diffudiver.
-
The option steps means the number of diffusion steps inside the diffusion model and is best left at 1000. Trainsteps is more like what you want, but it does not count epochs; it is more like batches (I need to check how exactly it counts). Roughly speaking, we train on accum batches, then update the model. After saveEvery update rounds we store the model and generate samples. That's what you should look at. I have found that accum = 10, saveEvery = 100 and nsamples = 2 work well, i.e. after 1000 batches we store the model and generate two samples. Trainsteps derives from the original code; perhaps we should count the number of such rounds instead (1000 batches in the example).

I have usually trained from half a day to two days. My experience is that, as we intend to use the model for text- and image-guided diffusion rather than sampling from the model alone, it works without extended training.

As to a pretrained model, I have been thinking about it. When I have a suitable one, I will share it. Any idea where to put it? BTW, for diffudiver I would recommend using the newest version with tgt_image, ssimw and textw. It works much better than the very experimental seed_image.
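To make it concrete how accum, saveEvery and nsamples interact, here is a minimal runnable sketch of the counting described above (illustrative names only, not the actual diffutrainer code):

```python
# Illustrative sketch of the counting described above, not the actual diffutrainer code.
# One train step = accum batches; a checkpoint and nsamples samples every saveEvery steps.

def training_schedule(trainsteps=300, accum=10, save_every=100, nsamples=2):
    batches_seen = 0
    for step in range(1, trainsteps + 1):
        batches_seen += accum              # gradients are accumulated over accum batches
        # ... one optimizer update of the model happens here ...
        if step % save_every == 0:
            # with accum=10 and save_every=100 this fires every 1000 batches
            print(f"step {step} ({batches_seen} batches): save model, generate {nsamples} samples")

training_schedule()
```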
Beta Was this translation helpful? Give feedback.
-
Here is a pretrained model: https://drive.google.com/file/d/1bYJ67QJM5H4NRqlfrHTHizJbkZcYMprs/view?usp=sharing
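In case it helps for the Colab notebook, one way to fetch the checkpoint and peek at its contents is sketched below; the key layout is an assumption, since it depends on how diffutrainer saves checkpoints:

```python
# Sketch: download the shared checkpoint in Colab and inspect it.
# The key layout printed at the end is an assumption; check what diffutrainer actually saves.
import gdown
import torch

url = "https://drive.google.com/uc?id=1bYJ67QJM5H4NRqlfrHTHizJbkZcYMprs"
ckpt_path = gdown.download(url, "pretrained.pt", quiet=False)

ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # e.g. step / model / ema state dicts, if saved that way
```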
-
Checked the code. Trainsteps is incremented each time the model is updated, i.e. after accum batches. As it is difficult to know in advance how much training will be needed, I tend to set a big enough limit, monitor how the samples look, and then interrupt when I see fit, often to resume training later. I understand that with Colab it is different. Anyhow, training requires time, but a few hours to two days is still not really a long time for model training.

Diffutrainer now outputs the loss every time it is evaluated, i.e. after accum batches. That is probably not practical in a Colab and needs to be changed, around here: https://github.com/htoyryla/denoising-diffusion-pytorch-ht/blob/3d9c51f00ead14351181fa90531415454522af67/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py#L573
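For Colab, one simple option would be to throttle the logging rather than print after every update; a rough sketch with assumed names, not the code at the linked line:

```python
# Sketch of throttled loss logging for Colab (assumed names, not the actual Trainer code).
# Keep a running average and print it only every log_every updates.
losses = []
log_every = 50

def log_loss(step, loss):
    losses.append(loss)
    if step % log_every == 0:
        print(f"step {step}: avg loss {sum(losses) / len(losses):.4f}")
        losses.clear()

# dummy usage just to show the behaviour
for step in range(1, 201):
    log_loss(step, 1.0 / step)
```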
-
I have been training a model, first with 200+ facial photos of myself, a set I have used many times. I made it a few years ago: set a camera to take photos at one-second intervals, placed myself against a white background, and changed my position and expression. Trained perhaps for 20-30 hours. First at lr 5e-4 to get training to start fast, and it did. During the night it had, however, run astray, which is to be expected (too high an lr in the long run). I selected a good checkpoint and resumed at a lower lr, maybe 1e-4, maybe 5e-5. Looks like a good range is from 5e-5 to 1e-5. Then I changed the dataset to script-generated images like this. Continued with a lower lr overnight, and now it makes samples like this. Here's an example of this model used with an init image (with skip, mul and weak; yes, they work now too). The full command to make this was:
I was using diffudiver2, which includes an image post-processing chain and an option to decay the text weight, so that one can use a higher text weight during the early steps without CLIP introducing too much detail in the long run. I am releasing this latter model, which can be found here: https://drive.google.com/file/d/1doJUqqJqrdQHIe7Dw4quqPA7q5KmBDn2/view?usp=sharing
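Roughly, the text weight decay works like this (a sketch with assumed parameter names, not necessarily the exact diffudiver2 implementation): the CLIP text weight starts at textw and decays toward a fraction of it as the diffusion proceeds, so text guidance shapes the composition early without forcing detail late.

```python
# Sketch of a decaying text guidance weight over the diffusion steps.
# textw and the decay fraction are illustrative; diffudiver2 may do this differently.

def text_weight(step, total_steps, textw=1.0, final_frac=0.1):
    """Linearly decay the CLIP text weight from textw down to textw * final_frac."""
    t = step / max(total_steps - 1, 1)            # 0.0 at the first step, 1.0 at the last
    return textw * ((1.0 - t) + t * final_frac)

for s in (0, 500, 999):                           # e.g. over 1000 guided steps
    print(s, round(text_weight(s, 1000), 3))
```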
-
A note on network architectures. Lucidrains' repo used resnet blocks originally, then went to convnext. Here I have provided both, unet0 and unetcn0 respectively, and have mainly used unetcn0, apart from some early tests on unet0. Lucidrains' repo has now moved back to resnet blocks, so it may be a good idea to use unet0 here. My intuition from experience is likewise that unet0 is easier to get to learn. Others have reported in lucidrains' repo that convnext blocks failed to learn to make proper lines. My experience, though limited, is similar.

I have just started a training run with the dataset referred to in the previous comment. In one hour (3090 running at 250 W, lr 5e-4, 7-layer unet0, L2 loss) the samples started assuming the proper form (though as yet without detail); see the sample below. While an lr as high as 5e-4 helps to get training started, it is likely to run amok after a few hours, in which case one should resume from a good checkpoint with a lower lr.
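For reference, the two block types differ roughly as follows; this is a simplified sketch of the block structures, not the exact unet0 / unetcn0 code:

```python
# Simplified sketch of the two block families discussed above
# (not the exact unet0 / unetcn0 code; layer details in the repo differ).
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Roughly: norm -> SiLU -> 3x3 conv, twice, with a residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.GroupNorm(8, dim), nn.SiLU(), nn.Conv2d(dim, dim, 3, padding=1),
            nn.GroupNorm(8, dim), nn.SiLU(), nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)

class ConvNextBlock(nn.Module):
    """Roughly: depthwise 7x7 conv, norm, then a widening conv 'MLP', with a residual connection."""
    def __init__(self, dim, mult=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 7, padding=3, groups=dim),   # depthwise conv
            nn.GroupNorm(1, dim),                            # layernorm-like over channels
            nn.Conv2d(dim, dim * mult, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim * mult, dim, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)

x = torch.randn(1, 64, 32, 32)
print(ResnetBlock(64)(x).shape, ConvNextBlock(64)(x).shape)
```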
-
Fine-tuned the model further, which eventually turned out not so great, but the transition was worth the effort. Samples like this, using which diffudiver2 can make images like this.
-
What is the difference between the regular diffudiver and diffudiver2?
-
V2 has the post-processing chain; it is just that I did not want to mess up anything with the additions. Skip (step) was another thing I added, but I added it in V1 as well. It seems to me now that V2 is stable enough and could replace V1 altogether.
-
Back from a trip. Let me know if you make V2 the new default; then I will not need to change the notebook.