-
I have been training on some tree etchings (about 30). It has only been a couple of hours so far, and the samples look like noise fields. Is watching the loss a good strategy for adjusting the LR, i.e. if the loss starts a sustained rise, lower the rate? It is 5e-4 now. I'm not sure whether I need to alter it, and when.
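For what it's worth, the "sustained rise" idea can be made concrete as a small helper that compares the mean of the most recent losses against the mean of the window just before it, and only reacts when the rise persists. This is purely an illustrative sketch, not part of diffutrainer.py; all names here are made up.

    from collections import deque

    class LossRiseDetector:
        # flags a sustained rise: the mean of the newest `window` losses stays
        # above the mean of the previous `window` losses for `patience` checks
        def __init__(self, window=200, patience=50):
            self.history = deque(maxlen=2 * window)
            self.window = window
            self.patience = patience
            self.strikes = 0

        def update(self, loss):
            self.history.append(float(loss))
            if len(self.history) < 2 * self.window:
                return False
            older = sum(list(self.history)[:self.window]) / self.window
            newer = sum(list(self.history)[self.window:]) / self.window
            self.strikes = self.strikes + 1 if newer > older else 0
            return self.strikes >= self.patience

    # if update(loss) returns True, stop and resume from a good checkpoint
    # with a lower --lr rather than trying to adjust it mid-run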
-
Sounds like something is wrong. I usually see the noise graining first grow in size and then start to arrange itself according to the material. I would recommend unet1, the new resnet model. Unetcn0, which uses ConvNeXt, seems more difficult, and the original repo has replaced it with what I have named unet1.
LR 5e-4 should show results quite soon, and I usually find that after around 10 saved models I need to stop and lower the rate. Changing it once to 5e-5 has been enough, it seems.
What are your other parameters? Image size, number of layers / mults? There needs to be a certain number of layers for the model to "perceive" the image as a whole. I am currently using 7 layers for 512px, mults 1 1 2 2 4 4 8. The idea is that as we go from pixels to more abstract feature levels, we need more filters, each looking for a specific feature. As for loss type, I have recently started to prefer L1. Maybe it is more a question of what the output looks like; both have been successful in training.
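For concreteness, a full training call with these settings looks roughly like the commands quoted later in this thread; the dataset directory, output directory and step count below are placeholders:

    python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1run --trainsteps 200000 --lr 5e-4 --model unet1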
-
With lr 5e-5 it may indeed take very long to get started (if that was not a typo). First 5e-4, then when it "runs amok" (you will see it), resume from a good checkpoint at 5e-5. This works for me, at least. It may, however, be that your images are different enough: high detail, much variation at the local level, high-frequency content in technical terms. Very different from the point of view of convolutional filters, which have to learn patterns starting from adjacent pixels and gradually work up to more complex features. I'll see if I could try something similar. Your images are 1024px, you say. They are downscaled to 512px automatically, but this might affect the image quality adversely. Cutting each image into several 512px pieces would be an alternative, at least for the purpose of learning the style of the images. I do have an augmenter script which can do various crops, flips and rotations to generate a larger dataset. I can include that in the repo, but cannot promise detailed documentation.
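The augmenter gist posted further down is the real thing; purely to illustrate the idea (random 512px crops plus optional flips and 90-degree rotations), a minimal PIL sketch might look like this, with the directory names as placeholders:

    import os, random
    from PIL import Image, ImageOps

    SRC, DST, SIZE, PER_IMAGE = "etch/", "etch_crops/", 512, 50
    os.makedirs(DST, exist_ok=True)
    for fname in os.listdir(SRC):
        img = Image.open(os.path.join(SRC, fname)).convert("RGB")
        w, h = img.size
        if w < SIZE or h < SIZE:
            continue  # skip images smaller than the crop size
        for i in range(PER_IMAGE):
            x, y = random.randint(0, w - SIZE), random.randint(0, h - SIZE)
            crop = img.crop((x, y, x + SIZE, y + SIZE))
            if random.random() < 0.5:
                crop = ImageOps.mirror(crop)  # horizontal flip
            crop = crop.rotate(random.choice([0, 90, 180, 270]))
            crop.save(os.path.join(DST, f"{os.path.splitext(fname)[0]}_{i:03d}.png"))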
-
I too notice that the loss quite soon reaches a level where it appears to stop decreasing. Still, in the long run, the images continue to improve visibly. It could be that the average loss is still decreasing (it might be useful to display the average loss calculated between two saves). Or it could be that the loss guides the training in the right direction while not significantly decreasing? I will include the augmenter script, but it is a bit messy right now, as I tend to edit the code every time I need something else, rather than providing options for everything.
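A minimal sketch of that bookkeeping (nothing from diffutrainer.py itself; the names are hypothetical):

    # sketch only: accumulate the per-step loss and report its mean at every save,
    # so a slow downward trend stays visible even when individual values look flat
    def report_mean_loss(step, loss, losses, save_every=100):
        losses.append(loss)
        if step % save_every == 0:
            print(f"step {step}: mean loss since last save = {sum(losses) / len(losses):.4f}")
            losses.clear()

    # inside the training loop one would call, for example:
    #   report_mean_loss(step, loss.item(), losses)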
-
Actually this is converging nicely. What are your training arguments? You are using random crops rather than a set of full images, and maybe that is why it converges faster.
On May 21, 2022, at 6:51 AM, Hannu Töyrylä wrote:
Starting to look like trees? Dropped lr after 16 cycles.
-
Also, let me know what Python command-line arguments you are using for the cropping tool.
-
Yes, the current training has these lines. Right now, I am removing the denoising directory and doing a complete re-clone each time the notebook is run. I know this is not ideal, as I should be able to do a git pull without deleting the current clone.
-paul
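A pull in place should indeed avoid the full re-clone. Assuming the notebook's working copy is the minidiffusion clone (the path depends on your setup), a cell along these lines would do it:

    # run in a notebook cell; adjust the path to wherever the clone lives
    %cd minidiffusion
    !git pull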
On Saturday, May 21, 2022, at 9:01 AM, Hannu Töyrylä wrote:
Check your diffutrainer.py anyway; the newest version has these lines:
https://github.com/htoyryla/minidiffusion/blob/55354b3ed9fd23eb779d5d7f132c6a86a5a1b5d6/diffutrainer.py#L38-L39
If yours does not, that would explain everything, and you should update everything from the repo.
I'll make a gist of the augmentor and post a link.
-
Here's a gist you can use for making those cropped images. See the beginning of the code for an example command: https://gist.github.com/htoyryla/a53925c224e511e132a410c6c3e7514c
-
I've been training at 5e-5 for about 30 hours. The 2nd sample looks quite good. I was going to raise the LR, but I'll hold off for now.
-
I had mine running through the night and stopped it in the morning, when samples looked like this. However, as all training images are crops of one image, the model is overfitting: all the samples, too, look like crops of the original and are maybe not so useful, but this does prove that the model is learning. I also tried to apply the model to a photo of myself, with results like this:
-
Samples look great! Did you use 5e-4 throughout the training?
-
A good guide for learning
On May 22, 2022, at 1:29 PM, Hannu Töyrylä wrote:
Here's the command history, which is great in Linux when it works... I guess I need to make minidiffusion keep its own log, like I have previously done with my nqgan.
1945 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch --trainsteps 200000 --lr 4e-4 --model unet1
1946 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch2 --trainsteps 200000 --lr 8e-5 --model unet1 --load un1etch/model-16.pt
1947 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch3 --trainsteps 200000 --lr 2e-5 --model unet1 --load un1etch2/model-54.pt
I.e. first 16 cycles at 4e-4, then at 8e-5 up to 54, and finally at 2e-5 up to 110 when I finished.
-
The "mini" is more related to my interests. To be able to run in a limited memory and still make large enough images. To be able to train one's own models in reasonable time. The tradeoff, then, is flexibility... the models will not be able to produce all kinds of objects by themselves, or reproduce every style imaginable. A different philosophy. A tool for personal work. That said, there is not necessarily anything in the codebase to prevent using it in larger scale, like training a model with imagenet dataset or comparable. The models could be made quite powerful by using more filters per layer. It is only that I have not done that myself, and probably will never do either. |
-
What did you use to make this happen? I am finding that I am getting really interesting patterns, but nothing reflecting the detail you have gotten. Did you use the three-phase LR approach, starting with something large like 4e-4 and then lowering it?
-
So this is just specifying unet0 on the command line?
On May 27, 2022, at 10:43 AM, Hannu Töyrylä wrote:
Looked at the code, and it was easy enough to add a variant of Unet0 with a kernel of 5. Pushed it to the repo already, but I could not test it yet.
-
Not trained on etchings, but I'll post it anyway. My way of doing things: training models as lightly as possible, observing how they behave, finding ways to use them in artistic work, rather than trying to make AI do all the work and imitate art. Same model from which I had sampled those faces above. Image created at 2048x2048 in 16 512x512px tiles by miniDiffusion. Another run with post-processing on (decreased contrast etc.).
-
It may well be that the Unet architecture learns quickly because it can directly reproduce images through the shortcuts between the lower levels, input to output. It will take more time to learn the more complex relationships for which the higher layers are needed. GANs do not have such shortcuts; therefore they are trickier to train but in principle better at generalising. I have used Unets earlier for image transformations, like contours-to-images or colorisation. In such tasks, the ability to quickly learn the identity transform and then proceed to learn whatever needs to be modified is a real benefit. Makes me think of two options. One would be to add a discriminator, like in a GAN, which would try to ensure that the samples in general are in the desired domain. This is something I have probably already mentioned, and it should be quite straightforward to implement. The second option is to remove the shortcuts from the Unet, i.e. make it effectively an autoencoder.
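To make the "remove the shortcuts" option concrete, here is a toy PyTorch sketch (nothing to do with the repo's actual Unet classes): with use_skips=True the decoder sees the encoder features directly and can copy low-level detail straight through, while with use_skips=False it behaves like a plain encoder-decoder and has to push everything through the bottleneck.

    import torch
    import torch.nn as nn

    class TinyUnet(nn.Module):
        def __init__(self, use_skips=True):
            super().__init__()
            self.use_skips = use_skips
            self.enc = nn.Conv2d(3, 32, 3, stride=2, padding=1)   # e.g. 512px -> 256px
            self.mid = nn.Conv2d(32, 32, 3, padding=1)
            dec_in = 64 if use_skips else 32                      # concatenation doubles channels
            self.dec = nn.ConvTranspose2d(dec_in, 3, 4, stride=2, padding=1)  # back to 512px

        def forward(self, x):
            e = torch.relu(self.enc(x))
            m = torch.relu(self.mid(e))
            h = torch.cat([m, e], dim=1) if self.use_skips else m
            return self.dec(h)

    # TinyUnet(use_skips=False) is effectively a small autoencoder: there is no
    # direct route from input detail to output, so it must rely on what the
    # deeper layers have learned.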
-
It seems logical to spend more time on sampling; training comes at a significant cost, mostly of time. I am training on an art face-based dataset and am starting to get interesting patterns and color. Soon, I'll just be sampling from this model.
-
Is there a way to get N images produced by the sampler? I know I can put a loop in there, but I wondered whether this is a built-in feature. Typically, I would like to get 12 or 24 images.
-
Although it is hard for me to tell whether the following does exact sampling as is done during training, this is what the notebook user expects when they sample model-55.pt without any further optimization (e.g. using CLIP).
Now on to sampling with diffudiver.py using your instructions:
!python diffudiver.py --dir {outputdir} --textw 0 --ssimw 0 --imgpw 0 --name 'faces' --imageSize 512 --modelSize 512 --load {pretrained_models}/Diffusion-Model-55.pt --mults 1 1 2 2 4 4 8 --saveEvery 100 --saveAfter 700 --model unet0k5
Time to sample one image on a V100: about 1 minute 30 seconds. The above 3 were hand-picked from several runs. Other runs produce less desirable results, but clearly this is an aesthetic choice. It might be generating what the sampling-during-training produces. Here are some others:
But yeah, ideally I'd like to go to sleep after asking for 50 images, and then wake up and pick the best ones.
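On the "ask for 50 images overnight" point: until the sampler mentioned further down (#8) is in use, the same call can simply be looped from Python, giving each run its own name. An illustrative sketch reusing the arguments above, with paths as placeholders:

    import subprocess

    for i in range(50):
        subprocess.run([
            "python", "diffudiver.py",
            "--dir", "outputs", "--name", f"faces_{i:02d}",
            "--textw", "0", "--ssimw", "0", "--imgpw", "0",
            "--imageSize", "512", "--modelSize", "512",
            "--load", "pretrained_models/Diffusion-Model-55.pt",
            "--mults", "1", "1", "2", "2", "4", "4", "8",
            "--saveEvery", "100", "--saveAfter", "700",
            "--model", "unet0k5",
        ], check=True)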
-
But for a grand finale (at least for Tuesday), I used diffudiver on the phrase "a flower vase" with the painterly-faces-trained network, as follows:
!python diffudiver.py --text "a flower vase" --low 0.4 --high 0.8 --cutn 16 --dir {outputdir} --name 'vase' --textw 0.5 --ssimw 0.5 --lr 0.002 --imageSize 1024 --modelSize 512 --load {pretrained_models}/Diffusion-Model-55.pt --mults 1 1 2 2 4 4 8 --ema --saveEvery 50 --saveAfter 700 --model unet0k5
-
For sampling, have a look at #8. You can sample one or multiple epochs, with as many samples per epoch as you want, and it supports batchSize.