-
I have been training on some tree etchings (about 30). It has only been a couple of hours so far, and the samples look like noise fields. Is watching the loss a good strategy for adjusting the LR, i.e. if the loss starts a sustained rise, lower the rate? It is 5e-4 now. I'm not sure whether I need to alter it, and when.
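For what it's worth, the "sustained rise" idea can be made concrete as a small helper that compares the mean of the most recent losses against the mean of the window just before it, and only reacts when the rise persists. This is purely an illustrative sketch, not part of diffutrainer.py; all names here are made up.

    from collections import deque

    class LossRiseDetector:
        # flags a sustained rise: the mean of the newest `window` losses stays
        # above the mean of the previous `window` losses for `patience` checks
        def __init__(self, window=200, patience=50):
            self.history = deque(maxlen=2 * window)
            self.window = window
            self.patience = patience
            self.strikes = 0

        def update(self, loss):
            self.history.append(float(loss))
            if len(self.history) < 2 * self.window:
                return False
            older = sum(list(self.history)[:self.window]) / self.window
            newer = sum(list(self.history)[self.window:]) / self.window
            self.strikes = self.strikes + 1 if newer > older else 0
            return self.strikes >= self.patience

    # if update(loss) returns True, stop and resume from a good checkpoint
    # with a lower --lr rather than trying to adjust it mid-run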
-
Sounds like something is wrong. I usually see the noise graining first grow in size and then start to arrange itself according to the material. I would recommend unet1, the new resnet model. Unetcn0, which uses ConvNeXt, seems more difficult, and the original repo has replaced it with what I have named unet1.
LR 5e-4 should show results quite soon, and I usually find that after around 10 saved models I need to stop and lower the rate. Changing it once to 5e-5 has been enough, it seems.
What are your other parameters? Image size, number of layers / mults? There needs to be a certain number of layers for the model to "perceive" the image as a whole. I am currently using 7 layers for 512px, mults 1 1 2 2 4 4 8. The idea is that as we go from pixels to more abstract feature levels, we need more filters, each looking for a specific feature. As for loss type, I have recently started to prefer L1. Maybe it is more a question of what the output looks like; both have been successful in training.
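For concreteness, a full training call with these settings looks roughly like the commands quoted later in this thread; the dataset directory, output directory and step count below are placeholders:

    python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1run --trainsteps 200000 --lr 5e-4 --model unet1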
-
With lr 5e-5 it may indeed take very long to get started (if that was not a typo). First 5e-4, then when it "runs amok" (you will see it), resume from a good checkpoint at 5e-5. This works for me, at least. It may, however, be that your images are different enough: high detail, much variation at the local level, high-frequency content in technical terms. Very different from the point of view of convolutional filters, which have to learn patterns starting from adjacent pixels and gradually work up to more complex features. I'll see if I could try something similar. Your images are 1024px, you say. They are downscaled to 512px automatically, but this might affect the image quality adversely. Cutting each image into several 512px pieces would be an alternative, at least for the purpose of learning the style of the images. I do have an augmenter script which can do various crops, flips and rotations to generate a larger dataset. I can include that in the repo, but cannot promise detailed documentation.
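The augmenter gist posted further down is the real thing; purely to illustrate the idea (random 512px crops plus optional flips and 90-degree rotations), a minimal PIL sketch might look like this, with the directory names as placeholders:

    import os, random
    from PIL import Image, ImageOps

    SRC, DST, SIZE, PER_IMAGE = "etch/", "etch_crops/", 512, 50
    os.makedirs(DST, exist_ok=True)
    for fname in os.listdir(SRC):
        img = Image.open(os.path.join(SRC, fname)).convert("RGB")
        w, h = img.size
        if w < SIZE or h < SIZE:
            continue  # skip images smaller than the crop size
        for i in range(PER_IMAGE):
            x, y = random.randint(0, w - SIZE), random.randint(0, h - SIZE)
            crop = img.crop((x, y, x + SIZE, y + SIZE))
            if random.random() < 0.5:
                crop = ImageOps.mirror(crop)  # horizontal flip
            crop = crop.rotate(random.choice([0, 90, 180, 270]))
            crop.save(os.path.join(DST, f"{os.path.splitext(fname)[0]}_{i:03d}.png"))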
-
I too notice that the loss quite soon reaches a level where it appears to stop decreasing. Still, in the long run, the images continue to improve visibly. It could be that the average loss is still decreasing (it might be useful to display the average loss calculated between two saves). Or it could be that the loss guides the training in the right direction while not significantly decreasing? I will include the augmenter script, but it is a bit messy right now, as I tend to edit the code every time I need something else, rather than providing options for everything.
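A minimal sketch of that bookkeeping (nothing from diffutrainer.py itself; the names are hypothetical):

    # sketch only: accumulate the per-step loss and report its mean at every save,
    # so a slow downward trend stays visible even when individual values look flat
    def report_mean_loss(step, loss, losses, save_every=100):
        losses.append(loss)
        if step % save_every == 0:
            print(f"step {step}: mean loss since last save = {sum(losses) / len(losses):.4f}")
            losses.clear()

    # inside the training loop one would call, for example:
    #   report_mean_loss(step, loss.item(), losses)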
-
Actually this is converging nicely. What are your training arguments? You are using random crops rather than a set of full images, and maybe that is why it converges faster.
On May 21, 2022, at 6:51 AM, Hannu Töyrylä wrote:
Starting to look like trees? Dropped lr after 16 cycles.
-
Also, let me know what Python command-line arguments you are using for the cropping tool.
-
Yes, the current training has these lines. Right now, I am removing the denoising directory and doing a complete re-clone each time the notebook is run. I know this is not ideal, as I should be able to do a git pull without deleting the current clone.
-paul
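A pull in place should indeed avoid the full re-clone. Assuming the notebook's working copy is the minidiffusion clone (the path depends on your setup), a cell along these lines would do it:

    # run in a notebook cell; adjust the path to wherever the clone lives
    %cd minidiffusion
    !git pull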
On Saturday, May 21, 2022, at 9:01 AM, Hannu Töyrylä wrote:
Check your diffutrainer.py anyway; the newest version has these lines:
https://github.com/htoyryla/minidiffusion/blob/55354b3ed9fd23eb779d5d7f132c6a86a5a1b5d6/diffutrainer.py#L38-L39
If yours does not, that would explain everything, and you should update everything from the repo.
I'll make a gist of the augmentor and post a link.
-
Here's a gist you can use for making those cropped images. See the beginning of the code for an example command: https://gist.github.com/htoyryla/a53925c224e511e132a410c6c3e7514c
-
I've been training at 5e-5 for about 30 hours. The 2nd sample looks quite good. I was going to raise the LR, but I'll hold off for now.
-
I had mine running through the night and stopped it in the morning, when samples looked like this. However, as all training images are crops of one image, the model is overfitting: all the samples, too, look like crops of the original and are maybe not so useful, but this does prove that the model is learning. I also tried to apply the model to a photo of myself, with results like this:
-
Samples look great! Did you use 5e-4 throughout the training?
-
A good guide for learning
On May 22, 2022, at 1:29 PM, Hannu Töyrylä wrote:
Here's the command history, which is great in Linux when it works... I guess I need to make minidiffusion keep its own log, like I have previously done with my nqgan.
1945 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch --trainsteps 200000 --lr 4e-4 --model unet1
1946 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch2 --trainsteps 200000 --lr 8e-5 --model unet1 --load un1etch/model-16.pt
1947 python diffutrainer.py --images etch/ --accum 10 --saveEvery 100 --losstype l1 --nsamples 2 --batchSize 2 --imageSize 512 --mults 1 1 2 2 4 4 8 --dir un1etch3 --trainsteps 200000 --lr 2e-5 --model unet1 --load un1etch2/model-54.pt
I.e. first 16 cycles at 4e-4, then at 8e-5 up to 54, and finally at 2e-5 up to 110 when I finished.
-
The "mini" is more related to my interests. To be able to run in a limited memory and still make large enough images. To be able to train one's own models in reasonable time. The tradeoff, then, is flexibility... the models will not be able to produce all kinds of objects by themselves, or reproduce every style imaginable. A different philosophy. A tool for personal work. That said, there is not necessarily anything in the codebase to prevent using it in larger scale, like training a model with imagenet dataset or comparable. The models could be made quite powerful by using more filters per layer. It is only that I have not done that myself, and probably will never do either. |
-
What did you use to make this happen? I am finding that I am getting really interesting patterns, but nothing reflecting the detail you have gotten. Did you use the three-phase LR approach, starting with something large like 4e-4 and then lowering it?
-
So this is just specifying unet0 on the command line?
On May 27, 2022, at 10:43 AM, Hannu Töyrylä wrote:
Looked at the code, and it was easy enough to add a variant of Unet0 with a kernel of 5. Pushed it to the repo already, but I could not test it yet.
-
Not trained on etchings, but I'll post it anyway. My way of doing things: training models as lightly as possible, observing how they behave, finding ways to use them in artistic work, rather than trying to make AI do all the work and imitate art. Same model from which I had sampled those faces above. Image created at 2048x2048 in 16 512x512px tiles by miniDiffusion. Another run with post-processing on (decreased contrast etc.).
-
It may well be that the Unet architecture learns quickly because it can directly reproduce images through the shortcuts between the lower levels, input to output. It will take more time to learn the more complex relationships for which the higher layers are needed. GANs do not have such shortcuts; therefore they are trickier to train but in principle better at generalising. I have used Unets earlier for image transformations, like contours-to-images or colorisation. In such tasks, the ability to quickly learn the identity transform and then proceed to learn whatever needs to be modified is a real benefit. Makes me think of two options. One would be to add a discriminator, like in a GAN, which would try to ensure that the samples in general are in the desired domain. This is something I have probably already mentioned, and it should be quite straightforward to implement. The second option is to remove the shortcuts from the Unet, i.e. make it effectively an autoencoder.
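To make the "remove the shortcuts" option concrete, here is a toy PyTorch sketch (nothing to do with the repo's actual Unet classes): with use_skips=True the decoder sees the encoder features directly and can copy low-level detail straight through, while with use_skips=False it behaves like a plain encoder-decoder and has to push everything through the bottleneck.

    import torch
    import torch.nn as nn

    class TinyUnet(nn.Module):
        def __init__(self, use_skips=True):
            super().__init__()
            self.use_skips = use_skips
            self.enc = nn.Conv2d(3, 32, 3, stride=2, padding=1)   # e.g. 512px -> 256px
            self.mid = nn.Conv2d(32, 32, 3, padding=1)
            dec_in = 64 if use_skips else 32                      # concatenation doubles channels
            self.dec = nn.ConvTranspose2d(dec_in, 3, 4, stride=2, padding=1)  # back to 512px

        def forward(self, x):
            e = torch.relu(self.enc(x))
            m = torch.relu(self.mid(e))
            h = torch.cat([m, e], dim=1) if self.use_skips else m
            return self.dec(h)

    # TinyUnet(use_skips=False) is effectively a small autoencoder: there is no
    # direct route from input detail to output, so it must rely on what the
    # deeper layers have learned.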
-
It seems logical to spend more time on sampling; training comes at a significant cost, mostly of time. I am training on an art face-based dataset and am starting to get interesting patterns and color. Soon, I'll just be sampling from this model.
-
Is there a way to get N images produced by the sampler? I know I can put a loop in there, but I wondered whether this is a built-in feature. Typically, I would like to get 12 or 24 images.
-
Although it is hard for me to tell whether the following does exact sampling as is done during training, this is what the notebook user expects when they sample model-55.pt without any further optimization (e.g. using CLIP).
Now on to sampling with diffudiver.py using your instructions:
!python diffudiver.py --dir {outputdir} --textw 0 --ssimw 0 --imgpw 0 --name 'faces' --imageSize 512 --modelSize 512 --load {pretrained_models}/Diffusion-Model-55.pt --mults 1 1 2 2 4 4 8 --saveEvery 100 --saveAfter 700 --model unet0k5
Time to sample one image on a V100: about 1 minute 30 seconds. The above 3 were hand-picked from several runs. Other runs produce less desirable results, but clearly this is an aesthetic choice. It might be generating what the sampling-during-training produces. Here are some others:
But yeah, ideally I'd like to go to sleep after asking for 50 images, and then wake up and pick the best ones.
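On the "ask for 50 images overnight" point: until the sampler mentioned further down (#8) is in use, the same call can simply be looped from Python, giving each run its own name. An illustrative sketch reusing the arguments above, with paths as placeholders:

    import subprocess

    for i in range(50):
        subprocess.run([
            "python", "diffudiver.py",
            "--dir", "outputs", "--name", f"faces_{i:02d}",
            "--textw", "0", "--ssimw", "0", "--imgpw", "0",
            "--imageSize", "512", "--modelSize", "512",
            "--load", "pretrained_models/Diffusion-Model-55.pt",
            "--mults", "1", "1", "2", "2", "4", "4", "8",
            "--saveEvery", "100", "--saveAfter", "700",
            "--model", "unet0k5",
        ], check=True)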
-
But for a grand finale (at least for Tuesday), I used diffudiver on the phrase "a flower vase" with the painterly-faces-trained network, as follows:
!python diffudiver.py --text "a flower vase" --low 0.4 --high 0.8 --cutn 16 --dir {outputdir} --name 'vase' --textw 0.5 --ssimw 0.5 --lr 0.002 --imageSize 1024 --modelSize 512 --load {pretrained_models}/Diffusion-Model-55.pt --mults 1 1 2 2 4 4 8 --ema --saveEvery 50 --saveAfter 700 --model unet0k5
-
For sampling, have a look at #8. You can sample one or multiple epochs, with as many samples per epoch as you want, and it supports batchSize.