-
Two questions, very much related:
I am doing this through Colab and will be glad to share the notebook once done. I put a notebook together that does training (diffutrainer) and then diffudiver.
-
The option steps means the number of diffusion steps inside the diffusion model and is best left at 1000. Trainsteps is more like what you want, but it does not count epochs; it is more like batches (I need to check how exactly it counts). Roughly speaking, we train on accum batches, then update the model. After saveEvery update rounds we store the model and generate samples. That's what you should look at. I have found that accum = 10, saveEvery = 100 and nsamples = 2 work well, i.e. after 1000 batches we store the model and generate two samples. Trainsteps derives from the original code; perhaps we should count the number of such rounds instead (1000 batches in the example).

I have usually trained from half a day to two days. My experience is that, as we intend to use the model for text- and image-guided diffusion rather than sampling from the model alone, it works without extended training.

As to a pretrained model, I have been thinking about it. When I have a suitable one, I will share it. Any idea where to put it? BTW, for diffudiver I would recommend using the newest version with tgt_image, ssimw and textw. It works much better than the very experimental seed_image.
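To make it concrete how accum, saveEvery and nsamples interact, here is a minimal runnable sketch of the counting described above (illustrative names only, not the actual diffutrainer code):

```python
# Illustrative sketch of the counting described above, not the actual diffutrainer code.
# One train step = accum batches; a checkpoint and nsamples samples every saveEvery steps.

def training_schedule(trainsteps=300, accum=10, save_every=100, nsamples=2):
    batches_seen = 0
    for step in range(1, trainsteps + 1):
        batches_seen += accum              # gradients are accumulated over accum batches
        # ... one optimizer update of the model happens here ...
        if step % save_every == 0:
            # with accum=10 and save_every=100 this fires every 1000 batches
            print(f"step {step} ({batches_seen} batches): save model, generate {nsamples} samples")

training_schedule()
```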
Beta Was this translation helpful? Give feedback.
-
Here is a pretrained model: https://drive.google.com/file/d/1bYJ67QJM5H4NRqlfrHTHizJbkZcYMprs/view?usp=sharing
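In case it helps for the Colab notebook, one way to fetch the checkpoint and peek at its contents is sketched below; the key layout is an assumption, since it depends on how diffutrainer saves checkpoints:

```python
# Sketch: download the shared checkpoint in Colab and inspect it.
# The key layout printed at the end is an assumption; check what diffutrainer actually saves.
import gdown
import torch

url = "https://drive.google.com/uc?id=1bYJ67QJM5H4NRqlfrHTHizJbkZcYMprs"
ckpt_path = gdown.download(url, "pretrained.pt", quiet=False)

ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # e.g. step / model / ema state dicts, if saved that way
```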
-
Checked the code. Trainsteps is incremented each time the model is updated, i.e. after accum batches. As it is difficult to know in advance how much training will be needed, I tend to set a big enough limit, monitor how the samples look, and then interrupt when I see fit, often to resume training later. I understand that with Colab it is different. Anyhow, training requires time, but a few hours to two days is still not really a long time for model training.

Diffutrainer now outputs the loss every time it is evaluated, i.e. after accum batches. That is probably not practical in a Colab and needs to be changed, around here: https://github.com/htoyryla/denoising-diffusion-pytorch-ht/blob/3d9c51f00ead14351181fa90531415454522af67/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py#L573
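For Colab, one simple option would be to throttle the logging rather than print after every update; a rough sketch with assumed names, not the code at the linked line:

```python
# Sketch of throttled loss logging for Colab (assumed names, not the actual Trainer code).
# Keep a running average and print it only every log_every updates.
losses = []
log_every = 50

def log_loss(step, loss):
    losses.append(loss)
    if step % log_every == 0:
        print(f"step {step}: avg loss {sum(losses) / len(losses):.4f}")
        losses.clear()

# dummy usage just to show the behaviour
for step in range(1, 201):
    log_loss(step, 1.0 / step)
```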
-
I have been training a model, first with 200+ facial photos of myself, a set I have used many times. I made it a few years ago: set a camera to take photos at one-second intervals, placed myself against a white background, and changed my position and expression. Trained perhaps for 20-30 hours. First at lr 5e-4 to get training to start fast, and it did. During the night it had, however, run astray, which is to be expected (too high an lr in the long run). I selected a good checkpoint and resumed at a lower lr, maybe 1e-4, maybe 5e-5. Looks like a good range is from 5e-5 to 1e-5. Then I changed the dataset to script-generated images like this. Continued with a lower lr overnight, and now it makes samples like this. Here's an example of this model used with an init image (with skip, mul and weak; yes, they work now too). The full command to make this was:
I was using diffudiver2, which includes an image post-processing chain and an option to decay the text weight, so that one can use a higher text weight during the early steps without CLIP introducing too much detail in the long run. I am releasing this latter model, which can be found here: https://drive.google.com/file/d/1doJUqqJqrdQHIe7Dw4quqPA7q5KmBDn2/view?usp=sharing
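Roughly, the text weight decay works like this (a sketch with assumed parameter names, not necessarily the exact diffudiver2 implementation): the CLIP text weight starts at textw and decays toward a fraction of it as the diffusion proceeds, so text guidance shapes the composition early without forcing detail late.

```python
# Sketch of a decaying text guidance weight over the diffusion steps.
# textw and the decay fraction are illustrative; diffudiver2 may do this differently.

def text_weight(step, total_steps, textw=1.0, final_frac=0.1):
    """Linearly decay the CLIP text weight from textw down to textw * final_frac."""
    t = step / max(total_steps - 1, 1)            # 0.0 at the first step, 1.0 at the last
    return textw * ((1.0 - t) + t * final_frac)

for s in (0, 500, 999):                           # e.g. over 1000 guided steps
    print(s, round(text_weight(s, 1000), 3))
```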
-
A note on network architectures. Lucidrains' repo used resnet blocks originally, then went to convnext. Here I have provided both, unet0 and unetcn0 respectively, and have mainly used unetcn0, apart from some early tests on unet0. Lucidrains' repo has now moved back to resnet blocks, so it may be a good idea to use unet0 here. My intuition from experience is likewise that unet0 is easier to get to learn. Others have reported in lucidrains' repo that convnext blocks failed to learn to make proper lines. My experience, though limited, is similar.

I have just started a training run with the dataset referred to in the previous comment. In one hour (3090 running at 250 W, lr 5e-4, 7-layer unet0, L2 loss) the samples started assuming the proper form (though as yet without detail); see the sample below. While an lr as high as 5e-4 helps to get training started, it is likely to run amok after a few hours, in which case one should resume from a good checkpoint with a lower lr.
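For reference, the two block types differ roughly as follows; this is a simplified sketch of the block structures, not the exact unet0 / unetcn0 code:

```python
# Simplified sketch of the two block families discussed above
# (not the exact unet0 / unetcn0 code; layer details in the repo differ).
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Roughly: norm -> SiLU -> 3x3 conv, twice, with a residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.GroupNorm(8, dim), nn.SiLU(), nn.Conv2d(dim, dim, 3, padding=1),
            nn.GroupNorm(8, dim), nn.SiLU(), nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)

class ConvNextBlock(nn.Module):
    """Roughly: depthwise 7x7 conv, norm, then a widening conv 'MLP', with a residual connection."""
    def __init__(self, dim, mult=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 7, padding=3, groups=dim),   # depthwise conv
            nn.GroupNorm(1, dim),                            # layernorm-like over channels
            nn.Conv2d(dim, dim * mult, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim * mult, dim, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)

x = torch.randn(1, 64, 32, 32)
print(ResnetBlock(64)(x).shape, ConvNextBlock(64)(x).shape)
```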
-
Fine-tuned the model further, which eventually turned out not so great, but the transition was worth the effort. Samples like this, using which diffudiver2 can make images like this.
-
What is the difference between the regular diffudiver and diffudiver2?
-
V2 has the post-processing chain; it is just that I did not want to mess up anything with the additions. Skip (step) was another thing I added, but I added it in V1 as well. It seems to me now that V2 is stable enough and could replace V1 altogether.
-
Back from a trip. Let me know if you make V2 the new default; then I will not need to change the notebook.