Training results on A100 significantly worse than T4? #172
-
If it's weaker, it might be because of the number of steps. If you have fewer images you often need to increase the repeats value; I'd say the number of images times the repeats should land somewhere between 100 and 400. It's also important to consider the dim/alpha and the learning rate. The default values of the XL trainer should work well with the numbers I just mentioned. If you increase the dim/alpha you need to lower the learning rate, and if you lower the learning rate you need to train for longer. Other than that, the captioning is important, even more so in XL than in 1.5. It can affect the way you need to prompt while using your lora, or whether it learns properly at all.
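A quick arithmetic sketch of that rule of thumb (hypothetical numbers, not taken from this thread):

```python
# Rough sanity check of dataset size vs. repeats (hypothetical values).
num_images = 40
repeats = 5
effective_size = num_images * repeats      # 200 -> inside the suggested 100-400 range

# Steps per epoch scale with effective_size / batch_size, which is why
# lowering the learning rate usually also means training for more epochs.
batch_size = 2
epochs = 10
steps_per_epoch = effective_size // batch_size
total_steps = steps_per_epoch * epochs
print(effective_size, steps_per_epoch, total_steps)   # 200 100 1000
```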
-
Hmmm, thanks for the swift reply, and sorry, I do tend to ramble a bit. I'll try to structure my issue a little better: I made one dataset for an SDXL/Pony lora, 343 images, captioned with WD14 and manually weeded.
I've also (painfully) run the training locally on my 3060; after many hours I got an epoch 10 lora that was comparable to the one trained on the T4. Could using diffusers vs checkpoints make such a big difference?
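For anyone unsure what the diffusers-vs-checkpoint toggle actually refers to, here is a minimal sketch of the two loading paths (assuming a diffusers-based trainer; the repo ID and the local path are only examples, not what the colab uses internally):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Diffusers-format model: a repo/folder of separate component weights.
pipe_a = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# Single-file .safetensors checkpoint (hypothetical local path).
pipe_b = StableDiffusionXLPipeline.from_single_file(
    "/content/pony_diffusion_v6.safetensors", torch_dtype=torch.float16
)
```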
-
Unfortunately I've got to agree. I've used it for a few months now on an A100, but a couple of days ago I retrained a lora that worked perfectly before. I only changed a couple of words in the captions and it came out horrible, not just a bit off but completely ruined: instead of photo-looking pictures I got neon drawings on black. Then I tried to train a completely new lora with another dataset and all I got was completely black images on generation. I then retrained the lora on the T4 without changing anything except turning on diffusers and switching from bf16 to fp16, and it came out fine, fully working. This all happened right after some colab update last week.
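If anyone else hits the black-image issue, a quick check like this (just a sketch, not part of the colab) at least confirms what GPU you were assigned and whether it reports bf16 support before you spend credits on a run:

```python
# Sketch: confirm the assigned GPU and whether it reports bf16 support
# before picking a mixed-precision setting for the run.
import torch

print(torch.cuda.get_device_name(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
# If this prints False, or training still produces NaNs / black images,
# fp16 is the usual fallback, as described above.
```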
-
It's been a while since I had time to prepare a dataset, but now I've trained a couple more loras. Here are my observations that work with the A100 (and L4): And this one I found in my old LoRA training journals from 1.5; it seems to be true for Pony as well:
-
So I love this colab, it's very easy to use. I tried it out on the free T4 allotment I got from Google, and the standard Style LoCon settings produced amazing results without clogging up my local GPU. I didn't quite get to epoch 10, since I ran out of compute time after epoch 8 (to be fair, it's a dataset with almost 400 images), but I was more than happy with the results and decided to shell out for some time on the A100, just to get the good stuff quicker.
However: I trained a LoCon on a different dataset with nearly the same settings (aside from not using diffusers and a batch size of 16 instead of 1), and 10 epochs produced a terribly weak, low-quality network. So I retrained the one I never got to finish: 10 epochs, 1 repeat, Prodigy optimizer, sdpa attention, etc., the same settings as before except for diffusers and batch size, and the resulting epoch 10 LoCon was weaker than the epoch 5 LoCon trained on the T4.
At this point I just feel like I'm wasting compute credits trying to figure out what's going wrong. Am I missing something? Should I always train on diffusers instead of the safetensors checkpoint? Or should I avoid large batch sizes? Any hints?
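One thing worth checking (just a back-of-the-envelope sketch using the ~400-image, 1-repeat, 10-epoch numbers above, not anything confirmed from the trainer itself): a batch size of 16 cuts the number of optimizer steps per epoch by 16x, so the same epoch count means far fewer weight updates unless the learning rate or epoch count is scaled to compensate.

```python
# Optimizer steps for the same dataset and epoch count at two batch sizes
# (hypothetical illustration; image count and repeats taken from the post above).
images, repeats, epochs = 400, 1, 10

steps_bs1 = (images * repeats // 1) * epochs    # ~4000 steps at batch size 1
steps_bs16 = (images * repeats // 16) * epochs  # ~250 steps at batch size 16
print(steps_bs1, steps_bs16)
```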