Does this code mess up clip_grad_norm when training text encoder? #1028

araleza · 2023-12-30T23:41:03Z

Hey, just came back to kohya-ss code after a while, and I saw this in sdxl_train.py:

accelerator.backward(loss)
if accelerator.sync_gradients and args.max_grad_norm != 0.0:
    params_to_clip = []
    for m in training_models:
        params_to_clip.extend(m.parameters())
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)

optimizer.step()

It looks like the parameters of all three models (tenc1, tenc2, unet) are pooled, so a single gradient norm is computed across all of them and clipping is applied based on that one norm. Shouldn't each of the three models get its own norm, with each one clipped separately? That might lead to better training.
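To make the difference concrete, here is a minimal pure-Python sketch of the arithmetic, not the actual kohya-ss or accelerate code. The helper names (`l2_norm`, `clip_scale`, `clip_joint`, `clip_per_group`) and the gradient values are made up for illustration, and the scale factor is assumed to follow the usual `max_norm / (norm + eps)` rule, capped at 1.

```python
import math


def l2_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_scale(norm, max_norm, eps=1e-6):
    """Scale factor applied by norm-based clipping (never above 1)."""
    return min(1.0, max_norm / (norm + eps))


def clip_joint(groups, max_norm):
    """One norm over all groups and one shared scale factor (as in the snippet)."""
    total = l2_norm([g for grads in groups.values() for g in grads])
    scale = clip_scale(total, max_norm)
    return {name: [g * scale for g in grads] for name, grads in groups.items()}


def clip_per_group(groups, max_norm):
    """A separate norm and scale factor per group (the alternative asked about)."""
    clipped = {}
    for name, grads in groups.items():
        scale = clip_scale(l2_norm(grads), max_norm)
        clipped[name] = [g * scale for g in grads]
    return clipped


# Hypothetical gradient values, chosen so the unet dominates the joint norm.
grads = {
    "tenc1": [0.3, 0.4],  # norm 0.5
    "tenc2": [0.0, 0.2],  # norm 0.2
    "unet": [6.0, 8.0],   # norm 10.0
}

joint = clip_joint(grads, max_norm=1.0)
separate = clip_per_group(grads, max_norm=1.0)

for name in grads:
    print(name, l2_norm(joint[name]), l2_norm(separate[name]))
```

With these made-up numbers, joint clipping shrinks the text encoder gradients by the same factor as the unet's even though they were already well under the limit, while per-group clipping leaves them untouched. On the other hand, joint clipping preserves the direction of the overall gradient, whereas per-group clipping changes the relative magnitudes between models, so which one trains better is not obvious. In the training loop, the per-model variant would presumably amount to calling `accelerator.clip_grad_norm_` once per entry in `training_models` instead of once on the combined list.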
