Fix gradient clipping #5150

tohtana · 2024-02-19T02:48:45Z

The gradient clipping API doesn't apply the coefficient correctly. This PR resolves the issue and adds a test case.
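For readers unfamiliar with the "coefficient" mentioned above, here is a minimal sketch of clipping by global norm in plain Python. It is not the code changed in this PR; the function name `clip_by_global_norm`, the `eps` value, and the nested-list representation of gradients are assumptions made for illustration. One shared coefficient, `max_norm / (total_norm + eps)`, is computed across all gradients and applied only when it is below 1.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one shared coefficient so that their
    joint L2 norm does not exceed max_norm (hypothetical helper)."""
    # Global L2 norm over every element of every gradient "tensor".
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef >= 1.0:
        # Already within the limit: return an unchanged copy.
        return [list(tensor) for tensor in grads], total_norm
    # Apply the same coefficient to every element.
    return [[g * clip_coef for g in tensor] for tensor in grads], total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # global norm is 5.0
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for tensor in clipped for g in tensor))
```

If the coefficient is computed but not applied to the gradients as intended, the norm after the call can still exceed `max_norm`, which is the kind of behavior a test case for clipping would check.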

cloneofsimo · 2024-02-24T05:18:48Z

I'm genuinely fascinated that this bug went unnoticed for so long. Isn't grad clip on by default? Does this mean that literally everything that uses the DeepSpeed backend (Hugging Face Trainer, Lightning, etc.) was faulty, yet somehow everyone still trained successfully? How? Is this less critical than it looks?

Just to be clear, I'm a huge fan and user of DeepSpeed and I'm very glad this tool exists. I'm just genuinely curious how this could have had so little impact for so long (in terms of training dynamics, for example), given that it looks like it affects all of engine.step(), and all of my previous experience with DeepSpeed has been very good. Sorry if I sounded passive-aggressive.

tohtana · 2024-02-24T06:57:28Z

@cloneofsimo One reason is that this function is called only in limited cases. I noticed this issue when I set zero_stage=0 and precision=fp32, and I didn't see it when I changed zero_stage. Probably not many users have used DeepSpeed with this config.
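As a rough illustration of the configuration described above, a DeepSpeed config dict with ZeRO stage 0, full fp32 precision, and gradient clipping enabled might look like the sketch below. The mapping of "zero_stage=0 and precision=fp32" onto these keys is an assumption (the comment may refer to settings in a wrapper library), and the batch size and clipping threshold are placeholder values.

```python
import json

# Hypothetical config: ZeRO disabled (stage 0), no mixed precision (fp32),
# gradient clipping turned on. Values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_clipping": 1.0,
    "zero_optimization": {"stage": 0},
    "fp16": {"enabled": False},
    "bf16": {"enabled": False},
}

print(json.dumps(ds_config, indent=2))
```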

cloneofsimo · 2024-02-24T07:07:13Z

@tohtana thanks for the clarification!

Edit: yeah @SeunghyunSEO, @tohtana is right. It looks like other functions are used for ZeRO, which is why people use this in the first place.

SeunghyunSEO · 2024-02-24T08:12:23Z

@tohtana thank you for the kind explanation. My understanding is that when ZeRO stage 3 (or 1 or 2) is enabled, there is no problem because we use this function to clip and scale the gradient, right?

The gradient clipping API doesn't apply the coefficient correctly. This PR resolves the issue and adds a test case. Co-authored-by: Logan Adams <[email protected]>

fix gradient clipping

794e992

tohtana marked this pull request as ready for review February 19, 2024 06:17

tohtana requested review from mrwyattii, tjruwase and loadams as code owners February 19, 2024 06:17

tjruwase approved these changes Feb 20, 2024

Merge branch 'master' into tohtana/fix_fp32_clipping

9871421

loadams approved these changes Feb 20, 2024

tjruwase added this pull request to the merge queue Feb 21, 2024

Merged via the queue into master with commit 005afe1 Feb 21, 2024
12 checks passed

tohtana deleted the tohtana/fix_fp32_clipping branch February 24, 2024 06:58

tohtana mentioned this pull request Apr 17, 2024

Comparison of Deepspeed Stage 1,2 and 3 vs DDP #4815

Closed
