Add skip_unnecessary_grad_clip to TrainingArguments for optimized gradient clipping #41491

vaibhavgarg230 · 2025-10-09T21:47:27Z

What does this PR do?

This PR adds an opt-in skip_unnecessary_grad_clip argument to TrainingArguments that lets the Trainer avoid redundant gradient clipping. When enabled, the Trainer computes and logs the gradient norm every step, but skips calling clip_grad_norm_ if the norm is already below max_grad_norm. This avoids unnecessary work for training runs whose gradient norms stay consistently low, while the gradient norm is still logged every step.
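The behavior described above can be illustrated with a minimal, framework-free sketch. It is not the code in this PR: `total_grad_norm` and `clip_step` are hypothetical names, gradients are plain lists of floats rather than tensors, and the `1e-6` epsilon and the clamp to 1.0 are assumptions that mirror how `torch.nn.utils.clip_grad_norm_` computes its scaling coefficient.

```python
import math


def total_grad_norm(grads):
    """L2 norm over all gradient values, flattened across parameters."""
    return math.sqrt(sum(g * g for param in grads for g in param))


def clip_step(grads, max_grad_norm, skip_unnecessary_grad_clip=False):
    """Return (grad_norm, clip_called); scale `grads` in place when clipping runs.

    Hypothetical stand-in for the Trainer's clipping step, for illustration only.
    """
    # The norm is always computed so it can be logged, even if clipping is skipped.
    grad_norm = total_grad_norm(grads)

    # Opt-in fast path: the norm is already below the threshold, so skip clipping.
    if skip_unnecessary_grad_clip and grad_norm < max_grad_norm:
        return grad_norm, False

    # Default path: scale by a coefficient clamped to at most 1.0.
    clip_coef = min(1.0, max_grad_norm / (grad_norm + 1e-6))
    for param in grads:
        for i in range(len(param)):
            param[i] *= clip_coef
    return grad_norm, True


if __name__ == "__main__":
    grads = [[3.0, 4.0]]  # L2 norm is 5.0
    print(clip_step(grads, max_grad_norm=10.0, skip_unnecessary_grad_clip=True))
```

With the flag left at its default of False, the clipping path always runs, which matches the unchanged default behavior described in this PR.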

Maintainer requests addressed:

Grad norm is logged every step, even when clipping is skipped.
Default behavior is unchanged (the flag is False by default).

Motivation:

Addresses issue #41431 ("gradient scaling occurs even though total gradient remains < max_grad_norm in trainer.py").
Can improve performance when gradient norms are stable or training is CPU-bound.
Keeps the code backward-compatible and easy to opt in to.

Dependencies:

No new dependencies.

Tests added:

New test: tests/trainer/test_gradient_clipping.py
- Verifies clipping is skipped when grad norm is under threshold.
- Verifies grad norm is still correctly logged.
- Verifies the default and opt-in cases.
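As a rough idea of what such checks could look like, here is a self-contained sketch using `unittest.mock`. It does not reproduce tests/trainer/test_gradient_clipping.py: `maybe_clip` and `check_skip_behavior` are hypothetical names, and the clipping routine is replaced by a mock so that only the call/skip decision and the logged value are exercised.

```python
from unittest import mock


def maybe_clip(grad_norm, max_grad_norm, clip_fn, skip_unnecessary_grad_clip=False):
    """Hypothetical stand-in for the Trainer's clipping step.

    Always records the gradient norm; calls `clip_fn` unless the opt-in
    flag is set and the norm is already below the threshold.
    """
    logs = {"grad_norm": grad_norm}
    if not (skip_unnecessary_grad_clip and grad_norm < max_grad_norm):
        clip_fn(max_grad_norm)
    return logs


def check_skip_behavior():
    # Opt-in, norm under threshold: clipping is skipped, norm is still logged.
    clip_fn = mock.Mock()
    logs = maybe_clip(0.5, 1.0, clip_fn, skip_unnecessary_grad_clip=True)
    assert logs["grad_norm"] == 0.5
    clip_fn.assert_not_called()

    # Opt-in, norm over threshold: clipping runs.
    clip_fn = mock.Mock()
    maybe_clip(2.0, 1.0, clip_fn, skip_unnecessary_grad_clip=True)
    clip_fn.assert_called_once_with(1.0)

    # Default (flag off): clipping runs even when the norm is under threshold.
    clip_fn = mock.Mock()
    logs = maybe_clip(0.5, 1.0, clip_fn)
    assert logs["grad_norm"] == 0.5
    clip_fn.assert_called_once_with(1.0)
    return True


if __name__ == "__main__":
    check_skip_behavior()
```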

Documentation:

The new argument and the modified code block are documented per the contributing guidelines.

Before submitting

Discussed/approved via issue or maintainer
Documentation updated
All tests and repo checks run locally

Who can review?

Trainer logic: @zach-huggingface @SunMarc

Thanks for reviewing! Feedback and suggestions are very welcome.

vaibhavgarg230 · 2025-10-10T10:53:07Z

Hey @zach-huggingface, @SunMarc!
Please review my code.

vaibhavgarg230 added 2 commits October 10, 2025 03:07

Add skip_unnecessary_grad_clip for optimized gradient clipping

4feb582

Merge branch 'huggingface:main' into optimize-gradient-clipping

eedb86c
