
Conversation

@vaibhavgarg230 commented Oct 9, 2025

What does this PR do?

This PR adds an opt-in skip_unnecessary_grad_clip argument to TrainingArguments that makes the Trainer’s gradient clipping cheaper. When enabled, the Trainer still computes and logs the gradient norm every step, but skips the call to clip_grad_norm_ when the norm is already below max_grad_norm. This avoids unnecessary computation for models whose gradient norms are consistently low, while keeping logging intact.
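
For concreteness, here is a minimal sketch of the skip logic under the proposed flag. The helper name maybe_clip_gradients and its wiring are illustrative, not the actual Trainer diff; only the flag semantics follow the PR description.

```python
# Minimal sketch of the proposed behavior, outside the real Trainer.
# The helper name `maybe_clip_gradients` is illustrative.
import torch


def maybe_clip_gradients(model, max_grad_norm, skip_unnecessary_grad_clip):
    # The total gradient norm is always computed so it can be logged every step.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))

    # clip_grad_norm_ is only called when the norm actually exceeds the threshold,
    # or when the opt-in flag is off (preserving today's default behavior).
    if not skip_unnecessary_grad_clip or grad_norm > max_grad_norm:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    return grad_norm  # logged regardless of whether clipping ran
```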

Maintainer requests addressed:

  • Grad norm is logged every step, even when clipping is skipped.
  • Default behavior is unchanged (the flag is False by default); an opt-in usage sketch follows this list.
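
For illustration, opting in would look roughly like this on the PR's branch; the argument name comes from this PR, while output_dir and the other values are placeholders:

```python
# Hypothetical opt-in usage of the flag proposed in this PR; requires this PR's
# branch, since skip_unnecessary_grad_clip does not exist in released transformers.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                 # placeholder path
    max_grad_norm=1.0,                # clipping threshold, as before
    skip_unnecessary_grad_clip=True,  # opt in; the default stays False
)
```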

Motivation:

  • Gradient clipping currently runs every step even when the gradient norm is already below max_grad_norm; for training runs whose norms are consistently low, that call is unnecessary work.

Dependencies:

  • No new dependencies.

Tests added:

  • New test: tests/trainer/test_gradient_clipping.py
    • Verifies clipping is skipped when grad norm is under threshold (sketched after this list).
    • Verifies grad norm is still correctly logged.
    • Verifies the default and opt-in cases.
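
Roughly, the "skipped when under threshold" check can be pictured as the sketch below; it is a self-contained illustration in plain PyTorch, not the actual Trainer-based test.

```python
# Sketch of the kind of assertion the new test makes; illustrative only,
# it does not exercise the real Trainer code.
import torch
from torch import nn


def test_clipping_skipped_when_norm_below_threshold():
    model = nn.Linear(4, 2)
    model(torch.randn(8, 4)).sum().backward()

    max_grad_norm = 1e6  # far above any realistic norm, so clipping must be skipped
    grads_before = [p.grad.clone() for p in model.parameters()]

    grad_norm = torch.norm(torch.stack(
        [torch.norm(p.grad) for p in model.parameters()]))
    if grad_norm > max_grad_norm:  # not taken in this scenario
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    # Gradients are untouched and the norm is still available for logging.
    for before, param in zip(grads_before, model.parameters()):
        assert torch.equal(before, param.grad)
    assert grad_norm.item() > 0
```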

Documentation:

  • The new argument and its accompanying code block are documented per the contributing guidelines.

Before submitting

  • Discussed/approved via an issue or with a maintainer
  • Documentation updated
  • All tests and repo checks run locally

Who can review?


Thanks for reviewing! Feedback and suggestions are very welcome.

@vaibhavgarg230 (Author)

Hey @zach-huggingface, @SunMarc!
Please review my code.
