
Run init_weights under no_grad #747

Merged · 1 commit · Dec 17, 2024
Conversation

carmocca (Contributor) commented:

The initializations in init_weights can create gradients. This is almost never intended.

An alternative would be to add the decorator to model.init_weights directly, which would move the responsibility to the model author. I can change it to that if preferred.
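
For illustration, a minimal sketch of the failure mode and of the fix this PR takes. ToyModel and its scale buffer are hypothetical, not torchtitan code:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.register_buffer("scale", torch.ones(8))

    def init_weights(self):
        nn.init.trunc_normal_(self.linear.weight, std=0.02)
        # A buffer derived from a parameter picks up a grad_fn if this
        # runs outside no_grad, silently keeping the init graph alive.
        self.scale = self.linear.weight.sum(dim=0)

model = ToyModel()
model.init_weights()
print(model.scale.requires_grad)  # True -- autograd tracked the init

model = ToyModel()
with torch.no_grad():             # the fix in this PR: wrap the call site
    model.init_weights()
print(model.scale.requires_grad)  # False
```

The decorator alternative mentioned above would instead put @torch.no_grad() on the init_weights definition itself, leaving call sites untouched.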

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Dec 17, 2024.
tianyu-l (Contributor) left a comment:


Thanks for the fix. I think this approach is fine compared to putting a decorator on every single init_weights() definition. Technically we would probably only need the decorator on the outermost one, but if users want to call init_weights() explicitly on a submodule, they would still have to use the context manager anyway.
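
To make that trade-off concrete, a hypothetical sketch (Block, Model, and bias_scale are illustrative names, not torchtitan code): decorating only the outermost init_weights covers the nested calls, but not a direct call on a submodule.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.register_buffer("bias_scale", torch.ones(8))

    def init_weights(self):  # undecorated on purpose
        nn.init.trunc_normal_(self.proj.weight, std=0.02)
        # Derived buffer: tracked by autograd unless run under no_grad.
        self.bias_scale = self.proj.weight.sum(dim=0)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()

    @torch.no_grad()  # decorator only on the outermost definition
    def init_weights(self):
        self.block.init_weights()  # nested call is covered by the outer no_grad

model = Model()
model.init_weights()  # safe: whole tree initialized under no_grad
assert not model.block.bias_scale.requires_grad

# Re-initializing just one submodule is NOT covered by the decorator,
# so the user still needs the context manager:
with torch.no_grad():
    model.block.init_weights()
```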

@awgu merged commit 5ce8a0c into pytorch:main on Dec 17, 2024.
4 of 5 checks passed