DeepSpeedEngineWrapper.backward() does a bit too much #2951

Open · zhc7 opened this issue Jul 22, 2024 · 13 comments

@zhc7 commented Jul 22, 2024

The source code of DeepSpeedEngineWrapper:

class DeepSpeedEngineWrapper:
    """
    Internal wrapper for deepspeed.runtime.engine.DeepSpeedEngine. This is used to follow conventional training loop.

    Args:
        engine (deepspeed.runtime.engine.DeepSpeedEngine): deepspeed engine to wrap
    """

    def __init__(self, engine):
        self.engine = engine

    def backward(self, loss, **kwargs):
        # runs backpropagation and handles mixed precision
        self.engine.backward(loss, **kwargs)

        # Deepspeed's `engine.step` performs the following operations:
        # - gradient accumulation check
        # - gradient clipping
        # - optimizer step
        # - zero grad
        # - checking overflow
        # - lr_scheduler step (only if engine.lr_scheduler is not None)
        self.engine.step()
        # and this plugin overrides the above calls with no-ops when Accelerate runs under
        # Deepspeed, but allows normal functionality for non-Deepspeed cases thus enabling a simple
        # training loop that works transparently under many training regimes.

My question is: why do we need to call self.engine.step() here immediately? This zeros the gradients and updates the parameters without the user noticing, which can be unexpected. Because the backward pass is internally bound to zeroing the gradients and updating the parameters, users have no way to inspect the gradients or parameters manually before stepping.

I know DeepSpeed-wrapped models can't be treated as normal models, but this behavior still eliminates a lot of flexibility.
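
For comparison, this is roughly what the conventional Accelerate training loop looks like without DeepSpeed: backward, optimizer step, and zero_grad are separate calls, which is what leaves room to inspect gradients in between. The model/optimizer/dataloader names below are placeholders:

# Conventional Accelerate loop (no DeepSpeed): the stages stay separate.
for batch in dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    accelerator.backward(loss)
    # gradients are still available here and can be inspected or clipped
    optimizer.step()
    optimizer.zero_grad()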

zhc7 changed the title from DeepSpeedEngineWrapper.backword() does a bit too much to DeepSpeedEngineWrapper.backward() does a bit too much on Jul 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@muellerzr (Collaborator)

Hitting this snag right now actually, will see what we decide to do

@muellerzr (Collaborator)

We're working with the DS team to try to remove the engine entirely; however, as a user you can always call model.engine.backward() etc. manually without harm in Accelerate.
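
For illustration, a rough sketch of what driving the engine manually could look like; the accelerator.deepspeed_engine_wrapped.engine attribute follows the usage later in this thread, and the loop assumes DeepSpeed's own config handles gradient accumulation and clipping:

# Hedged sketch: call the DeepSpeed engine directly instead of accelerator.backward(loss).
engine = accelerator.deepspeed_engine_wrapped.engine

for batch in dataloader:
    loss = model(**batch).loss   # placeholder forward pass
    engine.backward(loss)        # backprop + mixed-precision handling
    # ... inspect gradients or parameters here ...
    engine.step()                # accumulation check, clipping, optimizer step, zero_grad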

@nom commented Sep 23, 2024

Somehow

loss = loss / accelerator.gradient_accumulation_steps
accelerator.deepspeed_engine_wrapped.engine.backward(loss)

does not give equivalent results to
accelerator.backward(loss)

with deepspeed. What gives?


@LinB203 commented Oct 27, 2024

Same question here.


@GreenWindow1997

Same question here. It's genuinely hard to understand why it was designed this way. I believe accelerator.backward(loss) should only perform the backward pass, and the remaining steps should be written outside this function in a more standard and understandable manner.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LinB203 commented Dec 16, 2024

Not stale.


@Nirmal-Adhikari-hub

self.engine.module.named_parameters() might have the gradients you are looking for. I'm not certain about this, since I'm not the person who knows enough to answer these esoteric questions, but I ran into a similar problem of wanting to observe the model's gradients after the backward pass, and this solution worked for me.

I hope it helps, and if it did, glad to be of any help to you.

Happy coding!
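
A minimal sketch of that suggestion, assuming model is the object returned by accelerator.prepare() (under DeepSpeed this is the engine, so .module exposes the underlying torch module). Keep in mind that accelerator.backward() also calls engine.step(), so gradients may already have been cleared at accumulation boundaries, and under ZeRO stage 2/3 they are partitioned across ranks, so .grad may be None or only a local shard:

# Hedged sketch: inspect gradients via the wrapped module's parameters.
for name, param in model.module.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm().item())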

muellerzr reopened this Feb 11, 2025
@muellerzr (Collaborator)

So for a bit more context, we've been waiting on this PR to happen, so hopefully we can give more flexibility soon: deepspeedai/DeepSpeed#7018

muellerzr added the enhancement (New feature or request) and feature request (Request for a new feature to be added to Accelerate) labels on Feb 11, 2025