
Enable torch.compile with ZeRO (Experimental) #4878

Merged · 49 commits · Feb 6, 2024
Conversation

@tohtana (Contributor) commented Dec 28, 2023

This PR enables `torch.compile` with ZeRO stages 1/2/3. You need to add a `compile` section to your DeepSpeed config. The fields in the section are passed to `torch.compile`.

  "compile": {
    "disable": false,
    "backend": "inductor"
  }

To enable a custom backend, you can pass the fully qualified name of the backend function. For example, if you have a backend function `my_backend` in `my_backend.py` in the current directory, you can enable it with `"backend": "my_backend.my_backend"`. You can find an example in a unit test.
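For illustration, such a backend might look like the following minimal sketch (the `my_backend` name and file are the hypothetical ones from the example above; a `torch.compile` backend receives the captured FX graph and example inputs and returns a callable):

```python
# my_backend.py - a minimal sketch of a custom torch.compile backend
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    # Inspect or transform the captured graph here; returning
    # gm.forward simply runs the captured graph as-is.
    print(gm.graph)
    return gm.forward
```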

Currently, we have validated the results with Megatron-DeepSpeed. See the example for details.

NOTICE: This PR is a draft. We will need to validate the coverage and accuracy with many more examples.

@tjruwase (Contributor):

@stas00, FYI

@stas00 (Collaborator) commented Dec 31, 2023

Amazing work, @tohtana! I'm looking forward to trying it out.

Here is some quick feedback:

Could we please flip `disable` to `enabled` so that the logic is consistent with other config values?

  • no double-negation logic
  • a consistent `enabled` (not `enable`), as all other config sections use that name

@stas00 (Collaborator) commented Jan 3, 2024

Tried it out, and the compiled engine doesn't seem to forward some (all?) custom methods to the unwrapped model; e.g., it's failing:

```
[28:7]:  File "/data/env/lib/repos/retro-llama/tr043-dawn-llama-3/DeepSpeed/deepspeed/runtime/engine.py", line 468, in __getattr__
[28:7]:    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[28:7]:AttributeError: 'DeepSpeedEngine' object has no attribute 'get_model_tflops_per_batch_per_gpu'
```

`get_model_tflops_per_batch_per_gpu` is a normal attribute of the model, and the same setup works if I set `"disable": true` in the `compile` section.
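For context, wrapper engines typically forward unknown attribute lookups to the wrapped module; a minimal sketch of that pattern (an illustration, not DeepSpeed's actual implementation) follows:

```python
class WrapperSketch:
    """Hypothetical wrapper illustrating attribute forwarding to a wrapped module."""

    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        # Only invoked when normal lookup on the wrapper fails, so custom
        # methods defined on the wrapped module stay reachable.
        module = self.__dict__.get("module")
        if module is not None and hasattr(module, name):
            return getattr(module, name)
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
```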

@stas00 (Collaborator) commented Jan 3, 2024

I hacked around it via model.module.method..., and then I get many warnings and errors with the inductor backend, and then it fails. I have attached the log.

This is just training Llama-2 on a single node using Accelerate with torch-nightly from last night.

The Llama model is the same as HF Transformers with some additional methods. https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py

ds-compile.txt

@stas00 (Collaborator) commented Jan 3, 2024

If I disable the DS profiler, then it runs despite the compilation errors/warnings - same log as in the previous comment, other than the last traceback where it crashes.

@stas00 (Collaborator) commented Jan 3, 2024

I'm also observing a very strange performance-cycling behavior:

the TFLOPS go like this per iteration: 196, 196, 192, 196, 196, 192, 196, 196, 192 - two fast, one slower - very exactly.

Without compile it was a consistent 194.

So this tells me something gets recompiled every three iterations.
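One way to confirm such a suspicion is to ask Dynamo to log every recompilation; a sketch assuming a recent PyTorch build:

```python
import torch

# Log the failed guard each time a compiled function is recompiled.
torch._logging.set_logs(recompiles=True)

@torch.compile
def step(x):
    return x * 2

for i in range(6):
    # Each new input rank triggers a recompilation, which now gets logged.
    step(torch.randn([4] * (i % 3 + 1)))
```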

@tohtana (Contributor, Author) commented Jan 5, 2024

@stas00 Thank you for your feedback! This PR is still experimental. Let me address the issues one by one.

The `disable` configuration is what I specifically sought feedback on. Currently, all configuration items under `compile` are passed to `torch.compile`, which accepts `disable`, not `enable`. This design was chosen for its simplicity, given the uncertainty of future changes in `torch.compile`. But we can define `enable` and flip it before passing it to `torch.compile`.

Do you have any further comments on this? If not, I will switch it to `enable` as you suggested. Actually, it is also my personal preference.
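A minimal sketch of that flip, assuming the `compile` section arrives as a plain dict:

```python
import torch

def compile_with_config(module, compile_config: dict):
    # Translate a user-facing `enabled` flag into torch.compile's
    # `disable` kwarg; every other field passes through untouched.
    kwargs = dict(compile_config)
    enabled = kwargs.pop("enabled", True)
    return torch.compile(module, disable=not enabled, **kwargs)
```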

@stas00 (Collaborator) commented Jan 5, 2024

That's totally understandable, Masahiro. Tunji made that clear when he tagged me. If it's too early to provide feedback, please ping me when you're ready for it.

disable vs. enabled:

Ideally, DeepSpeed users will never need to know anything about `torch.compile` specifics - many frameworks integrate this feature without having the user interact with it directly. So its API doesn't have to impact DeepSpeed's API.

Since most (all?) DeepSpeed config sections use `enabled`, I'd say it'd be most consistent to continue with that convention.

But this is the opinion of a single person, so please seek out the opinions of others.

@tohtana (Contributor, Author) commented Jan 5, 2024

@stas00 Thank you for your quick reply. It is probably difficult to reach a clear conclusion for now, so I will simply switch it to `enable`; otherwise, many other users would have the same question as yours.
For a clearer answer, we need more experience to learn what options DeepSpeed users need in their applications. Even the options of `torch.compile` may change.

@stas00 (Collaborator) commented Jan 5, 2024

  • please note that it's `enabled` that DS uses everywhere else, not `enable`

  • w.r.t. other options, I'd say use the minimal number of options:

  1. Let's perhaps start with only `backend` and then pick the most sensible default for that option.

  2. Then provide the user an API where they can preset their own `**torch_compile_kwargs` to be passed to `torch.compile`. That way you're future-proofing the DeepSpeed API while allowing torch to do as they please: DeepSpeed will sync with future changes to keep the defaults sensible, and power users will always be able to override them.

deepspeed_engine.set_torch_compile_kwargs(**kwargs)

2a. I don't know if the current config schema allows for a non-predefined dict, so perhaps this could be possible:

  "compile": {
    "enabled": true,
    "backend": "inductor",
    "kwargs": {"key1"=value, "key2"=value}
  }

this should definitely work:

  "compile": {
    "enabled": true,
    "backend": "inductor",
    "kwargs": "key1=value;key2=value"
  }

but I don't know if all `torch.compile` kwargs can be stringified,

so providing a programmatic API for power users would be the most fool-proof:
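A sketch of what that hypothetical power-user API could look like (`set_torch_compile_kwargs` is the name proposed above, not an existing DeepSpeed method):

```python
import torch

class EngineSketch:
    """Hypothetical engine fragment for the proposed kwargs-override API."""

    def __init__(self):
        self._compile_kwargs = {"backend": "inductor"}  # sensible defaults

    def set_torch_compile_kwargs(self, **kwargs):
        # Power users override or extend the defaults before compilation.
        self._compile_kwargs.update(kwargs)

    def compile(self, module):
        return torch.compile(module, **self._compile_kwargs)
```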

@tohtana enabled auto-merge Feb 6, 2024 04:34
@tohtana added this pull request to the merge queue Feb 6, 2024
Merged via the queue into master with commit c3cfe96, Feb 6, 2024
14 checks passed
mrwyattii added a commit that referenced this pull request Feb 12, 2024
Tests running an older version of torch will fail the compile tests added in #4878.
mauryaavinash95 pushed two commits to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
On the following lines of the diff:

```python
    return backend
elif isinstance(backend, str):
    if backend in torch._dynamo.list_backends():
```
A contributor commented:

@tohtana The default `list_backends` call will exclude debug and experimental backends, e.g. `eager`. I think it's better to use `list_backends(exclude_tags=())` here.
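For reference, a quick way to see the difference (a sketch against `torch._dynamo`'s public signature):

```python
import torch._dynamo

# The default call filters out backends tagged "debug" or "experimental".
print(torch._dynamo.list_backends())
# An empty exclude_tags tuple lists everything, including e.g. "eager".
print(torch._dynamo.list_backends(exclude_tags=()))
```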

@tohtana (Contributor, Author) replied:

Thank you for the comment. I opened #5191.

github-merge-queue bot pushed a commit that referenced this pull request Feb 26, 2024
As mentioned at
#4878 (comment),
we are currently unable to enable debug or experimental backends for the
compiler. This PR enables users to utilize these backends.
ShellyNR pushed a commit to ShellyNR/DeepSpeed that referenced this pull request Mar 11, 2024
rraminen pushed three commits to ROCm/DeepSpeed that referenced this pull request May 9, 2024