
[torchao float8tensor] #1415

Draft: crcrpar wants to merge 67 commits into crpa/subclass-tensor-ops from crpa/subclass-torchao_float8tensor

Conversation

@crcrpar (Collaborator) commented Nov 8, 2024

What does this PR do?

Improves the tensor subclass support of #1394 for torchao float8.

Note: pytorch/ao#1339 is needed.

My environment:

  • torch: 2.6.0a0+git62eea62
  • nvfuser: 0.2.23+gitbb05859
  • torchao: 0.7.0+gitb2e42ff6
  • CUDA device: RTX 6000 Ada Generation
  • Driver Version: 560.35.03
  • CUDA Version: 12.6
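
For context, here is a minimal sketch of the kind of program this PR is meant to support. The layer sizes and variable names are illustrative; the torchao and thunder entry points are the same ones used in the repro script further down.

import torch
import torch.nn as nn
import thunder
from torchao.float8 import convert_to_float8_training

# Convert the linear layers for torchao float8 training so that weights and
# activations become Float8Tensor subclasses.
model = nn.Linear(32, 64, bias=False, device="cuda")
fp8_model = convert_to_float8_training(model)

# What this PR works toward: thunder.jit tracing through the Float8Tensor ops.
jitted = thunder.jit(fp8_model)
x = torch.randn(16, 32, device="cuda")
out = jitted(x)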

Three comments from @crcrpar were marked as outdated.

@crcrpar crcrpar force-pushed the crpa/subclass-torchao_float8tensor branch 2 times, most recently from 896b631 to 316327f on November 24, 2024 16:13
@t-vi (Collaborator) commented Nov 25, 2024

@crcrpar if you merge main, the PyTorch nightly distributed CI tests should be fixed.

crcrpar (Collaborator, Author):

This change should be in #1394

@@ -637,7 +637,7 @@ def _convert_pytorchfunc_to_thundertrace(
     trace = TraceCtx()
     trace.bound_symbols.extend(active_jit_ctx.computation_trace.pop_scope())
     func_result = unwrap(wrapped_func_result)
-    if shallow_copy_output:
+    if shallow_copy_output and not trace.bound_symbols:
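
Purely as an illustration of the new guard (this is not the real thunder code; only `shallow_copy_output` and `bound_symbols` come from the hunk above):

# Hypothetical helper showing the behavior change of the condition.
def should_shallow_copy(shallow_copy_output: bool, bound_symbols: list) -> bool:
    # Before: `return shallow_copy_output`
    # After: only take the shallow-copy path when the converted function
    # recorded no bound symbols of its own.
    return shallow_copy_output and not bound_symbols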
crcrpar (Collaborator, Author):

Comment on lines +774 to +794

added_bsym: BoundSymbol = get_jit_ctx().computation_trace.scopes[-1][-1]
import_ctx, call_ctx, object_ctx = {}, {}, {}
for bsym in trace_of_fwd.bound_symbols:
    cur_import_ctx, cur_call_ctx, cur_object_ctx = bsym.gather_ctxs()
    import_ctx.update(cur_import_ctx)
    call_ctx.update(cur_call_ctx)
    object_ctx.update(cur_object_ctx)

if import_ctx:
    added_bsym._import_ctx.update(import_ctx)
if call_ctx:
    if added_bsym._call_ctx is not None:
        added_bsym._call_ctx.update(call_ctx)
    else:
        added_bsym._call_ctx = call_ctx
if object_ctx:
    added_bsym._object_ctx.update(object_ctx)
crcrpar (Collaborator, Author):

should be in #1394

crcrpar (Collaborator, Author):

This change should also be in #1394

crcrpar (Collaborator, Author):

should be in #1394

@crcrpar crcrpar force-pushed the crpa/subclass-tensor-ops branch from 15c8d12 to 70dc6ba on November 28, 2024 12:31
@crcrpar crcrpar force-pushed the crpa/subclass-torchao_float8tensor branch from 04d528a to 804bc99 on November 28, 2024 12:32
Comment on lines 275 to 294
if executor == DynamoThunderExecutor:
    with pytest.raises(AssertionError):
        torch.testing.assert_close(actual, expected)
crcrpar (Collaborator, Author):

This failure doesn't feel easy to fix to me, so I turned it into a standalone script:

import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training
from thunder.dynamo import ThunderCompiler
from thunder.dynamo.splitter import SubgraphInfo
from thunder.tests.make_tensor import make_tensor


def main():
    batch_size, in_features, out_features = 16, 32, 64

    device = torch.device("cuda")
    dtype = torch.float32

    model = nn.Linear(in_features, out_features, bias=False, device=device, dtype=dtype)
    fp8_model = convert_to_float8_training(model)
    x = make_tensor((batch_size, in_features), device=device, dtype=dtype)
    expected = fp8_model(x)

    backend = ThunderCompiler()
    jitted = torch.compile(fp8_model, backend=backend)
    actual = jitted(x)

    backend.save_reproducer_to_folder("./debug_torchao_with_thunderfx", use_pytest_benchmark=True)
    print(f"{len(backend.subgraph_infos) = }")
    subgraph: SubgraphInfo
    for subgraph in backend.subgraph_infos:
        print(f"# {len(subgraph.thunder_compiled_fns) = }")

    torch.testing.assert_close(actual, expected)


if __name__ == "__main__":
    main()

Note that pytorch/ao#1339 is needed at the moment.

Below is the console output of the script:

% python debug_thunderfx_torchao_fp8.py
/home/mkozuki/ghq/github.com/Lightning-AI/lightning-thunder/thunder/dynamo/compiler.py:21: UserWarning: The ThunderCompiler is in active development and may not work as expected. Please report any issues you encounter to the Lightning Thunder team.
  warnings.warn(
len(backend.subgraph_infos) = 1
# len(subgraph.thunder_compiled_fns) = 0
Traceback (most recent call last):
  File "/home/mkozuki/ghq/github.com/Lightning-AI/lightning-thunder/debug_thunderfx_torchao_fp8.py", line 34, in <module>
    main()
  File "/home/mkozuki/ghq/github.com/Lightning-AI/lightning-thunder/debug_thunderfx_torchao_fp8.py", line 30, in main
    torch.testing.assert_close(actual, expected)
  File "/home/mkozuki/ghq/github.com/crcrpar/pytorch/torch/testing/_comparison.py", line 1530, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 388 / 1024 (37.9%)
Greatest absolute difference: 0.18639898300170898 at index (1, 61) (up to 1e-05 allowed)
Greatest relative difference: 1.9664803743362427 at index (10, 33) (up to 1.3e-06 allowed)

So it seems that thunder.jit isn't used for this program, yet the numerics still diverge.

Collaborator:

Can you check whether the results stay the same between different invocations? (Maybe due to low precision, the results could differ.)

expected = fp8_model(x)
actual = fp8_model(x)
torch.testing.assert_close(actual, expected)

Collaborator:

But please add a comment explaining why expected and actual both come from calling the same model, rather than from one model and a reference.
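
For illustration (my wording, not the author's), the requested comment could read roughly like this:

# Compare two eager invocations of the same fp8 model rather than comparing
# against a separate higher-precision reference model: the goal here is to
# check run-to-run determinism, and low-precision float8 outputs may
# legitimately deviate from an unquantized reference.
expected = fp8_model(x)
actual = fp8_model(x)
torch.testing.assert_close(actual, expected)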

crcrpar (Collaborator, Author):

For ThunderFX, I updated the test to check parity between Inductor and ThunderCompiler, rather than between eager and ThunderFX.
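
A rough sketch of what that parity check could look like, reusing the names from the repro script above (my approximation, not the test as committed):

from thunder.dynamo import ThunderCompiler

# ThunderFX path: torch.compile with the ThunderCompiler backend.
backend = ThunderCompiler()
thunderfx_model = torch.compile(fp8_model, backend=backend)
actual = thunderfx_model(x)

# Reference path: torch.compile with the default Inductor backend.
inductor_model = torch.compile(fp8_model)
expected = inductor_model(x)

torch.testing.assert_close(actual, expected)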

@crcrpar crcrpar force-pushed the crpa/subclass-torchao_float8tensor branch from 7c1fea6 to 8475ff7 on November 30, 2024 07:02
@crcrpar crcrpar force-pushed the crpa/subclass-tensor-ops branch from 70dc6ba to fc6d8a9 on December 7, 2024 07:22
@crcrpar crcrpar force-pushed the crpa/subclass-torchao_float8tensor branch 2 times, most recently from ca3b5f7 to 2b30049 on December 9, 2024 09:28
Comment on lines +303 to +310
# TODO(crcrpar): Think of how to push tensor subclasses to `thunder.jit`.
# Currently no subgraphs go to thunder.jit.
if is_thunderfx:
    for subgraph in backend.subgraph_infos:
        if not bias and dtype == thunder.core.dtypes.bfloat16:
            assert not subgraph.thunder_compiled_fns
        else:
            assert subgraph.thunder_compiled_fns
crcrpar (Collaborator, Author):

I feel #1539 is related, in that both of them somehow mistakenly push things to the fallback instead of thunder.jit.

@crcrpar crcrpar force-pushed the crpa/subclass-tensor-ops branch from fc6d8a9 to ce3edbc on December 12, 2024 23:23
Signed-off-by: Masaki Kozuki <[email protected]>
next, function with tensor creation in it

Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
crcrpar and others added 23 commits December 13, 2024 08:24
Signed-off-by: Masaki Kozuki <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@crcrpar crcrpar force-pushed the crpa/subclass-torchao_float8tensor branch from 2b30049 to 6b73636 on December 12, 2024 23:25
between torch and thunder proxy

Signed-off-by: Masaki Kozuki <[email protected]>
since the outputs of subclass flattening would be replaceable with the
args of the ctor/unflatten of those subclass tensors.

Signed-off-by: Masaki Kozuki <[email protected]>