[torch.compile] use depyf to dump torch.compile internals #10972

youkaichao · 2024-12-07T00:20:10Z

depyf (https://github.com/thuml/depyf) is a library (which I maintain) dedicated to dumping all sorts of internals from torch.compile.

With this PR, run vllm serve meta-llama/Meta-Llama-3-8B -O '{"level": 3, "inductor_specialize_for_cudagraph_no_more_than": 1, "debug_dump_path": "depyf_dump"}', and we get:

tree depyf_dump/rank_0 
depyf_dump/rank_0
├── __compiled_fn_1.AFTER_POST_GRAD.0.py
├── __compiled_fn_1.AFTER_POST_GRAD.1.py
├── __compiled_fn_1.AFTER_POST_GRAD.2.py
├── __compiled_fn_1.after_split.0.py
├── __compiled_fn_1.before_split.0.py
├── __compiled_fn_1.Captured_Graph.0.py
├── __compiled_fn_1.Forward_graph.0.py
├── __compiled_fn_1.Forward_graph.1.py
├── __compiled_fn_1.Forward_graph.2.py
├── __compiled_fn_1.Forward_graph.3.py
├── __compiled_fn_1.Forward_graph.4.py
├── __compiled_fn_1.Forward_graph.5.py
├── __compiled_fn_1.kernel_0.py
├── __compiled_fn_1.kernel_10.py
├── __compiled_fn_1.kernel_11.py
├── __compiled_fn_1.kernel_12.py
├── __compiled_fn_1.kernel_13.best_config
├── __compiled_fn_1.kernel_13.py
├── __compiled_fn_1.kernel_14.best_config
├── __compiled_fn_1.kernel_14.py
├── __compiled_fn_1.kernel_15.best_config
├── __compiled_fn_1.kernel_15.py
├── __compiled_fn_1.kernel_16.best_config
├── __compiled_fn_1.kernel_16.py
├── __compiled_fn_1.kernel_17.py
├── __compiled_fn_1.kernel_18.best_config
├── __compiled_fn_1.kernel_18.py
├── __compiled_fn_1.kernel_19.best_config
├── __compiled_fn_1.kernel_19.py
├── __compiled_fn_1.kernel_1.py
├── __compiled_fn_1.kernel_20.best_config
├── __compiled_fn_1.kernel_20.py
├── __compiled_fn_1.kernel_21.best_config
├── __compiled_fn_1.kernel_21.py
├── __compiled_fn_1.kernel_22.best_config
├── __compiled_fn_1.kernel_22.py
├── __compiled_fn_1.kernel_23.best_config
├── __compiled_fn_1.kernel_23.py
├── __compiled_fn_1.kernel_24.best_config
├── __compiled_fn_1.kernel_24.py
├── __compiled_fn_1.kernel_25.best_config
├── __compiled_fn_1.kernel_25.py
├── __compiled_fn_1.kernel_26.py
├── __compiled_fn_1.kernel_27.py
├── __compiled_fn_1.kernel_28.py
├── __compiled_fn_1.kernel_29.py
├── __compiled_fn_1.kernel_2.best_config
├── __compiled_fn_1.kernel_2.py
├── __compiled_fn_1.kernel_30.py
├── __compiled_fn_1.kernel_31.py
├── __compiled_fn_1.kernel_32.py
├── __compiled_fn_1.kernel_33.py
├── __compiled_fn_1.kernel_34.py
├── __compiled_fn_1.kernel_35.py
├── __compiled_fn_1.kernel_36.py
├── __compiled_fn_1.kernel_37.py
├── __compiled_fn_1.kernel_38.py
├── __compiled_fn_1.kernel_39.py
├── __compiled_fn_1.kernel_3.best_config
├── __compiled_fn_1.kernel_3.py
├── __compiled_fn_1.kernel_40.py
├── __compiled_fn_1.kernel_41.py
├── __compiled_fn_1.kernel_42.py
├── __compiled_fn_1.kernel_43.py
├── __compiled_fn_1.kernel_44.py
├── __compiled_fn_1.kernel_45.py
├── __compiled_fn_1.kernel_46.py
├── __compiled_fn_1.kernel_47.py
├── __compiled_fn_1.kernel_48.py
├── __compiled_fn_1.kernel_49.py
├── __compiled_fn_1.kernel_4.py
├── __compiled_fn_1.kernel_50.py
├── __compiled_fn_1.kernel_51.py
├── __compiled_fn_1.kernel_52.py
├── __compiled_fn_1.kernel_53.py
├── __compiled_fn_1.kernel_54.py
├── __compiled_fn_1.kernel_55.py
├── __compiled_fn_1.kernel_56.py
├── __compiled_fn_1.kernel_57.best_config
├── __compiled_fn_1.kernel_57.py
├── __compiled_fn_1.kernel_5.py
├── __compiled_fn_1.kernel_6.best_config
├── __compiled_fn_1.kernel_6.py
├── __compiled_fn_1.kernel_7.py
├── __compiled_fn_1.kernel_8.best_config
├── __compiled_fn_1.kernel_8.py
├── __compiled_fn_1.kernel_9.best_config
├── __compiled_fn_1.kernel_9.py
├── __compiled_fn_1.post_split_module.0.py
├── __compiled_fn_1.pre_insert_deferred_runtime_asserts___compiled_fn_1.0.py
├── __compiled_fn_1.pre_split_module.0.py
├── full_code_for_forward_0.py
└── __transformed_code_0_for_forward.py

It includes the Dynamo-compiled bytecode, the Inductor-generated kernels, the max-autotune configs, the graph module at each transformation step, etc.
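To get a quick overview of such a dump, the file names can be grouped by kind. The following is a minimal sketch that only relies on the naming pattern visible in the listing above; the helper names `classify` and `summarize` are hypothetical and not part of depyf or vLLM, and the grouping into categories is my own reading of the file names.

```python
import re
from collections import Counter

# Matches names like "__compiled_fn_1.kernel_13.best_config" or
# "__compiled_fn_1.AFTER_POST_GRAD.2.py" (pattern inferred from the listing).
PATTERN = re.compile(r"^__compiled_fn_(\d+)\.(.+?)(?:\.(\d+))?\.(py|best_config)$")


def classify(filename: str) -> str:
    """Map one dump file name to a coarse category (hypothetical helper)."""
    match = PATTERN.match(filename)
    if match is None:
        # e.g. full_code_for_forward_0.py, __transformed_code_0_for_forward.py
        return "other"
    stage, extension = match.group(2), match.group(4)
    if stage.startswith("kernel_"):
        return "autotune config" if extension == "best_config" else "inductor kernel"
    return f"graph: {stage}"


def summarize(filenames) -> Counter:
    """Count how many files fall into each category."""
    return Counter(classify(name) for name in filenames)


sample = [
    "__compiled_fn_1.AFTER_POST_GRAD.0.py",
    "__compiled_fn_1.AFTER_POST_GRAD.1.py",
    "__compiled_fn_1.kernel_13.best_config",
    "__compiled_fn_1.kernel_13.py",
    "__compiled_fn_1.kernel_0.py",
    "full_code_for_forward_0.py",
]
summary = summarize(sample)
for category, count in sorted(summary.items()):
    print(f"{category}: {count}")
```

For a real dump, the list could come from `os.listdir("depyf_dump/rank_0")` instead of the hard-coded sample.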

Signed-off-by: youkaichao <[email protected]>

github-actions · 2024-12-07T00:20:22Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀


ProExpertProg · 2024-12-07T05:38:33Z

How does depyf know to output after every pass? If it's not automatic, maybe we could add an annotation that can be added to the __call__ function so the graph is printed. And still provide a way to print mid-pass (for fusion, there's a two-step process so printing in the middle helps).

Finally, could you post an example output file from depyf?
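A rough sketch of what such an annotation might look like, purely as an illustration: a decorator on `__call__` that dumps the graph once the pass finishes, plus a plain helper for dumping mid-pass. Everything here (`dump_graph`, `dump_graph_after`, `ToyFusionPass`, and the list standing in for a graph) is hypothetical and not taken from vLLM or depyf.

```python
import functools

DUMPED = []  # stand-in sink holding (stage name, graph snapshot) pairs


def dump_graph(stage, graph):
    """Record a snapshot of the graph; a real version would format and print it."""
    DUMPED.append((stage, list(graph)))


def dump_graph_after(stage):
    """Decorator for a pass's __call__: dump the graph after the pass has run."""
    def decorator(call):
        @functools.wraps(call)
        def wrapper(self, graph):
            result = call(self, graph)
            dump_graph(stage, graph)
            return result
        return wrapper
    return decorator


class ToyFusionPass:
    """Toy two-step pass operating on a list of op names."""

    @dump_graph_after("after_fusion")
    def __call__(self, graph):
        graph.append("matched_pattern")  # toy step 1: mark a match
        dump_graph("mid_fusion", graph)  # explicit mid-pass dump
        graph[-1] = "fused_op"           # toy step 2: replace the match


graph = ["op_a", "op_b"]
ToyFusionPass()(graph)
for stage, snapshot in DUMPED:
    print(stage, snapshot)
```

The decorator covers the common "dump after the pass" case, while the explicit `dump_graph` call keeps the mid-pass dump available for two-step passes such as fusion.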

youkaichao · 2024-12-07T06:02:54Z

> How does depyf know to output after every pass?

It hooks the lazy_format_graph_code function, so we just need to call that function after every pass. Note: a bare lazy_format_graph_code call is enough; it does not need to be wrapped in a log call such as logger.debug("%s", lazy_format_graph_code("before split", self.graph)).

> could you post an example output file from depyf?

See https://drive.google.com/drive/folders/1gBVguoCbCGKpKb8SveKrbZkIhiZe9jeS?usp=sharing
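To illustrate the hooking idea in general terms, here is a minimal sketch, not depyf's actual implementation. It replaces a function on a stand-in namespace with a wrapper that also writes each formatted graph to a numbered file. The names `fake_utils` and `install_dump_hook`, the `(name, graph)` signature of the stand-in, and the `<name>.<counter>.py` file naming are all assumptions made for this example.

```python
import tempfile
import types
from collections import defaultdict
from pathlib import Path

# Stand-in for the module that owns the formatting function (not the real one).
fake_utils = types.SimpleNamespace(
    lazy_format_graph_code=lambda name, graph: f"# {name}\n{graph}\n"
)


def install_dump_hook(module, dump_dir):
    """Wrap module.lazy_format_graph_code so every call also writes a file."""
    original = module.lazy_format_graph_code
    counters = defaultdict(int)  # per-name counter, so repeated names do not clash

    def hooked(name, graph):
        text = original(name, graph)
        stem = name.replace(" ", "_")
        path = Path(dump_dir) / f"{stem}.{counters[stem]}.py"
        path.write_text(text)
        counters[stem] += 1
        return text  # callers still get the formatted text back

    module.lazy_format_graph_code = hooked
    return original  # returned so the hook can be undone later


dump_dir = tempfile.mkdtemp()
install_dump_hook(fake_utils, dump_dir)

# Bare calls are enough to trigger the dump; the return value may be ignored.
fake_utils.lazy_format_graph_code("before split", "graph v1")
fake_utils.lazy_format_graph_code("after split", "graph v2")
fake_utils.lazy_format_graph_code("after split", "graph v3")

written = sorted(p.name for p in Path(dump_dir).iterdir())
print(written)
```

Because the side effect happens inside the wrapped function, the dump is produced whether or not the result is passed on to a logger, which is why a bare call after each pass suffices.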

ProExpertProg

This looks good! It would be nice to reduce the verbosity of the output, but I'm happy to deal with that later.

mergify · 2024-12-11T15:51:50Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @youkaichao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

youkaichao · 2024-12-11T16:38:21Z

Yeah, we can further organize the output in the future, to help debug and improve the performance of Inductor.


youkaichao added 10 commits December 6, 2024 11:08

pipe through init_backend

ce2ae89

Signed-off-by: youkaichao <[email protected]>

pipe through VllmBackend

00a1b04

Signed-off-by: youkaichao <[email protected]>

pipe through PiecewiseCompileInterpreter

e9f23e6

Signed-off-by: youkaichao <[email protected]>

pipe through PiecewiseBackend

08b0a36

Signed-off-by: youkaichao <[email protected]>

pipe through start_monitoring_torch_compile

887e967

Signed-off-by: youkaichao <[email protected]>

pipe through end_monitoring_torch_compile

0d3d9f9

Signed-off-by: youkaichao <[email protected]>

add depyf

bde6bb3

Signed-off-by: youkaichao <[email protected]>

fix typo

06d8128

Signed-off-by: youkaichao <[email protected]>

fix typo

e4caec5

Signed-off-by: youkaichao <[email protected]>

add depyf dependency

71f63c6

Signed-off-by: youkaichao <[email protected]>

mergify bot added the ci/build label Dec 7, 2024

youkaichao mentioned this pull request Dec 7, 2024

[V1] Multiprocessing Tensor Parallel Support for v1 #9856

Merged

use pypi

80c6f74

Signed-off-by: youkaichao <[email protected]>

ProExpertProg approved these changes Dec 11, 2024


mergify bot added the needs-rebase label Dec 11, 2024

Merge branch 'main' into use_depyf

cdda6c3

add

a019f6c

Signed-off-by: youkaichao <[email protected]>

mergify bot removed the needs-rebase label Dec 11, 2024

fix

da97198

Signed-off-by: youkaichao <[email protected]>

youkaichao merged commit 91642db into vllm-project:main Dec 11, 2024
20 of 23 checks passed

youkaichao deleted the use_depyf branch December 11, 2024 18:43

youkaichao mentioned this pull request Dec 11, 2024

[torch.compile] remove graph logging in ci #11110

Merged

Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Dec 12, 2024

[torch.compile] use depyf to dump torch.compile internals (vllm-proje…

1982718

…ct#10972) Signed-off-by: youkaichao <[email protected]> Signed-off-by: Akshat Tripathi <[email protected]>

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[torch.compile] use depyf to dump torch.compile internals (vllm-proje…

782521e

…ct#10972) Signed-off-by: youkaichao <[email protected]>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[torch.compile] use depyf to dump torch.compile internals (vllm-proje…

1e627f6

…ct#10972) Signed-off-by: youkaichao <[email protected]>
