
After torch compile with 0.2.0, speed becomes very slow #727

Open
MichoChan opened this issue Jan 9, 2025 · 6 comments
Comments

@MichoChan

No description provided.

@yzh119
Collaborator

yzh119 commented Jan 9, 2025

Yes, that is observed in #709; you can cherry-pick the changes there.

@MichoChan
Author

MichoChan commented Jan 9, 2025

I tested it, but that hotfix breaks the graph when compiling with fullgraph. If I don't use fullgraph, compilation succeeds and the speed is fine, but I need fullgraph when using vLLM's compile fast inductor path.

@youkaichao

@MichoChan how do you use vllm with torch.compile?

@MichoChan
Author

@MichoChan how do you use vllm with torch.compile?

torch compile in vLLM itself is fine, but when I use vLLM's compilation implementation in my own framework, my model code causes a graph break during compilation, which then triggers the assertion `assert not self._called, "VllmBackend can only be called once"`. I am using fullgraph with flashinfer 0.20.0.

I found that vLLM already registers attention as a custom op for torch compile. I used the same method with flashinfer 0.16.0, and everything worked for me.

so flashinfer 0.20.0 can't use torch compile full graph

@yzh119
Collaborator

yzh119 commented Jan 10, 2025

so flashinfer 0.20.0 can't use torch compile full graph

Can you explain this? I don't see why fullgraph works for v0.1.6 but not for v0.2.0.

@MichoChan
Author

MichoChan commented Jan 10, 2025


so flashinfer 0.20.0 can't use torch compile full graph

Can you explain this? I don't see why fullgraph work for v0.1.6 but not for v0.2.0

Sorry, not 0.20.0; it's the master branch with #709 that can't use fullgraph. I tested it and found that #709 can break the graph: BatchDecodeMlaWithPagedKVCacheWrapper.run causes a graph break.
