Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[torch.compile] fast inductor #11108
[torch.compile] fast inductor #11108
Changes from 13 commits
02876a0
de61c66
d59d2d8
9f599cc
fc0f60f
3a8e678
9515e63
82d60e6
84aa0b3
56a03c7
e3f0a14
51a1efb
b20435c
c41a8d4
eb0dba2
f48a4f6
955989c
4f1c4a0
6c325d9
37d744d
516db43
28b98fb
2a7f729
8104cfa
75cf1f5
75da0b6
d7946ab
ee60692
5c5eb2b
b346bd9
f58b566
a264175
76fcc99
59365c4
aacf7c8
37829dd
c4bc393
4b31b4f
40f1355
c4478c8
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried to place this class into
vllm.compilation.backends
, but then needs to be lazily imported, andpydantic
will complain.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put into a separate file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better if we also save a serialized form of the config, but we need to design the serialized format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which config is not serializable? Isn't
CompilationConfig
serializable?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is serializable, but i want a human-readable form, so that we can also manually check the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed some quantization args that can affect model execution, but I don't know how to pull out all factors that affect quantization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vLLM version? We can add the git SHA to the key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a large source of potential bugs so definitely should be careful here. Most quantization related stuff from NM goes in the
model_config
but there's a lot of arguments toLLM
that can affect things likedtype
andquantization
. Are these in the key already?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet. that's why I want to ask for reviews.
one direction is we consider all factors affecting compilation, and we can use compilation cache by default.
another approach is we don't cache by default, but tell user the cache directory, and users can specify the cache directory if they know nothing changed.
which one would you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always check the known factors when we cache, and expose an accessible switch for enabling/disabling caching. And then it's less important whether it's on by default or not. And for that decision @robertgshaw2-neuralmagic should chime in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more factors to consider in 2a7f729 . Let me know if I miss anything.