-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[misc][optimization] optimize data structure in allocator #5968
base: main
Are you sure you want to change the base?
Conversation
what batch size is the perf optimization measured at? |
OK can we get @alexm-neuralmagic to review this? I want to make sure this optimization is compatible with #5602 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, comments are all about cleaning up the code / reducing coupling between interfaces and implementation.
Let's make sure it fits with @alexm-neuralmagic PR
num_blocks: int | ||
block_size: int | ||
block_index_start: int | ||
block_index_end: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's define these as @property
so implementations not coupled with the interface beyond the required bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @property
is over used in the project. It also has runtime cost, and you need to keep a private self._num_blocks
. That's not good in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtime cost of property is negligible if it helps to create a better/more maintainable API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think @property
is more maintainable here. In fact, I think it adds unnecessary complexity. We need to define self. _num_blocks
as well as @property def num_blocks
. If future class needs to define @property
, they can do it. But we don't need to do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to prioritize code quality in the block manager because the software design requirements are so complex. Otherwise we will trend towards the V1 block manager and its problems -- untestable, hard to modify thinking about the exponential complexity as prefix caching x cow x swap x lookahead x sliding window are combined.
The concrete things @property
buys us here:
- We don't allow users to write to these fields (only read). This allows the implementations to trust their fields are not modified, making them easier to reason about. We don't want to have to closely review every open source PR that goes into the block manager to ensure people don't write to these fields.
- We allow future implementations to have different implementations of the same API. this can allow performance optimizations (like how this PR required minimal API changes) or different features. We shouldn't trade this off without good reason.
You are correct that this adds overhead (LOAD_ATTR is replaced by a method call). If this is significantly impactful to ITL then we need to reconsider the approach. Unfortunately, if we cannot maintain good design due to Python performance in a component that requires good design, it means we have to rewrite it in C++. Let's motivate this decision by data before we compromise on design requirements.
You are correct that there is more code (storing private fields in constructor and returning them in property). This is required to allow only reads to those fields and is well worth the tradeoff. I think it's a small amount of code and confined to single files :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use alternative methods to make a field read-only like type checking. The downsides of using mypy is that it does not have great check coverage and can miss some cases, whereas defining an API ensures users can't go wrong when they're using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are over-reacting here. Let's make our life easier. I don't want to add unnecessary complexity that we need to write so many lines just to define a field. Who will modify these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize, I don't think forcing @property.getter
is necessary until I see someone violates this rule, which is very rare in my understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it is necessary to maintain good design of block manager v2. Don't want to regress on that without good reason. The <1% performance improvement of this PR not a good enough reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, please go ahead then. I don't have bandwidth to follow up with this PR.
Took a look at the PR and it is a nice optimization of using a list for a sequence of ids instead of a dict. It would work without any issues with the block_manager_v2 optimizations I did, since I simply use the RefCount as is, and the API did not change (just the underlying implementation). Thanks for doing this @youkaichao ! |
|
Sorry, what QPS is this? just trying to understand when this optimization begins to be impactful |
Need to check with |
OK I looked through the benchmark and this is batch size 1000. |
Hm if the improvement is <1% at batch size 1000, I wonder if we need this right now. it might be due to variance, and the property change could impact this small of an improvement. |
I think this is a why-not problem. When we know all block ids inside an allocator line in a continuous range, why do we need to pay the cost of using a dict to store the ref count? |
I like the simplification aspect of this but agree if it hampers the design / clean abstractions then it should be justified performance-wise. Also note that tp=8 with 7b model is going to magnify these kinds of performance differences in relative terms but is not a realistic production configuration; presumably the percentage gain would be quite a bit smaller with e.g. tp=4 and 70b model. |
data structure used in allocator can be improved in efficiency.
we can use plain list to track ref count, given that block ids are consecutive in one allocator.
this simple change can already bring some performance benefit:
before: Throughput: 38.93 requests/s, 19932.08 tokens/s
after: Throughput: 39.15 requests/s, 20044.03 tokens/s