stateful inference #2513
We're not using the network in the handler, so no need for any layers. Just return x in forward.
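For example, a minimal sketch of what this suggestion amounts to (the class name here is illustrative, not the one in this PR):

```python
import torch
import torch.nn as nn


class IdentityModelSketch(nn.Module):
    """Placeholder model: the handler holds the state, so forward is a pass-through."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No layers needed; just return the input unchanged.
        return x
```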
To confirm, is it the case that `batchSize` is the least upper bound of `len(data)`, i.e. `len(data) <= batchSize`, and for all `l` such that `len(data) <= l`, `batchSize <= l`?

Is it possible for two separate requests to get batched to this worker? If so, suppose there are two separate streaming requests that are batched to this worker. What happens if one client is much, much faster than the other? Do we throttle the faster client to match the speed of the slower one by buffering the faster client's messages?
Q1: Yes, `len(data) <= batchSize`. `data` is a batch of requests received in real time.

Q2: Yes, a batch of requests comes from different sequences. E.g., `len(data) == 4` means there are 4 sequences. Each sequence has its own dedicated jobQ. Only the parameter `maxBatchDelay` decides how many msec are spent batching a group of requests from different sequences. In other words, differences in traffic volume across sequences have no impact on batching latency.
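As a rough illustration of that batching model, here is a hypothetical handler sketch (not the code in this PR) in which each element of `data` may come from a different sequence and per-sequence state is keyed by sequence id; it assumes the `get_sequence_id(idx)` helper discussed later in this thread:

```python
class StatefulHandlerSketch:
    """Illustrative only: keeps a toy per-sequence counter across batched requests."""

    def __init__(self):
        self.sequence_state = {}  # sequence_id -> accumulated state

    def handle(self, data, context):
        responses = []
        for idx, _row in enumerate(data):
            # Assumes the context exposes get_sequence_id(idx) for the idx-th request.
            seq_id = context.get_sequence_id(idx)
            count = self.sequence_state.get(seq_id, 0) + 1  # toy "state": requests seen
            self.sequence_state[seq_id] = count
            responses.append({"sequence_id": seq_id, "count": count})
        return responses
```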
OK, but if two streams produce data at drastically different rates, how do you keep the batch index coherent? For instance, fix a stateful worker. At time `t_0`, the worker receives data `d_0_0` and `d_1_0` from two streams. So then `len(data) == 2`, `data[0]` is the payload for stream 0, and `data[1]` is the payload for stream 1.

At `t_1`, stream 0 does not produce any data because it took longer than `maxBatchDelay`, but stream 1 produces data `d_1_1`. So then `len(data) == 1` and `data[0]` is the payload for stream 1. In the line below, `idx == 0`, so you fetch the sequence ID for index 0. It seems like this would fetch the sequence ID for stream 0, but you actually want the sequence ID for stream 1. Am I understanding the API semantics correctly? Perhaps I am misunderstanding how `context.get_sequence_id` works. Does it keep track of which stream corresponds to the elements of the `data` list passed to the handler?
Each request's sequence id is added to its header with key = `"ts_request_sequence_id"`. The backend can get a request's sequence id via its header. This guarantees we can always get the sequence id, regardless of whether the real batch size changes or a sequence's request lands in a different batch slot.
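For instance, a handler can recover each request's sequence id from its header regardless of the request's slot in the batch. A minimal sketch (it assumes the context's `get_request_header(idx, key)` accessor; the header key is the one described above):

```python
def get_sequence_ids(data, context):
    """Map each batch slot to the sequence id carried in that request's header."""
    sequence_ids = {}
    for idx in range(len(data)):
        # get_request_header(idx, key) looks up a header of the idx-th request in the batch.
        sequence_ids[idx] = context.get_request_header(idx, "ts_request_sequence_id")
    return sequence_ids
```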