What happened?

While debugging the request prioritization feature, I noticed that there is only ever one request in the scheduler queue, no matter how many requests I tried to generate in parallel.
I had a look at the relevant part of the code in router.py (schedule_acompletion#1309ff):
```python
## ADDS REQUEST TO QUEUE ##
await self.scheduler.add_request(request=item)

## POLL QUEUE ##
end_time = time.time() + self.timeout
curr_time = time.time()
poll_interval = self.scheduler.polling_interval  # poll every 3ms
make_request = False
while curr_time < end_time:
    _healthy_deployments, _ = await self._async_get_healthy_deployments(
        model=model, parent_otel_span=parent_otel_span
    )
    make_request = await self.scheduler.poll(  ## POLL QUEUE ## - returns 'True' if there's healthy deployments OR if request is at top of queue
        id=item.request_id,
        model_name=item.model_name,
        health_deployments=_healthy_deployments,
    )
    ...
```
As I understand the code (and as I tested it), a request is added to the queue and then immediately removed again if there is at least one healthy deployment.
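To make the behavior concrete, here is a small toy model of the poll semantics I observed. This is not litellm's actual `Scheduler`; the class, method names, and logic are simplified purely for illustration:

```python
import asyncio

# Toy model of the observed semantics (NOT litellm's Scheduler API):
# poll() returns True and dequeues the request if there is at least one
# healthy deployment OR the request is already at the top of the queue.
class ToyScheduler:
    def __init__(self):
        self.queue = []  # request ids, index 0 = top of queue

    async def add_request(self, request_id):
        self.queue.append(request_id)

    async def poll(self, request_id, healthy_deployments):
        if healthy_deployments or (self.queue and self.queue[0] == request_id):
            self.queue.remove(request_id)
            return True
        return False

async def main():
    s = ToyScheduler()
    await s.add_request("req-1")
    await s.add_request("req-2")
    # With a healthy deployment available, even the request behind the
    # top of the queue is released on its very first poll, so the queue
    # never accumulates waiting requests.
    released = await s.poll("req-2", healthy_deployments=["deployment-a"])
    print(released, s.queue)  # True ['req-1']

asyncio.run(main())
```

In this toy model, priority ordering can only matter in the brief window where no deployment is healthy, which matches what I saw while debugging.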
Now what I don't understand is the reason for this polling logic:
Wouldn't that mean:

- If there is a healthy deployment, no request ever waits for the polling_interval.
- If there is no healthy deployment and the request is at the top of the queue, it would also be sent immediately.
But why send a request at all if I know there is no healthy deployment?
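For contrast, here is a hypothetical sketch of the behavior I would have expected. This is not a patch against litellm; `is_at_top` and `get_healthy` are made-up names standing in for whatever the real scheduler and health check expose:

```python
import asyncio
import time

# Hypothetical polling loop (illustration only, not litellm code):
# only release a request once a healthy deployment exists AND the request
# has reached the top of the queue; otherwise actually wait out the
# polling interval instead of sending into an unhealthy pool.
async def wait_for_turn(scheduler, request_id, get_healthy,
                        timeout=10.0, poll_interval=0.003):
    end_time = time.time() + timeout
    while time.time() < end_time:
        healthy = await get_healthy()
        if healthy and scheduler.is_at_top(request_id):
            return True  # safe to send: a deployment can actually serve it
        await asyncio.sleep(poll_interval)  # no healthy deployment or not our turn yet
    return False  # timed out waiting
```

With something like this, lower-priority requests would actually sit in the queue behind higher-priority ones, and nothing would be dispatched while no deployment is healthy.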
Relevant log output
No response
Twitter / LinkedIn details
No response
I made a few adjustments to the code, and now it works the way I want. I would be glad to open a pull request if that aligns with your vision for this feature. Before doing that, however, I wanted to understand your implementation and the reasoning behind it. Maybe I got the configuration wrong after all.