What happened?

While debugging the request prioritization feature, I noticed that there is only ever one request in the scheduler queue, no matter how many requests I tried to generate in parallel.
I had a look at the relevant part of the code in router.py (schedule_acompletion#1309ff):
```python
## ADDS REQUEST TO QUEUE ##
await self.scheduler.add_request(request=item)

## POLL QUEUE ##
end_time = time.time() + self.timeout
curr_time = time.time()
poll_interval = self.scheduler.polling_interval  # poll every 3ms
make_request = False
while curr_time < end_time:
    _healthy_deployments, _ = await self._async_get_healthy_deployments(
        model=model, parent_otel_span=parent_otel_span
    )
    make_request = await self.scheduler.poll(  ## POLL QUEUE ## - returns 'True' if there's healthy deployments OR if request is at top of queue
        id=item.request_id,
        model_name=item.model_name,
        health_deployments=_healthy_deployments,
    )
    ...
```
As I understand the code (and as I tested it), a request is added to the queue and then immediately removed again if there is at least one healthy deployment.
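To make the behavior concrete, here is a small toy model of the poll semantics I observed. This is not litellm's actual `Scheduler`; the class, method names, and logic are simplified purely for illustration:

```python
import asyncio

# Toy model of the observed semantics (NOT litellm's Scheduler API):
# poll() returns True and dequeues the request if there is at least one
# healthy deployment OR the request is already at the top of the queue.
class ToyScheduler:
    def __init__(self):
        self.queue = []  # request ids, index 0 = top of queue

    async def add_request(self, request_id):
        self.queue.append(request_id)

    async def poll(self, request_id, healthy_deployments):
        if healthy_deployments or (self.queue and self.queue[0] == request_id):
            self.queue.remove(request_id)
            return True
        return False

async def main():
    s = ToyScheduler()
    await s.add_request("req-1")
    await s.add_request("req-2")
    # With a healthy deployment available, even the request behind the
    # top of the queue is released on its very first poll, so the queue
    # never accumulates waiting requests.
    released = await s.poll("req-2", healthy_deployments=["deployment-a"])
    print(released, s.queue)  # True ['req-1']

asyncio.run(main())
```

In this toy model, priority ordering can only matter in the brief window where no deployment is healthy, which matches what I saw while debugging.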
Now what I don't understand is the reason for this polling logic:
Wouldn't that mean:

- If there is a healthy deployment, no request ever waits for the polling_interval.
- If there is no healthy deployment and the request is at the top of the queue, it would also be sent immediately.
But why send a request at all if I know there is no healthy deployment?
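For contrast, here is a hypothetical sketch of the behavior I would have expected. This is not a patch against litellm; `is_at_top` and `get_healthy` are made-up names standing in for whatever the real scheduler and health check expose:

```python
import asyncio
import time

# Hypothetical polling loop (illustration only, not litellm code):
# only release a request once a healthy deployment exists AND the request
# has reached the top of the queue; otherwise actually wait out the
# polling interval instead of sending into an unhealthy pool.
async def wait_for_turn(scheduler, request_id, get_healthy,
                        timeout=10.0, poll_interval=0.003):
    end_time = time.time() + timeout
    while time.time() < end_time:
        healthy = await get_healthy()
        if healthy and scheduler.is_at_top(request_id):
            return True  # safe to send: a deployment can actually serve it
        await asyncio.sleep(poll_interval)  # no healthy deployment or not our turn yet
    return False  # timed out waiting
```

With something like this, lower-priority requests would actually sit in the queue behind higher-priority ones, and nothing would be dispatched while no deployment is healthy.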
Relevant log output
No response
Twitter / LinkedIn details
No response
I made a few adjustments to the code, and now it works the way I want. I would be glad to open a pull request if that aligns with your vision for this feature. Before doing that, however, I wanted to understand your implementation and the reasoning behind it. Maybe I got the configuration wrong after all.