[Bug]: Proxy | [BETA] Request Prioritization | Polling logic for scheduler.queue #6867

Open
thevogoncoder opened this issue Nov 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@thevogoncoder

What happened?

While debugging the request prioritization feature, I noticed that there is only ever one request in the scheduler queue, no matter how many requests I try to send in parallel.

I looked at the relevant part of the code in router.py (schedule_acompletion, from line 1309 onward):

        ## ADDS REQUEST TO QUEUE ##
        await self.scheduler.add_request(request=item)

        ## POLL QUEUE
        end_time = time.time() + self.timeout
        curr_time = time.time()
        poll_interval = self.scheduler.polling_interval  # poll every 3ms
        make_request = False

        while curr_time < end_time:
            _healthy_deployments, _ = await self._async_get_healthy_deployments(
                model=model, parent_otel_span=parent_otel_span
            )
            make_request = await self.scheduler.poll(  ## POLL QUEUE ## - returns 'True' if there's healthy deployments OR if request is at top of queue
                id=item.request_id,
                model_name=item.model_name,
                health_deployments=_healthy_deployments,
            )
    ...

As I understand the code (and my tests confirm it), the request is added to the queue and then immediately removed again if there is at least one healthy deployment.
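
To make this observation concrete, here is a toy model in Python. It is not LiteLLM's code: ToyScheduler, get_healthy_deployments, schedule and main are hypothetical names, and the poll condition simply mirrors the quoted comment (proceed if there are healthy deployments OR the request is at the top of the queue). It also assumes that the health lookup returns without yielding to the event loop, and it leaves out the timeout. Under those assumptions, 20 concurrent requests never leave more than one entry in the queue, whether or not a deployment is healthy.

```python
import asyncio
import heapq
import itertools


class ToyScheduler:
    """Hypothetical priority queue of request IDs (not LiteLLM's Scheduler)."""

    def __init__(self) -> None:
        # Entries are (priority, insertion order, request_id); lower number runs first.
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()
        self.max_len = 0  # largest queue length observed

    async def add_request(self, request_id: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))
        self.max_len = max(self.max_len, len(self._heap))

    async def poll(self, request_id: str, healthy_deployments: list[str]) -> bool:
        # Mirrors the quoted comment: proceed if there are healthy
        # deployments OR if this request is at the top of the queue.
        at_top = bool(self._heap) and self._heap[0][2] == request_id
        if healthy_deployments or at_top:
            # Dequeue this request before it is sent.
            self._heap = [e for e in self._heap if e[2] != request_id]
            heapq.heapify(self._heap)
            return True
        return False


async def get_healthy_deployments(healthy: list[str]) -> list[str]:
    # Assumption: the health lookup returns without yielding to the event loop.
    return list(healthy)


async def schedule(scheduler: ToyScheduler, request_id: str, healthy: list[str]) -> None:
    await scheduler.add_request(request_id)
    while True:  # the real loop is bounded by a timeout; omitted here
        deployments = await get_healthy_deployments(healthy)
        if await scheduler.poll(request_id, deployments):
            return
        await asyncio.sleep(0.003)  # hypothetical polling interval


async def main(healthy: list[str]) -> int:
    scheduler = ToyScheduler()
    await asyncio.gather(*(schedule(scheduler, f"req-{i}", healthy) for i in range(20)))
    return scheduler.max_len


if __name__ == "__main__":
    print(asyncio.run(main(["deployment-a"])))  # prints 1
    print(asyncio.run(main([])))  # prints 1
```

Because no coroutine suspends between adding its request and polling, each one enqueues and dequeues before the next starts. If the health lookup did yield to the event loop, several requests could sit in the queue at once, so this result depends on that assumption.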

What I don't understand is the reasoning behind this polling logic. Wouldn't it mean that:

  1. If there is a healthy deployment, no request ever waits for the polling_interval.
  2. If there is no healthy deployment and the request is at the top of the queue, it is also sent immediately.

But why send a request at all if I know there is no healthy deployment?
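
The two cases can be written out as a truth table. In this sketch, poll_as_described restates the condition from the quoted comment, while poll_wait_for_capacity is a hypothetical stricter variant (not LiteLLM's behavior, and not necessarily the author's own adjustments) in which a request proceeds only when it is at the top of the queue and at least one healthy deployment exists.

```python
from itertools import product


def poll_as_described(has_healthy: bool, at_top: bool) -> bool:
    # Condition stated in the quoted comment: healthy deployments OR top of queue.
    return has_healthy or at_top


def poll_wait_for_capacity(has_healthy: bool, at_top: bool) -> bool:
    # Hypothetical alternative: only the head of the queue may proceed,
    # and only once at least one healthy deployment exists.
    return has_healthy and at_top


if __name__ == "__main__":
    print("healthy  at_top  described  alternative")
    for has_healthy, at_top in product([True, False], repeat=2):
        print(
            f"{has_healthy!s:<8} {at_top!s:<7} "
            f"{poll_as_described(has_healthy, at_top)!s:<10} "
            f"{poll_wait_for_capacity(has_healthy, at_top)!s}"
        )
```

The strict variant also dispatches only one request at a time, so it illustrates the question above rather than a complete design.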

thevogoncoder added the bug label on Nov 22, 2024
@thevogoncoder
Author

I made a few adjustments to the code, and now it works the way I want. I would be glad to open a pull request if that aligns with your vision for this feature. Before doing that, however, I wanted to understand your implementation and the reasoning behind it. Maybe I got the configuration wrong after all.
