Respect retry-after header from Azure openAI when ratelimited #6686

JonathanVelkeneers · 2024-09-04T12:01:29Z

JonathanVelkeneers
Sep 4, 2024

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

Azure openAI passes a 'retry-after' header when hitting a ratelimit.
Use this timeout to wait between retries of the ChatModel

Motivation

Currently the retry logic uses an exponential backoff to handle retries.
This works, but fills the console with ratelimit errors. This can be optimized by using the provided wait time from the 'retry-after' header.

Example response:

RateLimitError: 429 Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-06-01 have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 1 second. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
    at APIError.generate (C:\home\site\wwwroot\dist\node_modules\openai\error.js:62:20)
    at AzureOpenAI.makeStatusError (C:\home\site\wwwroot\dist\node_modules\openai\core.js:271:31)
    at AzureOpenAI.makeRequest (C:\home\site\wwwroot\dist\node_modules\openai\core.js:314:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async C:\home\site\wwwroot\dist\node_modules\@langchain\openai\dist\chat_models.js:1363:29
    at async RetryOperation._fn (C:\home\site\wwwroot\dist\node_modules\p-retry\index.js:55:12) {
  status: 429,
  headers: {
    'content-length': '339',
    'content-type': 'application/json',
    date: 'Wed, 04 Sep 2024 07:31:38 GMT',
    'policy-id': 'DeploymentRatelimit-Token',
    'retry-after': '1',
    'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
    'x-content-type-options': 'nosniff',
    'x-ms-region': 'Sweden Central',
    'x-ratelimit-remaining-requests': '3',
    'x-ratelimit-reset-tokens': '1'
  },
  request_id: undefined,
  error: {
    code: '429',
    message: 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2024-06-01 have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 1 second. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'
  },

Proposal (If applicable)

No response

bushidaruler · 2024-11-15T11:28:16Z

bushidaruler
Nov 15, 2024

I'm surprised this is not implemented. The use of exponential backoff creates a lot of unnecessary calls and is not effective time-wise

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Respect retry-after header from Azure openAI when ratelimited #6686

{{title}}

Replies: 1 comment

{{title}}

Select a reply

Respect retry-after header from Azure openAI when ratelimited #6686

JonathanVelkeneers Sep 4, 2024

Checked

Feature request

Motivation

Proposal (If applicable)

Replies: 1 comment

bushidaruler Nov 15, 2024

JonathanVelkeneers
Sep 4, 2024

bushidaruler
Nov 15, 2024