
Feature: Configure RPM for specific models #764

Closed
usmanovbf opened this issue Jul 1, 2024 · 4 comments
Labels
question Further information is requested

Comments

@usmanovbf

Issue

Hi! First of all, thank you for such a unique tool.
I wonder, is it possible to set a requests-per-minute (RPM) limit? For example, the free tier of Gemini allows 2 RPM (https://ai.google.dev/pricing), so I am currently getting this error:

Unexpected error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - b'[{\n  "error": {\n    "code": 429,\n    "message": "Resource
has been exhausted (e.g. check quota).",\n    "status": "RESOURCE_EXHAUSTED"\n  }\n}\n]'

Since aider uses litellm, it would be great to pass litellm settings from a YAML file and use the rpm attribute, for example like this: https://litellm.vercel.app/docs/proxy/reliability#step-1---set-deployments-on-config
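For illustration, a rough sketch of what that rpm limit could look like via litellm's Python Router, based on the linked docs (aider uses the litellm Python library rather than the proxy, so this is a hypothetical configuration, and the "gemini-pro" alias is a placeholder):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gemini-pro",  # placeholder alias
            "litellm_params": {
                "model": "gemini/gemini-1.5-pro-latest",
                "rpm": 2,  # free-tier Gemini quota
            },
        }
    ]
)

response = router.completion(
    model="gemini-pro",
    messages=[{"role": "user", "content": "hello"}],
)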

Please give this some attention, since it would enable more fine-grained tuning.

Thank you!

Version and model info

Aider v0.40.6
Model: gemini/gemini-1.5-pro-latest with diff-fenced edit format
Git repo: .git with 6 files
Repo-map: using 1024 tokens

@paul-gauthier
Owner

Thanks for trying aider and filing this issue.

Aider should have retried that error a bunch of times before finally giving up?

Aider doesn't use the litellm proxy, just the python library. And I don't know what the proxy would do if the client exceeds the rate limit? Probably just return a rate limit error just like google is?

@paul-gauthier paul-gauthier added the question Further information is requested label Jul 1, 2024
@usmanovbf
Author

usmanovbf commented Jul 2, 2024

Aider should have retried that error a bunch of times before finally giving up?

Unfortunately, it gets stuck with the error

Unexpected error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - b'{\n  "error": {\n    "code": 429,\n    "message": "Resource has been exhausted (e.g. check quota).",\n    "status":
"RESOURCE_EXHAUSTED"\n  }\n}\n'

Or sometimes with an error like

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/llms/vertex_httpx.py", line 1122, in completion
    response.raise_for_status()
  File "/opt/homebrew/lib/python3.10/site-packages/httpx/_models.py", line 761, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?key=*MASKED*'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/main.py", line 1942, in completion
    response = vertex_chat_completion.completion(  # type: ignore
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/llms/vertex_httpx.py", line 1125, in completion
    raise VertexAIError(status_code=error_code, message=response.text)
litellm.llms.vertex_httpx.VertexAIError: {
  "error": {
    "code": 500,
    "message": "An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting",
    "status": "INTERNAL"
  }
}


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/aider", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/aider/main.py", line 539, in main
    coder.run()
  File "/opt/homebrew/lib/python3.10/site-packages/aider/coders/base_coder.py", line 612, in run
    list(self.send_new_user_message(new_user_message))
  File "/opt/homebrew/lib/python3.10/site-packages/aider/coders/base_coder.py", line 917, in send_new_user_message
    saved_message = self.auto_commit(edited)
  File "/opt/homebrew/lib/python3.10/site-packages/aider/coders/base_coder.py", line 1440, in auto_commit
    res = self.repo.commit(fnames=edited, context=context, aider_edits=True)
  File "/opt/homebrew/lib/python3.10/site-packages/aider/repo.py", line 87, in commit
    commit_message = self.get_commit_message(diffs, context)
  File "/opt/homebrew/lib/python3.10/site-packages/aider/repo.py", line 163, in get_commit_message
    commit_message = simple_send_with_retries(model.name, messages)
  File "/opt/homebrew/lib/python3.10/site-packages/aider/sendchat.py", line 81, in simple_send_with_retries
    _hash, response = send_with_retries(
  File "/opt/homebrew/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/aider/sendchat.py", line 71, in send_with_retries
    res = litellm.completion(**kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/utils.py", line 959, in wrapper
    raise e
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/utils.py", line 843, in wrapper
    result = original_function(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/main.py", line 2607, in completion
    raise exception_type(
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/utils.py", line 7586, in exception_type
    raise e
  File "/opt/homebrew/lib/python3.10/site-packages/litellm/utils.py", line 6665, in exception_type
    raise litellm.InternalServerError(
litellm.exceptions.InternalServerError: litellm.InternalServerError: VertexAIException InternalServerError - {
  "error": {
    "code": 500,
    "message": "An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting",
    "status": "INTERNAL"
  }
}

and I need to press Ctrl+C twice and re-run aider.

Perhaps moving to the litellm proxy, or just adding a retry mechanism that waits a certain amount of time, would help.

And I don't know what the proxy would do if the client exceeds the rate limit?

It should just wait for some time before making the request again, based on the RPM quota for the specific model, or use exponential backoff.

Probably just return a rate limit error just like google is?

It forces me to re-run aider manually again and again, which is not workable. It would be better to have auto-retry based on the allowed RPM for the model, or just on exponential backoff; see the sketch below.
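To illustrate the RPM-based waiting idea, here is a minimal client-side sketch (a hypothetical helper for illustration, not part of aider or litellm) that spaces requests to respect a quota:

import time

class RpmLimiter:
    """Hypothetical helper: space out requests to respect an RPM quota."""

    def __init__(self, rpm):
        self.min_interval = 60.0 / rpm  # rpm=2 -> at most one request per 30s
        self.last_request = 0.0

    def wait(self):
        # Sleep just long enough that requests never exceed the quota.
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()

Calling limiter.wait() before each litellm.completion() call would keep a 2 RPM model under its quota without any server-side coordination.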

@paul-gauthier
Owner

Aider does retry litellm.RateLimitError. If all the retries fail, only then does it report the error to the user.
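The traceback above shows the mechanism: aider/sendchat.py wraps the litellm call with the backoff library's retry decorator. A minimal sketch of that pattern, with illustrative parameters rather than aider's actual settings:

import backoff
import litellm

@backoff.on_exception(
    backoff.expo,            # exponential backoff: waits grow 1s, 2s, 4s, ...
    litellm.RateLimitError,  # retry only on rate-limit errors
    max_time=60,             # illustrative cap, not aider's actual setting
)
def send_with_retries(**kwargs):
    return litellm.completion(**kwargs)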

@paul-gauthier
Owner

I'm going to close this issue for now, but feel free to add a comment here and I will re-open or file a new issue any time.
