Connecting to VLLM Instance #4602
-
What happened?

Trying to access the LiteLLM endpoint to route User --> (curl) --> LiteLLM --> (vLLM phi-3 model). Getting the error:

litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404

Relevant log output

File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 645, in acompletion
response = await self.async_function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 2329, in async_function_with_fallbacks
raise original_exception
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 2174, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 2429, in async_function_with_retries
raise original_exception
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 2351, in async_function_with_retries
response = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 787, in _acompletion
raise e
File "/usr/local/lib/python3.11/site-packages/litellm/router.py", line 756, in _acompletion
response = await _response
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1500, in wrapper_async
raise e
File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1315, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 410, in acompletion
raise exception_type(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 7665, in exception_type
raise e
File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 6036, in exception_type
raise NotFoundError(
litellm.exceptions.NotFoundError: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'} LiteLLM Retried: 2 times, LiteLLM Max Retries: 3
10:48:16 - LiteLLM Proxy:ERROR: _common.py:120 - Giving up chat_completion(...) after 1 tries (litellm.proxy._types.ProxyException)
INFO: 127.0.0.1:40330 - "POST /chat/completions HTTP/1.1" 404 Not Found
INFO: 10.244.17.1:55474 - "GET /health/liveliness HTTP/1.1" 200 OK
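For context, the failing User --> curl --> LiteLLM --> vLLM path can be exercised with the OpenAI Python client pointed at the LiteLLM proxy. This is only a sketch of the request shape; the proxy URL, port, key, and model alias below are assumptions, not values taken from this thread.

```python
# Hypothetical reproduction of the request path: client -> LiteLLM proxy -> vLLM.
# Proxy address, API key, and model alias are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # assumed LiteLLM proxy address (default port 4000)
    api_key="sk-anything",             # assumed/dummy proxy key
)

resp = client.chat.completions.create(
    model="phi-3",                     # assumed model alias registered on the proxy
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```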
-
Can anyone guide me on what I am missing? Is the following the right way to add vLLM-hosted models which are
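For reference (the poster's original snippet is not reproduced above), here is a minimal sketch of one way to register an OpenAI-compatible vLLM endpoint with litellm.Router. The model name, dummy key, and service hostname (taken from the suggestion later in this thread) are assumptions.

```python
# Hypothetical sketch: registering a vLLM-hosted, OpenAI-compatible model with the LiteLLM Router.
# Hostname matches the suggestion later in this thread; model names and key are assumptions.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "phi-3",  # alias that clients will request
            "litellm_params": {
                "model": "openai/phi-3",  # "openai/" prefix = generic OpenAI-compatible backend
                "api_base": "http://vllm-service.vllm.svc.cluster.local/v1",
                "api_key": "dummy",  # vLLM usually ignores the key unless one is configured
            },
        }
    ]
)

# Simple usage check against the registered alias.
resp = router.completion(
    model="phi-3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```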
-
FWIW, here is the Kubernetes config file for the vLLM model deployment.
-
The model looks like it was added correctly. Can you share the complete stack trace? It looks like the same issue you had with Ollama, where the containers aren't able to speak with each other. Moving this to a discussion, as it doesn't look like a LiteLLM error.
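One quick way to test that hypothesis is to probe the vLLM service directly from inside the LiteLLM pod. A sketch, assuming vLLM is running its OpenAI-compatible server and using the service hostname suggested later in this thread:

```python
# Hypothetical in-cluster connectivity check from the LiteLLM pod to the vLLM service.
# Hostname is the one suggested later in this thread; adjust namespace/service to your cluster.
import json
import urllib.request

url = "http://vllm-service.vllm.svc.cluster.local/v1/models"  # vLLM's OpenAI-compatible model list
with urllib.request.urlopen(url, timeout=5) as resp:
    print(resp.status)                       # expect 200 if the pods can reach each other
    print(json.loads(resp.read().decode()))  # served model IDs
```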
-
@krrishdholakia you caught that right; adding /v1 to the api_base fixed it. Would be great to fix it here: https://docs.litellm.ai/docs/providers/openai_compatible#usage-with-litellm-proxy-server ?
Can you modify your api_base to be
http://vllm-service.vllm.svc.cluster.local/v1
and let me know if that works? @ksingh-scogo
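Once the /v1 suffix is in place, a direct litellm.completion call is a simple smoke test that the endpoint is reachable end to end. The model name and dummy key below are assumptions.

```python
# Hypothetical smoke test: call the vLLM endpoint through LiteLLM with the /v1 api_base.
# The "openai/" prefix tells LiteLLM to treat it as a generic OpenAI-compatible server.
import litellm

response = litellm.completion(
    model="openai/phi-3",  # assumed model name as served by vLLM
    api_base="http://vllm-service.vllm.svc.cluster.local/v1",
    api_key="dummy",       # vLLM typically doesn't check the key unless configured to
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
print(response.choices[0].message.content)
```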