bug: Can openllm run on k8s clusters without GPUs? #1078

Open

Lucas-16 opened this issue Sep 9, 2024 · 3 comments

Lucas-16 commented Sep 9, 2024

Describe the bug

I want to run Qwen2 0.5B on a k8s cluster without a GPU, but the service has failed to start every time. Is there any way to support CPU-only machines?
[Screenshot attachment "屏幕截图 2024-09-09 164657.jpg" did not finish uploading.]

To reproduce

No response

Logs

No response

Environment

CPU only; no GPU available

System information (Optional)

No response

Lucas-16 (Author) commented Sep 9, 2024

```
Traceback (most recent call last):
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/anaconda3/envs/openllm/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 74, in lifespan
    await on_startup()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_impl/server/app.py", line 275, in create_instance
    self._service_instance = self.service()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_sdk/service/factory.py", line 257, in __call__
    instance = self.inner()
  File "/root/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-33df/src/service.py", line 99, in __init__
    self.engine = AsyncLLMEngine.from_engine_args(ENGINE_ARGS)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 466, in from_engine_args
    engine = cls(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 380, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 547, in _init_engine
    return engine_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
    self.model_executor = executor_class(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 34, in _init_executor
    self.driver_worker = self._create_worker()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 85, in _create_worker
    return create_worker(**self._get_create_worker_kwargs(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 20, in create_worker
    wrapper.init_worker(**kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 367, in init_worker
    self.worker = worker_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker.py", line 90, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 651, in __init__
    self.attn_backend = get_attn_backend(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 46, in get_attn_backend
    backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 149, in which_attn_to_use
    if current_platform.get_device_capability()[0] < 8:
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 49, in get_device_capability
    return get_physical_device_capability(physical_device_id)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 18, in wrapper
    pynvml.nvmlInit()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1793, in nvmlInit
    nvmlInitWithFlags(0)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1776, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1823, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 855, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
```
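
For reference, the failure at the bottom of the traceback is reproducible outside of openllm: on a node without the NVIDIA driver, pynvml cannot load the NVML shared library (libnvidia-ml.so). A minimal sketch, assuming pynvml is installed as in the venv above:

```python
# Reproduces the final frames of the traceback: on a CPU-only node the
# NVML shared library is absent, so initialization raises an NVMLError.
import pynvml

try:
    pynvml.nvmlInit()  # the same call vLLM makes when probing the CUDA platform
    print("NVML loaded;", pynvml.nvmlDeviceGetCount(), "GPU(s) visible")
    pynvml.nvmlShutdown()
except pynvml.NVMLError as err:
    # On a GPU-less k8s node this prints the NVMLError_LibraryNotFound above.
    print("NVML unavailable:", err)
```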

aarnphm (Collaborator) commented Sep 9, 2024

Maybe you can try the llama.cpp models, but by default vLLM requires a GPU to be available.
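
For anyone who lands here, a minimal CPU-only sketch using llama-cpp-python directly (not the openllm CLI); the GGUF filename is hypothetical, so substitute any quantized Qwen2 0.5B build:

```python
# Runs fully on CPU: n_gpu_layers=0 offloads no layers to a GPU,
# so neither CUDA nor NVML is ever touched.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2-0_5b-instruct-q4_k_m.gguf",  # hypothetical local GGUF path
    n_gpu_layers=0,  # keep every layer on the CPU
    n_ctx=2048,      # context window size
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a GPU-less k8s pod!"}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```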

bojiang (Member) commented Sep 29, 2024

All models supported by openllm today require an NVIDIA GPU or Apple silicon to run. We may add more options in the future, or you can contribute at https://github.com/bentoml/OpenLLM-models
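
As a quick way to check whether a given node meets that requirement, a sketch assuming PyTorch is installed (vLLM pulls it in):

```python
# Checks for the two accelerators openllm models currently target:
# an NVIDIA GPU (CUDA) or Apple silicon (MPS).
import torch

has_cuda = torch.cuda.is_available()
has_mps = torch.backends.mps.is_available()  # False on Linux builds of torch
print(f"CUDA available: {has_cuda}; MPS available: {has_mps}")
if not (has_cuda or has_mps):
    # On a CPU-only k8s node both checks fail, matching the startup error above.
    print("No supported accelerator found; the service will fail at startup.")
```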
