bug: Can openllm run on k8s clusters without GPUs? #1078

Open

Lucas-16 opened this issue Sep 9, 2024 · 3 comments

Lucas-16 commented Sep 9, 2024

Describe the bug

I want to run Qwen2 0.5B on a k8s cluster without a GPU, but the service has failed to start every time. Is there any way to support CPU-only machines?
[Screenshot attachment "屏幕截图 2024-09-09 164657.jpg" did not finish uploading.]

To reproduce

No response

Logs

No response

Environment

CPU only; no GPU available

System information (Optional)

No response

Lucas-16 (Author) commented Sep 9, 2024

```
Traceback (most recent call last):
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/anaconda3/envs/openllm/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 74, in lifespan
    await on_startup()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_impl/server/app.py", line 275, in create_instance
    self._service_instance = self.service()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/_bentoml_sdk/service/factory.py", line 257, in __call__
    instance = self.inner()
  File "/root/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-33df/src/service.py", line 99, in __init__
    self.engine = AsyncLLMEngine.from_engine_args(ENGINE_ARGS)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 466, in from_engine_args
    engine = cls(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 380, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 547, in _init_engine
    return engine_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
    self.model_executor = executor_class(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 34, in _init_executor
    self.driver_worker = self._create_worker()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 85, in _create_worker
    return create_worker(**self._get_create_worker_kwargs(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 20, in create_worker
    wrapper.init_worker(**kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 367, in init_worker
    self.worker = worker_class(*args, **kwargs)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/worker.py", line 90, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 651, in __init__
    self.attn_backend = get_attn_backend(
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 46, in get_attn_backend
    backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/attention/selector.py", line 149, in which_attn_to_use
    if current_platform.get_device_capability()[0] < 8:
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 49, in get_device_capability
    return get_physical_device_capability(physical_device_id)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/vllm/platforms/cuda.py", line 18, in wrapper
    pynvml.nvmlInit()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1793, in nvmlInit
    nvmlInitWithFlags(0)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1776, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 1823, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/root/.openllm/venv/397201824397438346/lib/python3.9/site-packages/pynvml.py", line 855, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
```
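
For reference, the failure at the bottom of the traceback is reproducible outside of openllm: on a node without the NVIDIA driver, pynvml cannot load the NVML shared library (libnvidia-ml.so). A minimal sketch, assuming pynvml is installed as in the venv above:

```python
# Reproduces the final frames of the traceback: on a CPU-only node the
# NVML shared library is absent, so initialization raises an NVMLError.
import pynvml

try:
    pynvml.nvmlInit()  # the same call vLLM makes when probing the CUDA platform
    print("NVML loaded;", pynvml.nvmlDeviceGetCount(), "GPU(s) visible")
    pynvml.nvmlShutdown()
except pynvml.NVMLError as err:
    # On a GPU-less k8s node this prints the NVMLError_LibraryNotFound above.
    print("NVML unavailable:", err)
```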

aarnphm (Collaborator) commented Sep 9, 2024

Maybe you can try the llama.cpp models, but by default vLLM requires a GPU to be available.
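
For anyone who lands here, a minimal CPU-only sketch using llama-cpp-python directly (not the openllm CLI); the GGUF filename is hypothetical, so substitute any quantized Qwen2 0.5B build:

```python
# Runs fully on CPU: n_gpu_layers=0 offloads no layers to a GPU,
# so neither CUDA nor NVML is ever touched.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen2-0_5b-instruct-q4_k_m.gguf",  # hypothetical local GGUF path
    n_gpu_layers=0,  # keep every layer on the CPU
    n_ctx=2048,      # context window size
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a GPU-less k8s pod!"}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```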

bojiang (Member) commented Sep 29, 2024

All models supported by openllm today require an NVIDIA GPU or Apple silicon to run. We may add more options in the future, or you can contribute at https://github.com/bentoml/OpenLLM-models
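
As a quick way to check whether a given node meets that requirement, a sketch assuming PyTorch is installed (vLLM pulls it in):

```python
# Checks for the two accelerators openllm models currently target:
# an NVIDIA GPU (CUDA) or Apple silicon (MPS).
import torch

has_cuda = torch.cuda.is_available()
has_mps = torch.backends.mps.is_available()  # False on Linux builds of torch
print(f"CUDA available: {has_cuda}; MPS available: {has_mps}")
if not (has_cuda or has_mps):
    # On a CPU-only k8s node both checks fail, matching the startup error above.
    print("No supported accelerator found; the service will fail at startup.")
```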
