The 0.16.3 docker image cannot select sglang as the inference engine #2537

Open

machgity opened this issue Nov 11, 2024 · 3 comments
machgity commented Nov 11, 2024

System Info

Driver Version: 535.171.04 CUDA Version: 12.2

Running Xinference with Docker?

- [x] docker
- [ ] pip install
- [ ] installation from source

Version info

xinference 0.16.3 (docker image)

The command used to start Xinference

docker-compose.yml:

```yaml
services:
  xinference:
    container_name: xinference
    image: xprobe/xinference:latest
    ports:
      - "9997:9997"
#      - target: 9997
#        published: 9997
    volumes:
       - /data/xinference:/data
    environment:
#      # add envs here. Here's an example, if you want to download model from modelscope
      - XINFERENCE_MODEL_SRC=modelscope
      - XINFERENCE_HOME=/data
      - ATTENTION_BACKEND=flashinfer
#    command: xinference-local --host 0.0.0.0 --port 9997
    command: sh /data/config/init.sh
    shm_size: 128gb
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
              count: all
```

init.sh:

```sh
xinference launch --model-name qwen2.5-instruct --model-uid Qwen2.5-72B-INT4-Instruct-awq-SGLANG --model-engine sglang --size-in-billions 72 --model-format awq --n-gpu 1 --model_path /data/modelscope/hub/qwen/Qwen2___5-72B-Instruct-AWQ --enable_torch_compile True --disable_cuda_graph True --mem_fraction_static 0.88 --kv_cache_dtype fp8_e5m2 &
```
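
For diagnosis, one way to confirm which engines the running server actually registered for this model is the `xinference engine` subcommand; this is a hedged sketch, assuming that subcommand exists in this version and that the endpoint matches the 9997:9997 port mapping in the compose file above:

```sh
# List the engines the running server considers usable for the model.
# Assumption: the `xinference engine` subcommand is available in 0.16.3,
# and the server is reachable on the mapped port 9997.
xinference engine -e http://127.0.0.1:9997 --model-name qwen2.5-instruct
```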

Reproduction

```
xinference  | 2024-11-10 10:43:13,558 transformers.models.auto.image_processing_auto 592 INFO     Could not locate the image processor configuration file, will try to use the model config instead.
xinference  | Could not locate the image processor configuration file, will try to use the model config instead.
xinference  | INFO 11-10 10:43:13 awq_marlin.py:89] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
xinference  | INFO 11-10 10:43:13 config.py:648] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor
xinference  | 2024-11-10 10:43:13,566 xinference.api.restful_api 7 ERROR    [address=0.0.0.0:10179, pid=165] Model qwen2.5-instruct cannot be run on engine sglang.
xinference  | Traceback (most recent call last):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 987, in launch_model
xinference  |     model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
xinference  |     return self._process_result_message(result)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference  |     raise message.as_instanceof_cause()
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 659, in send
xinference  |     result = await self._run_coro(message.message_id, coro)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference  |     return await coro
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
xinference  |     return await super().__on_receive__(message)  # type: ignore
xinference  |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference  |     raise ex
xinference  |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference  |     async with self._lock:
xinference  |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference  |     with debug_async_timeout('actor_lock_timeout',
xinference  |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference  |     result = await result
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1040, in launch_builtin_model
xinference  |     await _launch_model()
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1004, in _launch_model
xinference  |     await _launch_one_model(rep_model_uid)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 983, in _launch_one_model
xinference  |     await worker_ref.launch_builtin_model(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
xinference  |     return self._process_result_message(result)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference  |     raise message.as_instanceof_cause()
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 659, in send
xinference  |     result = await self._run_coro(message.message_id, coro)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference  |     return await coro
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
xinference  |     return await super().__on_receive__(message)  # type: ignore
xinference  |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference  |     raise ex
xinference  |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference  |     async with self._lock:
xinference  |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference  |     with debug_async_timeout('actor_lock_timeout',
xinference  |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference  |     result = await result
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 78, in wrapped
xinference  |     ret = await func(*args, **kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 869, in launch_builtin_model
xinference  |     model, model_description = await asyncio.to_thread(
xinference  |   File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference  |     return await loop.run_in_executor(None, func_call)
xinference  |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference  |     result = self.fn(*self.args, **self.kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/core.py", line 73, in create_model_instance
xinference  |     return create_llm_model_instance(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/core.py", line 216, in create_llm_model_instance
xinference  |     llm_cls = check_engine_by_spec_parameters(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/llm_family.py", line 1136, in check_engine_by_spec_parameters
xinference  |     raise ValueError(f"Model {model_name} cannot be run on engine {model_engine}.")
xinference  | ValueError: [address=0.0.0.0:10179, pid=165] Model qwen2.5-instruct cannot be run on engine sglang.
xinference  | Traceback (most recent call last):
xinference  |   File "/usr/local/bin/xinference", line 8, in <module>
xinference  |     sys.exit(cli())
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
xinference  |     return self.main(*args, **kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
xinference  |     rv = self.invoke(ctx)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
xinference  |     return _process_result(sub_ctx.command.invoke(sub_ctx))
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
xinference  |     return ctx.invoke(self.callback, **ctx.params)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
xinference  |     return __callback(*args, **kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
xinference  |     return f(get_current_context(), *args, **kwargs)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 906, in model_launch
xinference  |     model_uid = client.launch_model(
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 959, in launch_model
xinference  |     raise RuntimeError(
xinference  | RuntimeError: Failed to launch model, detail: [address=0.0.0.0:10179, pid=165] Model qwen2.5-instruct cannot be run on engine sglang.
```
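
The ValueError comes from `check_engine_by_spec_parameters` in `llm_family.py`, which only matches engines whose backing packages imported successfully when the server started. So if the sglang package is missing or fails to import inside the image, the engine silently disappears from the candidate list. A quick check (a minimal sketch; the container name is taken from the compose file above):

```sh
# Verify the sglang package is importable inside the running container;
# if this import fails, xinference will not offer sglang as an engine.
docker exec -it xinference python -c "import sglang; print(sglang.__version__)"
```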

Expected behavior

The 0.16.3 docker image is missing the sglang option for the model engine parameter (in both the CLI and the web UI).
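
If the import check above fails, installing sglang inside the container should restore the engine option as a temporary workaround; `sglang[all]` is sglang's documented install extra, but whether that version is compatible with this image's CUDA/torch stack is an assumption to verify:

```sh
# Temporary workaround: install sglang into the running container.
# For a durable fix, bake it into a derived image instead, since this
# change is lost when the container is recreated.
docker exec -it xinference pip install "sglang[all]"
```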

XprobeBot added the gpu label Nov 11, 2024
XprobeBot added this to the v0.16 milestone Nov 11, 2024

zhanghx0905 (Contributor) commented

Same problem here. Did previous versions have this issue?


qinxuye (Contributor) commented Nov 13, 2024

We'll take a look.


QiiiWiii commented

(screenshots)

It's not there in v1.0.0 either.

XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024