Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I'm trying to load https://huggingface.co/casperhansen/mistral-small-24b-instruct-2501-awq/tree/main
The server starts, converts the model to TurboMind format, and loads the GPU at 100% for a while, then does nothing. It never binds the specified port.
CUDA_VISIBLE_DEVICES==00000000:03:00.0
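One thing worth double-checking: the `CUDA_VISIBLE_DEVICES` value above looks like a PCI bus ID. As far as I know the variable expects comma-separated device indices (or GPU UUIDs), and an unrecognized entry makes that entry and everything after it invisible to CUDA. A sketch for Windows `cmd`, assuming the RTX 3090 is device 0:

```shell
rem CUDA_VISIBLE_DEVICES takes device indices or GPU UUIDs, not PCI bus IDs
rem (also note the single "=" -- the doubled "==" above puts a literal "=" in the value)
set CUDA_VISIBLE_DEVICES=0

rem To select GPUs by their PCI bus position, make the index ordering
rem follow the bus first, then select by index:
set CUDA_DEVICE_ORDER=PCI_BUS_ID
set CUDA_VISIBLE_DEVICES=0
```

This is a configuration sketch only; whether it relates to the hang is an open question.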
Reproduction
(lmdeploy) C:\Users\Admin>lmdeploy serve api_server D:\models\casperhansen_mistral-small-24b-instruct-2501-awq --server-port 45641 --backend turbomind
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin, please note cuda version should >= 11.3 when compiled with cuda 11
[WARNING] gemm_config.in is not found; using default GEMM algo
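To confirm "it doesn't bind the specified port" in a reproducible way, a minimal self-contained check can poll the port from another terminal while the server is in the hung state (host and port taken from the command above):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing is listening on the port.
        return False


# While the hang is happening this keeps returning False,
# showing the api_server never reached the listening state.
print(port_is_open("127.0.0.1", 45641))
```

A healthy `api_server` run should flip this to True shortly after startup.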
Environment
sys.platform: win32
Python: 3.8.20 (default, Oct 3 2024, 15:19:54) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
NVCC: Cuda compilation tools, release 12.8, V12.8.61
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30157 for x64
GCC: n/a
PyTorch: 2.4.1+cu124
PyTorch compiling details: PyTorch built with:
- C++ Version: 201703
- MSVC 192930154
- Intel(R) oneAPI Math Kernel Library Version 2024.2.1-Product Build 20240722 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
- OpenMP 2019
- LAPACK is enabled (usually provided by MKL)
- CPU capability usage: AVX2
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.5.4
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.19.1+cpu
LMDeploy: 0.7.0.post3+
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.8
pydantic: 2.10.6
triton: Not Found
Error traceback
(none; the server hangs without printing any error)