
[Bug] serve does nothing when launching Mistral Small 3 AWQ #3167

Open · 3 tasks done

anunknowperson opened this issue Feb 20, 2025 · 2 comments
Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.
  3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

[screenshot attached]

I'm trying to load https://huggingface.co/casperhansen/mistral-small-24b-instruct-2501-awq/tree/main

It starts, converts the model to TurboMind, loads the GPU at 100% for a bit, and then does nothing. It never binds the specified port.

CUDA_VISIBLE_DEVICES=00000000:03:00.0
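
A quick way to check the binding on Windows (assuming port 45641, as in the reproduction command below):

    netstat -ano | findstr 45641

This should show a LISTENING entry on the port once the server is up.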

Reproduction

(lmdeploy) C:\Users\Admin>lmdeploy serve api_server D:\models\casperhansen_mistral-small-24b-instruct-2501-awq --server-port 45641 --backend turbomind
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin, please note cuda version should >= 11.3 when compiled with cuda 11
[WARNING] gemm_config.in is not found; using default GEMM algo
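
If it helps to narrow down where the process stalls, the same command can be rerun at a higher log verbosity (--log-level is an option of the lmdeploy CLI, default ERROR; INFO is assumed here as an illustration):

    lmdeploy serve api_server D:\models\casperhansen_mistral-small-24b-instruct-2501-awq --server-port 45641 --backend turbomind --log-level INFO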

Environment

sys.platform: win32
Python: 3.8.20 (default, Oct  3 2024, 15:19:54) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
NVCC: Cuda compilation tools, release 12.8, V12.8.61
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30157 for x64
GCC: n/a
PyTorch: 2.4.1+cu124
PyTorch compiling details: PyTorch built with:
  - C++ Version: 201703
  - MSVC 192930154
  - Intel(R) oneAPI Math Kernel Library Version 2024.2.1-Product Build 20240722 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 12.4
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.19.1+cpu
LMDeploy: 0.7.0.post3+
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.8
pydantic: 2.10.6
triton: Not Found

Error traceback

@lyj0309 (Contributor) commented Mar 5, 2025

Same issue with Qwen 32B.

@lyj0309 (Contributor) commented Mar 5, 2025

You can try --cache-max-entry-count 0.8 or smaller; it fixed my issue.
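
For reference, the full invocation with that flag (a sketch reusing the model path and port from the report; --cache-max-entry-count sets the fraction of GPU memory reserved for the KV cache, so lower values leave more headroom):

    lmdeploy serve api_server D:\models\casperhansen_mistral-small-24b-instruct-2501-awq --server-port 45641 --backend turbomind --cache-max-entry-count 0.8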
