Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I'm trying to load https://huggingface.co/casperhansen/mistral-small-24b-instruct-2501-awq/tree/main
The server starts, converts the model to TurboMind format, and loads the GPU at 100% for a while, then does nothing. It never binds the specified port.
CUDA_VISIBLE_DEVICES==00000000:03:00.0
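One thing worth double-checking: the `CUDA_VISIBLE_DEVICES` value above looks like a PCI bus ID. As far as I know the variable expects comma-separated device indices (or GPU UUIDs), and an unrecognized entry makes that entry and everything after it invisible to CUDA. A sketch for Windows `cmd`, assuming the RTX 3090 is device 0:

```shell
rem CUDA_VISIBLE_DEVICES takes device indices or GPU UUIDs, not PCI bus IDs
rem (also note the single "=" -- the doubled "==" above puts a literal "=" in the value)
set CUDA_VISIBLE_DEVICES=0

rem To select GPUs by their PCI bus position, make the index ordering
rem follow the bus first, then select by index:
set CUDA_DEVICE_ORDER=PCI_BUS_ID
set CUDA_VISIBLE_DEVICES=0
```

This is a configuration sketch only; whether it relates to the hang is an open question.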
Reproduction
(lmdeploy) C:\Users\Admin>lmdeploy serve api_server D:\models\casperhansen_mistral-small-24b-instruct-2501-awq --server-port 45641 --backend turbomind
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin, please note cuda version should >= 11.3 when compiled with cuda 11
[WARNING] gemm_config.in is not found; using default GEMM algo
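To confirm "it doesn't bind the specified port" in a reproducible way, a minimal self-contained check can poll the port from another terminal while the server is in the hung state (host and port taken from the command above):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing is listening on the port.
        return False


# While the hang is happening this keeps returning False,
# showing the api_server never reached the listening state.
print(port_is_open("127.0.0.1", 45641))
```

A healthy `api_server` run should flip this to True shortly after startup.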
Environment
sys.platform: win32
Python: 3.8.20 (default, Oct 3 2024, 15:19:54) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
NVCC: Cuda compilation tools, release 12.8, V12.8.61
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30157 for x64
GCC: n/a
PyTorch: 2.4.1+cu124
PyTorch compiling details: PyTorch built with:
- C++ Version: 201703
- MSVC 192930154
- Intel(R) oneAPI Math Kernel Library Version 2024.2.1-Product Build 20240722 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
- OpenMP 2019
- LAPACK is enabled (usually provided by MKL)
- CPU capability usage: AVX2
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.5.4
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.19.1+cpu
LMDeploy: 0.7.0.post3+
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.8
pydantic: 2.10.6
triton: Not Found
Error traceback
(none; the server hangs without printing any error)