-
Notifications
You must be signed in to change notification settings - Fork 0
/
qwen.txt
62 lines (55 loc) · 3.41 KB
/
qwen.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
1、关于kvcache的量化
1)下载最新qwen和model
2)model下的cache_autogptq_cuda_256.cpp和cache_autogptq_cuda_kernel_256.cu复制到qwen目录下
3)修改openai_api.pyRuntimeError: Ninja is required to load C++ extensions
model = AutoModelForCausalLM.from_pretrained(
args.checkpoint_path,
device_map=device_map,
trust_remote_code=True,
use_cache_quantization=True,
use_cache_kernel=True,
use_flash_attn=False,
resume_download=True,
).eval()
2、RuntimeError: Ninja is required to load C++ extensions
1)git clone https://github.com/ninja-build/ninja
2)git checkout release
3)win搜索vs,执行开始菜单中visual studio 2022下的x64 native tools command prompt(注意不是x86!,其实就是运行vcvars64.bat)
4)进入ninja文件夹
5)conda activate qwen(也可能不需要)
6)python ./configure.py --bootstrap
7)运行ninja: ninja: no work to do.
8)qwen下,运行python openai_api.py
报错:
无法打开包括文件: “assert.h”
无法打开包括文件: “corecrt.h”等
解决:a)visual studio installer中,安装windows 11 sdk和 MSVC生成工具
b)set include=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\include;C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt;C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared
c)然后qwen下,运行python openai_api.py
报错:error: invalid redeclaration of type name "size_t"(这个错误就是因为误用了x86 native tools command prompt,要改用x64)
然后运行正常!
4)以后运行qwen的openai_api.py之前,要在控制台运行:
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
靠谱做法:
a)新建vc.cmd:"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
b)新建qwen_server.cmd: cmd /K "vc.cmd && conda activate qwen && set CUDA_VISIBLE_DEVICES=0 && d: && cd d:\server\life-agent && start python -m gpu_server.qwen_client_webui && cd d:\qwen && python openai_api.py"
c)qwen_server.cmd和vc.cmd放在一起,运行qwen_server.cmd即可启动qwen服务
5)但是开启kvcache的量化后,回答质量下降!!!!
3、2080ti安装flash-attention(源码编译会报错,用pip编译)
1)安装flash-attention 1.09
a)运行"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
b)conda activate qwen
c)set DISTUTILS_USE_SDK=1
d)set include=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\include;C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt;C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared
e)pip install flash-attn==1.0.9 --no-build-isolation
f)成功。
2)安装flash-attention rotary (可选)
a)下载flash-attention 1.09源码
b)flash-attention 1.09的rotary目录下:
pip install .
c)成功
3)安装flash-attention rms_norm (可选)
a)下载flash-attention 1.09源码
b)flash-attention 1.09的layer_norm目录下:
pip install .
c)报错