
Hello! May I ask what is causing this error? #15

Open
Ringssss opened this issue Dec 17, 2024 · 3 comments


@Ringssss

python -m zhilight.server.openai.entrypoints.api_server --model-path /home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf
INFO 12-17 19:26:28 api_server.py:152] ZhiLight OpenAI-Compatible Server version 0.4.8.
INFO 12-17 19:26:28 api_server.py:160] args: Namespace(host='0.0.0.0', port=8080, api_key='', served_model_name=None, response_role='assistant', uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], zhilight_version=None, environ=[], pip=[], model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', max_model_len=8192, disable_flash_attention=False, enable_cpm_chat=False, disable_tensor_parallel=False, enable_prefix_caching=False, disable_log_stats=False, quantization=None, dyn_max_batch_size=8, dyn_max_beam_size=4, ignore_eos=False, disable_log_requests=False, max_log_len=None)
INFO 12-17 19:26:28 llm_engine.py:20] engine config => EngineConfig(model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', model_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors', vocab_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'_name_or_path': 'meta-llama/Llama-2-7b-chat-hf', 'architectures': ['LlamaForCausalLM'], 'bos_token_id': 1, 'eos_token_id': 2, 'hidden_act': 'silu', 'hidden_size': 4096, 'initializer_range': 0.02, 'intermediate_size': 11008, 'max_position_embeddings': 4096, 'model_type': 'llama', 'num_attention_heads': 32, 'num_hidden_layers': 32, 'num_key_value_heads': 32, 'pretraining_tp': 1, 'rms_norm_eps': 1e-05, 'rope_scaling': None, 'tie_word_embeddings': False, 'torch_dtype': 'float16', 'transformers_version': '4.32.0.dev0', 'use_cache': True, 'vocab_size': 32000, 'num_layers': 32, 'dim_model': 4096, 'num_heads': 32, 'num_kv_heads': 32, 'max_token': 4096, 'dim_ff': 11008, 'eps': 1e-05, 'activate_fn': 'silu', 'bfloat16': False, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)
[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=None; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
dist_config: parallel=True
********* world_size=1, nccl_version=22005 *********
GS4845:1595021:1595021 [0] NCCL INFO Bootstrap : Using eno1:192.168.163.94<0>
GS4845:1595021:1595021 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
GS4845:1595021:1595021 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
CC:90, mp_count:114, L2 Cache:50MB, Max Persistent L2:32000KB, max_smem:227KB
GS4845:1595021:1595114 [0] NCCL INFO Failed to open libibverbs.so[.1]
GS4845:1595021:1595114 [0] NCCL INFO NET/Socket : Using [0]eno1:192.168.163.94<0> [1]usb0:169.254.3.1<0> [2]veth38cb222:fe80::ac59:2dff:feab:9ba4%veth38cb222<0>
GS4845:1595021:1595114 [0] NCCL INFO Using non-device net plugin version 0
GS4845:1595021:1595114 [0] NCCL INFO Using network Socket
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init START
GS4845:1595021:1595114 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff,00000000,00000000
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 00/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 01/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 02/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 03/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 04/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 05/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 06/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 07/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 08/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 09/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 10/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 11/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 12/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 13/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 14/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 15/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 16/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 17/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 18/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 19/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 20/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 21/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 22/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 23/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 24/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 25/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 26/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 27/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 28/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 29/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 30/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 31/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
GS4845:1595021:1595114 [0] NCCL INFO P2P Chunksize set to 131072
GS4845:1595021:1595114 [0] NCCL INFO Connected all rings
GS4845:1595021:1595114 [0] NCCL INFO Connected all trees
GS4845:1595021:1595114 [0] NCCL INFO 32 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init COMPLETE
Config(model_type=llama, num_layers=32, dim_model=4096, num_heads=32, num_kv_heads=32, dim_head=128, dim_ff=11008, vocab_size=32000, eps=1e-05, scale_weights=0, weight_transposed=0, dim_model_base=0, scale_depth=1, scale_emb=1, dtype=half, pos_bias_type=rotary, activate_fn=silu, rope_theta=10000, max_position_embeddings=4096)

CHUNKED_PREFILL:0, SIZE: 512
CUBLAS Error: cublasLtMatmul( ctx.current_cublas_handle(), matmul_desc, p_alpha, B.data(), layout_B, A.data(), layout_A, p_beta, ret.data(), layout_C, ret.data(), layout_C, algo_found ? &algo : nullptr, NULL, 0, stream)
CUBLAS_STATUS_NOT_SUPPORTED

Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
Killed

The card is a single H100.
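
A quick way to isolate whether cuBLAS itself works on this card is to run a plain half-precision GEMM through torch, which exercises the same cuBLAS/cuBLASLt libraries. This is a minimal sketch of my own, not part of ZhiLight, and assumes torch is installed in the same environment the server runs in:

import torch

# A 1024x1024 fp16 matmul dispatches to cuBLAS(Lt) on CUDA builds of torch;
# if this also fails, the problem is in the cuBLAS/driver stack, not ZhiLight.
a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
c = a @ b
print(c.shape, c.dtype)  # expected: torch.Size([1024, 1024]) torch.float16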

@Ringssss
Author

Noticing that the project supports a limited set of models, I also tried Qwen2.5-14B-Instruct, but the problem seems to be the same.
python -m zhilight.server.openai.entrypoints.api_server --model-path /data/huggingface/Qwen2.5-14B-Instruct
INFO 12-17 21:39:05 api_server.py:152] ZhiLight OpenAI-Compatible Server version 0.4.8.
INFO 12-17 21:39:05 api_server.py:160] args: Namespace(host='0.0.0.0', port=8080, api_key='', served_model_name=None, response_role='assistant', uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], zhilight_version=None, environ=[], pip=[], model_path='/data/huggingface/Qwen2.5-14B-Instruct', max_model_len=8192, disable_flash_attention=False, enable_cpm_chat=False, disable_tensor_parallel=False, enable_prefix_caching=False, disable_log_stats=False, quantization=None, dyn_max_batch_size=8, dyn_max_beam_size=4, ignore_eos=False, disable_log_requests=False, max_log_len=None)

Adapt qwen2 14b config

Auto Set CHUNKED_PREFILL=1

Auto Set CHUNKED_PREFILL_SIZE=512

Auto Set CPM_FUSE_QKV=1

Auto Set CPM_FUSE_FF_IN=1

Auto Set HOST_REDUCE=1

Auto Set HOST_REDUCE_COPY_ONLY=1

Auto Set DUAL_STREAM=1

Auto Set DUAL_STREAM_THRESHOLD=100

INFO 12-17 21:39:05 llm_engine.py:20] engine config => EngineConfig(model_path='/data/huggingface/Qwen2.5-14B-Instruct', model_file='/data/huggingface/Qwen2.5-14B-Instruct/model-00001-of-00008.safetensors', vocab_file='/data/huggingface/Qwen2.5-14B-Instruct/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'architectures': ['Qwen2ForCausalLM'], 'attention_dropout': 0.0, 'bos_token_id': 151643, 'eos_token_id': 151645, 'hidden_act': 'silu', 'hidden_size': 5120, 'initializer_range': 0.02, 'intermediate_size': 13824, 'max_position_embeddings': 32768, 'max_window_layers': 70, 'model_type': 'qwen2', 'num_attention_heads': 40, 'num_hidden_layers': 48, 'num_key_value_heads': 8, 'rms_norm_eps': 1e-06, 'rope_theta': 1000000.0, 'sliding_window': 131072, 'tie_word_embeddings': False, 'torch_dtype': 'bfloat16', 'transformers_version': '4.43.1', 'use_cache': True, 'use_sliding_window': False, 'vocab_size': 152064, 'num_layers': 48, 'dim_model': 5120, 'num_heads': 40, 'num_kv_heads': 8, 'max_token': 32768, 'dim_ff': 13824, 'eps': 1e-06, 'activate_fn': 'silu', 'bfloat16': True, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)

Adapt qwen2 14b config

[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=1; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
dist_config: parallel=True
********* world_size=1, nccl_version=22005 *********
GS4845:1597899:1597899 [0] NCCL INFO Bootstrap : Using eno1:192.168.163.94<0>
GS4845:1597899:1597899 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
GS4845:1597899:1597899 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
CC:90, mp_count:114, L2 Cache:50MB, Max Persistent L2:32000KB, max_smem:227KB
GS4845:1597899:1597980 [0] NCCL INFO Failed to open libibverbs.so[.1]
GS4845:1597899:1597980 [0] NCCL INFO NET/Socket : Using [0]eno1:192.168.163.94<0> [1]usb0:169.254.3.1<0> [2]veth38cb222:fe80::ac59:2dff:feab:9ba4%veth38cb222<0>
GS4845:1597899:1597980 [0] NCCL INFO Using non-device net plugin version 0
GS4845:1597899:1597980 [0] NCCL INFO Using network Socket
GS4845:1597899:1597980 [0] NCCL INFO comm 0x17ecafb0 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x34fea5953f5c14e3 - Init START
GS4845:1597899:1597980 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff,00000000,00000000
GS4845:1597899:1597980 [0] NCCL INFO comm 0x17ecafb0 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 00/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 01/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 02/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 03/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 04/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 05/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 06/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 07/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 08/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 09/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 10/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 11/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 12/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 13/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 14/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 15/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 16/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 17/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 18/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 19/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 20/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 21/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 22/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 23/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 24/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 25/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 26/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 27/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 28/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 29/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 30/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Channel 31/32 : 0
GS4845:1597899:1597980 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
GS4845:1597899:1597980 [0] NCCL INFO P2P Chunksize set to 131072
GS4845:1597899:1597980 [0] NCCL INFO Connected all rings
GS4845:1597899:1597980 [0] NCCL INFO Connected all trees
GS4845:1597899:1597980 [0] NCCL INFO 32 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
GS4845:1597899:1597980 [0] NCCL INFO comm 0x17ecafb0 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x34fea5953f5c14e3 - Init COMPLETE
Config(model_type=qwen2, num_layers=48, dim_model=5120, num_heads=40, num_kv_heads=8, dim_head=128, dim_ff=13824, vocab_size=152064, eps=1e-06, scale_weights=0, weight_transposed=0, dim_model_base=0, scale_depth=1, scale_emb=1, dtype=bfloat, pos_bias_type=rotary, activate_fn=silu, rope_theta=1e+06, max_position_embeddings=32768)
Set chat model eos_id to 151645

Use pre_alloc: 1
CHUNKED_PREFILL:1, SIZE: 512
ADD dual stream reduce workspace=80MB
KV cache_allocator: free_mem=50454MB
CUBLAS Error: cublasLtMatmul( ctx.current_cublas_handle(), matmul_desc, p_alpha, B.data(), layout_B, A.data(), layout_A, p_beta, ret.data(), layout_C, ret.data(), layout_C, algo_found ? &algo : nullptr, NULL, 0, stream)
CUBLAS_STATUS_NOT_SUPPORTED

Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
Killed

@tonyw

tonyw commented Dec 18, 2024

The log ends with a CUBLAS_STATUS_NOT_SUPPORTED error, which suggests this image does not support your GPU model. The test reports in the README mainly cover the 4090 and A800, so H100 support was probably never added. Lucky you to have one, though; H100s are awfully scarce in China.
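
For reference, the "CC:90" in the log above is the Hopper compute capability (SM 90), so the binary did detect an H100. One can confirm what torch sees on the machine with a short check (my own sketch, independent of ZhiLight):

import torch

print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA H100 80GB HBM3"
print(torch.cuda.get_device_capability(0))  # (9, 0) == SM 90, matching "CC:90" in the log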

@spetrel
Collaborator

spetrel commented Dec 18, 2024

We have not tested on the H100. For the H100, please check whether the cuBLAS that ships with your torch is version 12.5.
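
A hedged sketch for this check: torch.version.cuda reports the CUDA toolkit torch was built against, and the nvidia-cublas-cu12 package name assumes a pip-installed CUDA 12 wheel of torch:

import torch

print(torch.__version__)   # torch build, e.g. "2.4.0+cu124"
print(torch.version.cuda)  # CUDA toolkit version, e.g. "12.4" or "12.5"
# On pip installs the bundled cuBLAS comes from the nvidia-cublas-cu12 wheel;
# check its exact version from the shell with:  pip show nvidia-cublas-cu12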
