Hello! I'd like to ask what is causing this error? #15
Comments
I noticed the project only supports a limited set of models, so I also tried Qwen2.5-14B-Instruct, but the problem seems to be the same.
Adapt qwen2 14b config
Auto Set CHUNKED_PREFILL=1
Auto Set CHUNKED_PREFILL_SIZE=512
Auto Set CPM_FUSE_QKV=1
Auto Set CPM_FUSE_FF_IN=1
Auto Set HOST_REDUCE=1
Auto Set HOST_REDUCE_COPY_ONLY=1
Auto Set DUAL_STREAM=1
Auto Set DUAL_STREAM_THRESHOLD=100
INFO 12-17 21:39:05 llm_engine.py:20] engine config => EngineConfig(model_path='/data/huggingface/Qwen2.5-14B-Instruct', model_file='/data/huggingface/Qwen2.5-14B-Instruct/model-00001-of-00008.safetensors', vocab_file='/data/huggingface/Qwen2.5-14B-Instruct/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'architectures': ['Qwen2ForCausalLM'], 'attention_dropout': 0.0, 'bos_token_id': 151643, 'eos_token_id': 151645, 'hidden_act': 'silu', 'hidden_size': 5120, 'initializer_range': 0.02, 'intermediate_size': 13824, 'max_position_embeddings': 32768, 'max_window_layers': 70, 'model_type': 'qwen2', 'num_attention_heads': 40, 'num_hidden_layers': 48, 'num_key_value_heads': 8, 'rms_norm_eps': 1e-06, 'rope_theta': 1000000.0, 'sliding_window': 131072, 'tie_word_embeddings': False, 'torch_dtype': 'bfloat16', 'transformers_version': '4.43.1', 'use_cache': True, 'use_sliding_window': False, 'vocab_size': 152064, 'num_layers': 48, 'dim_model': 5120, 'num_heads': 40, 'num_kv_heads': 8, 'max_token': 32768, 'dim_ff': 13824, 'eps': 1e-06, 'activate_fn': 'silu', 'bfloat16': True, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)
Adapt qwen2 14b config
[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=1; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
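For a rough sense of the memory the verification is checking against, the KV-cache demand can be estimated from the engine config above (48 layers, 8 KV heads, head dim 5120/40 = 128, bfloat16). This is the standard back-of-envelope KV-cache formula, not ZhiLight's internal accounting:

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, dim_head, dtype_bytes=2):
    """Bytes of KV cache per token: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * dim_head * dtype_bytes

# Qwen2.5-14B-Instruct values from the EngineConfig dump (bfloat16 = 2 bytes)
per_token = kv_cache_bytes_per_token(num_layers=48, num_kv_heads=8, dim_head=5120 // 40)
total = per_token * 8192  # max_total_token from DynamicBatchConfig
print(per_token, total / 2**30)  # 196608 bytes/token, 1.5 GiB for 8192 tokens
```

The KV cache itself is modest next to the ~28 GB of bf16 weights, so the failed check is more likely about the reserved working memory than the cache size.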
The end of the log shows a CUBLAS_STATUS_NOT_SUPPORTED error, which suggests this image does not support your GPU model. The test reports in the README mainly cover the 4090 and A800, so there is probably no H100 support. You must be well funded — H100s are quite scarce in China, after all.
We haven't tested on H100. For H100, please confirm whether the cuBLAS bundled with your torch build is version 12.5.
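One way to run that check is to print the CUDA toolkit version torch was built against (`torch.version.cuda` is a standard attribute) and compare it numerically; the comparison helper below is just an illustration, and note the NCCL banner in the log below already reveals a 12.4 build:

```python
def version_at_least(version, required):
    """Compare dotted version strings numerically, e.g. '12.4' < '12.5' < '12.10'."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(version) >= to_tuple(required)

try:
    import torch  # guarded: torch may not be installed where this runs
    cuda = torch.version.cuda
    print("torch CUDA:", cuda, "OK" if version_at_least(cuda, "12.5") else "too old")
except ImportError:
    pass

print(version_at_least("12.4", "12.5"))  # → False: the cuda12.4 build in this log is older
```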
python -m zhilight.server.openai.entrypoints.api_server --model-path /home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf
INFO 12-17 19:26:28 api_server.py:152] ZhiLight OpenAI-Compatible Server version 0.4.8.
INFO 12-17 19:26:28 api_server.py:160] args: Namespace(host='0.0.0.0', port=8080, api_key='', served_model_name=None, response_role='assistant', uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], zhilight_version=None, environ=[], pip=[], model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', max_model_len=8192, disable_flash_attention=False, enable_cpm_chat=False, disable_tensor_parallel=False, enable_prefix_caching=False, disable_log_stats=False, quantization=None, dyn_max_batch_size=8, dyn_max_beam_size=4, ignore_eos=False, disable_log_requests=False, max_log_len=None)
INFO 12-17 19:26:28 llm_engine.py:20] engine config => EngineConfig(model_path='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf', model_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors', vocab_file='/home/zhujianian/workspace/Uneed/huggingface_download/Llama-2-7b-chat-hf/vocabs.txt', is_cpm_directory_struct=False, use_safetensors=True, model_config={'_name_or_path': 'meta-llama/Llama-2-7b-chat-hf', 'architectures': ['LlamaForCausalLM'], 'bos_token_id': 1, 'eos_token_id': 2, 'hidden_act': 'silu', 'hidden_size': 4096, 'initializer_range': 0.02, 'intermediate_size': 11008, 'max_position_embeddings': 4096, 'model_type': 'llama', 'num_attention_heads': 32, 'num_hidden_layers': 32, 'num_key_value_heads': 32, 'pretraining_tp': 1, 'rms_norm_eps': 1e-05, 'rope_scaling': None, 'tie_word_embeddings': False, 'torch_dtype': 'float16', 'transformers_version': '4.32.0.dev0', 'use_cache': True, 'vocab_size': 32000, 'num_layers': 32, 'dim_model': 4096, 'num_heads': 32, 'num_kv_heads': 32, 'max_token': 4096, 'dim_ff': 11008, 'eps': 1e-05, 'activate_fn': 'silu', 'bfloat16': False, 'new_vocab': False}, dyn_batch_config=DynamicBatchConfig(max_batch=8, max_beam_size=4, task_queue_size=8, max_total_token=8192, seed=0, bos_id=0, eos_id=2, nccl=-1, rag_buffer=True), quant_config={'type': <QuantType.NoQuant: 0>}, memory_limit=0, enable_tensor_parallel=True, is_chatml=False, max_model_len=8192)
[DEV]Config: HIGH_PRECISION=1; DUAL_STREAM=None; CPM_FUSE_QKV=1; CPM_FUSE_FF_IN=1; REDUCE_TP_INT8_THRES=None; W4_INT8_ALGO=None; W4_FP8_ALGO=None
dist_config: parallel=True
********* world_size=1, nccl_version=22005 *********
GS4845:1595021:1595021 [0] NCCL INFO Bootstrap : Using eno1:192.168.163.94<0>
GS4845:1595021:1595021 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
GS4845:1595021:1595021 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
CC:90, mp_count:114, L2 Cache:50MB, Max Persistent L2:32000KB, max_smem:227KB
GS4845:1595021:1595114 [0] NCCL INFO Failed to open libibverbs.so[.1]
GS4845:1595021:1595114 [0] NCCL INFO NET/Socket : Using [0]eno1:192.168.163.94<0> [1]usb0:169.254.3.1<0> [2]veth38cb222:fe80::ac59:2dff:feab:9ba4%veth38cb222<0>
GS4845:1595021:1595114 [0] NCCL INFO Using non-device net plugin version 0
GS4845:1595021:1595114 [0] NCCL INFO Using network Socket
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init START
GS4845:1595021:1595114 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff,00000000,00000000
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nRanks 1 nNodes 1 localRanks 1 localRank 0 MNNVL 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 00/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 01/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 02/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 03/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 04/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 05/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 06/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 07/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 08/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 09/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 10/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 11/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 12/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 13/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 14/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 15/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 16/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 17/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 18/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 19/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 20/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 21/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 22/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 23/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 24/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 25/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 26/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 27/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 28/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 29/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 30/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Channel 31/32 : 0
GS4845:1595021:1595114 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
GS4845:1595021:1595114 [0] NCCL INFO P2P Chunksize set to 131072
GS4845:1595021:1595114 [0] NCCL INFO Connected all rings
GS4845:1595021:1595114 [0] NCCL INFO Connected all trees
GS4845:1595021:1595114 [0] NCCL INFO 32 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
GS4845:1595021:1595114 [0] NCCL INFO comm 0x180f5050 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e1000 commId 0x73a7388a9bd37b96 - Init COMPLETE
Config(model_type=llama, num_layers=32, dim_model=4096, num_heads=32, num_kv_heads=32, dim_head=128, dim_ff=11008, vocab_size=32000, eps=1e-05, scale_weights=0, weight_transposed=0, dim_model_base=0, scale_depth=1, scale_emb=1, dtype=half, pos_bias_type=rotary, activate_fn=silu, rope_theta=10000, max_position_embeddings=4096)
Verify max_token failed! please adjust reserved_work_mem_mb to a bigger value.
Killed
The card is a single H100.
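That matches the `CC:90` in the log above: the card reports compute capability 9.0, i.e. Hopper, while the README's tested cards are Ada (8.9, RTX 4090) and Ampere (8.0, A800). A small illustrative mapping, with architecture names taken from NVIDIA's published compute-capability table:

```python
# Compute capability -> architecture, per NVIDIA's CUDA GPU list
ARCH = {
    (8, 0): "Ampere (A100/A800)",
    (8, 9): "Ada Lovelace (RTX 4090)",
    (9, 0): "Hopper (H100)",
}

def arch_name(cc):
    """cc as printed in the log, e.g. 90 -> (9, 0) -> 'Hopper (H100)'."""
    return ARCH.get(divmod(cc, 10), "unknown")

print(arch_name(90))  # → Hopper (H100)
```

Kernels compiled only for sm_80/sm_89 would explain a CUBLAS_STATUS_NOT_SUPPORTED-style failure on an sm_90 device.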