Always killed when building TensorRT engine #743
Comments
@Burning-XX I think this error is caused by insufficient CPU memory.
Do you mean GPU memory or CPU memory?
Which parameter could I change to make it succeed if I do not have enough memory?
@Burning-XX I don't know which branch you are using; I solved this problem by following #102 (comment) on version 0.5.0.
How much CPU memory do you have in your system?
Hello, I am seeing this too. I have this for nvidia-smi: [table output not recovered] and this for top: `top - 14:43:49 up 17:19, 0 users, load average: 0.00, 0.00, 0.02`.
I don't think memory is my issue, yet building the engine is always killed:

```bash
python ../llama/build.py --model_dir ./Mixtral-8x7B-v0.1
```

Can you help me understand why? Am I doing something wrong? I am running in the Docker container, and it built fine.
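(When a process dies with a bare "Killed" and no Python traceback, the Linux OOM killer is the usual suspect, even if `top` looked idle beforehand: the memory spike happens during weight loading. A quick post-mortem check with standard Linux tools, nothing TensorRT-LLM-specific; inside Docker, reading the kernel log may require extra privileges:)

```bash
# Check the kernel log for OOM-killer activity right after the build dies.
# (May need privileges inside a container.)
dmesg -T | grep -i -E "out of memory|killed process" | tail -n 5

# Or, on systemd-based hosts:
journalctl -k --since "10 minutes ago" | grep -i oom

# Snapshot of available RAM and swap:
free -h
```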
I was doing this with Mixtral, but I tried with Qwen and I am seeing the same issue.
32 GB.
Also, release 0.5.0.
Could you try on a machine with more RAM? We also suggest trying the latest main branch or release branch.
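(If more physical RAM is not available, one common generic workaround, not suggested in this thread, is to add a swap file so the build survives the peak allocation, at the cost of speed. A minimal sketch; the 64 GiB size and `/swapfile` path are illustrative assumptions:)

```bash
# Add a 64 GiB swap file (size is an assumption; pick one that covers
# the model's fp16 weight footprint). fallocate may not work on all
# filesystems; dd is the fallback.
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h   # verify the new Swap line
```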
It would be good to have hardware requirements per model; the errors are unclear. I experimented hoping to find lower memory consumption, which was highly inefficient.
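(A rough rule of thumb, my own back-of-envelope rather than an official requirement: just holding the weights needs parameter-count × bytes-per-dtype of host RAM, and peak usage during conversion/build is typically higher because of temporary copies. Parameter counts below are approximate:)

```bash
# Back-of-envelope host RAM needed just to hold fp16 weights (2 bytes/param).
python3 - <<'EOF'
for name, n_params in [("llama-7b", 7e9), ("Mixtral-8x7B", 47e9)]:
    gib = n_params * 2 / 2**30
    print(f"{name}: ~{gib:.0f} GiB for fp16 weights alone")
EOF
# llama-7b: ~13 GiB; Mixtral-8x7B: ~88 GiB
```

On the 32 GB machine mentioned above, Mixtral-8x7B cannot even be loaded in fp16, which is consistent with the kill; llama-7b fits the weights but may still hit the ceiling once build overhead is added.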
@Burning-XX Do you still have the problem? If not, we will close it soon. |
Same issue when creating the checkpoint for Mistral:

```bash
python3 ${CONVERT_CHKPT_SCRIPT} --model_dir ${LLAMA_MODEL} --output_dir ${UNIFIED_CKPT_PATH} --dtype float16
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
```
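(For reference, on recent TensorRT-LLM versions such as the 0.12.x dev build above, building is a two-step flow: convert the checkpoint, then build the engine with `trtllm-build`. A minimal sketch assuming a LLaMA-family model and the environment variables used above; `ENGINE_DIR` is a hypothetical output path. Both steps load the full weights, so host RAM matters for each:)

```bash
# Step 1: convert the HF checkpoint to TensorRT-LLM format
# (the step that was killed above; it loads all weights into host RAM).
python3 ${CONVERT_CHKPT_SCRIPT} \
    --model_dir ${LLAMA_MODEL} \
    --output_dir ${UNIFIED_CKPT_PATH} \
    --dtype float16

# Step 2: build the engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ${UNIFIED_CKPT_PATH} \
    --output_dir ${ENGINE_DIR} \
    --gemm_plugin float16
```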
Original issue text from @Burning-XX:
I am trying to run llama-7b with TensorRT-LLM. I build the TensorRT engine as follows:

```bash
python3 build.py --model_dir /opt/llms/llama-7b \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_inflight_batching \
    --output_dir /opt/trtModel/llama/1-gpu
```

but the program is always killed, and I am confused.
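(To quantify how much host memory the build actually peaks at, one option, a generic GNU diagnostic rather than anything TensorRT-LLM provides, is to rerun it under `/usr/bin/time`; flags trimmed here for brevity:)

```bash
# Rerun the build under GNU time to capture peak resident memory.
# Even if the child is killed by a signal, time still prints its stats.
/usr/bin/time -v python3 build.py --model_dir /opt/llms/llama-7b \
    --dtype float16 \
    --output_dir /opt/trtModel/llama/1-gpu 2>&1 | tail -n 25
# Look for "Maximum resident set size (kbytes)" in the output and compare
# it against the machine's total RAM from `free -h`.
```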