run.py failed for gpt #775

riyaj8888 · 2023-12-29T10:53:21Z

While running run.py for gpt2-medium, I am getting the following error.

RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cub_temp_storage_size, log_probs, (T*) nullptr, id_vals, (int*) nullptr, vocab_size * batch_size, batch_size, begin_offset_buf, offset_buf + 1, 0, sizeof(T) * 8, stream): invalid device function (/code/tekit/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:1077)

juney-nvidia · 2023-12-30T13:01:05Z

@riyaj8888

Hi, can you share the concrete steps to reproduce this error?

I suspect there may be an issue with your installation or build of TensorRT-LLM. Once we see the concrete steps to reproduce it, it will be easier for us to help.

June
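When putting together the reproduction steps asked for here, it can help to attach the installed package versions. The following is a minimal sketch using only the Python standard library. The package list is an assumption about what is relevant to this report, and `collect_versions` is a hypothetical helper, not part of TensorRT-LLM.

```python
import platform
from importlib import metadata


def collect_versions(packages=("tensorrt_llm", "tensorrt", "torch", "pynvml")):
    """Return a dict of installed versions; packages that are missing map to None."""
    # The default package names are assumptions, adjust them to your setup.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to the exact build and run commands gives maintainers a quick way to spot a mismatched install.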

riyaj8888 · 2024-01-02T07:20:40Z

My run.py is unable to find config.json.
When does it get created?
During the engine build, or during checkpoint conversion?

riyaj8888 · 2024-01-02T07:24:13Z

:/app/tensorrt_llm/examples/gpt$ python3 ../run.py --max_output_len=450 --no_add_special_tokens --engine_dir engine.outputs
[01/02/2024-06:45:01] [TRT-LLM] [W] Found pynvml==11.4.1. Please use pynvml>=11.5.0 to get accurate memory usage
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/gpt/../run.py", line 390, in <module>
    main(args)
  File "/app/tensorrt_llm/examples/gpt/../run.py", line 276, in main
    model_name = read_model_name(args.engine_dir)
  File "/app/tensorrt_llm/examples/utils.py", line 54, in read_model_name
    engine_version = tensorrt_llm.builder.get_engine_version(engine_dir)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 524, in get_engine_version
    with open(config_path, 'r') as f:
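The traceback stops where `get_engine_version` opens a config file for the engine directory. Below is a minimal sketch for checking that directory before calling run.py. It assumes the file is named `config.json` and sits directly inside the path given to `--engine_dir`, as the earlier comment suggests. `check_engine_dir` is a hypothetical helper, not a TensorRT-LLM API.

```python
from pathlib import Path


def check_engine_dir(engine_dir):
    """Return (ok, message) describing whether engine_dir holds a config.json."""
    path = Path(engine_dir)
    if not path.is_dir():
        return False, f"{path} is not a directory"
    # Assumption: the config file is named config.json at the top level.
    config = path / "config.json"
    if not config.is_file():
        found = sorted(p.name for p in path.iterdir())
        return False, f"{config} is missing; directory contains: {found}"
    return True, f"found {config}"


if __name__ == "__main__":
    # "engine.outputs" is the value passed to --engine_dir in the command above.
    ok, message = check_engine_dir("engine.outputs")
    print(message)
```

If the check fails, the directory listing in the message shows what the earlier steps actually wrote there, which is useful to include in the report.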

hello-11 · 2024-11-18T02:42:07Z

@riyaj8888 Do you still have the problem? If not, we will close it soon.

nv-guomingz · 2024-12-04T10:13:37Z

Feel free to reopen it if needed.

juney-nvidia self-assigned this Dec 30, 2023

juney-nvidia added the question and triaged labels Dec 30, 2023

hello-11 added the stale label Nov 18, 2024

nv-guomingz closed this as completed Dec 4, 2024