
run.py failed for gpt #775

Closed
riyaj8888 opened this issue Dec 29, 2023 · 5 comments
Assignees
Labels
Labels: question (Further information is requested), stale, triaged (Issue has been triaged by maintainers)

Comments

@riyaj8888 commented:

While running run.py for gpt2-medium, I am getting the following error:

RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cub_temp_storage_size, log_probs, (T*) nullptr, id_vals, (int*) nullptr, vocab_size * batch_size, batch_size, begin_offset_buf, offset_buf + 1, 0, sizeof(T) * 8, stream): invalid device function (/code/tekit/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:1077)
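For reference (an aside, not from the thread): a CUDA "invalid device function" error at kernel launch usually means the binary was not compiled for the local GPU's compute capability, which is consistent with the maintainer's suspicion of a build problem below. A minimal sketch, assuming a CMake-style CUDA architecture list such as `70-real;80-real` (the names and format here are illustrative, not taken from the TensorRT-LLM build):

```python
def parse_arch_list(arch_list):
    # Split a CMake-style CUDA architecture list, e.g. "70-real;80-real;86-real",
    # into the bare SM numbers: {"70", "80", "86"}.
    return {entry.split("-")[0] for entry in arch_list.split(";") if entry}

def sm_supported(arch_list, gpu_sm):
    # True if the GPU's SM version was among the compiled targets. If it was
    # not, device kernels (like the top-p sampling sort in the error above)
    # fail at launch with "invalid device function".
    return str(gpu_sm) in parse_arch_list(arch_list)
```

For example, `sm_supported("70-real;80-real", 86)` is False, so a build targeting only SM 70/80 would hit this error on an SM 86 GPU.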

@juney-nvidia (Collaborator) commented:

@riyaj8888

Hi, can you share the concrete steps to reproduce this error?

I suspect there may be an issue with your installation/build of TensorRT-LLM. Once we see concrete reproduction steps, it will be easier for us to help.

June

@juney-nvidia juney-nvidia self-assigned this Dec 30, 2023
@juney-nvidia juney-nvidia added question Further information is requested triaged Issue has been triaged by maintainers labels Dec 30, 2023
@riyaj8888 (Author) commented:

My run.py is unable to find config.json.
When does it get created: during engine build, or during checkpoint conversion?

@riyaj8888 (Author) commented:

:/app/tensorrt_llm/examples/gpt$ python3 ../run.py --max_output_len=450 --no_add_special_tokens --engine_dir engine.outputs
[01/02/2024-06:45:01] [TRT-LLM] [W] Found pynvml==11.4.1. Please use pynvml>=11.5.0 to get accurate memory usage
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/gpt/../run.py", line 390, in <module>
    main(args)
  File "/app/tensorrt_llm/examples/gpt/../run.py", line 276, in main
    model_name = read_model_name(args.engine_dir)
  File "/app/tensorrt_llm/examples/utils.py", line 54, in read_model_name
    engine_version = tensorrt_llm.builder.get_engine_version(engine_dir)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/builder.py", line 524, in get_engine_version
    with open(config_path, 'r') as f:
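For context (an illustration, not the thread's code): the traceback above ends inside `get_engine_version`, which opens `config.json` from the engine directory; that file is produced when the engine is built, so it is missing if the build step did not complete or `--engine_dir` points at the wrong directory. A hypothetical defensive wrapper showing the same lookup, assuming only that `config.json` sits directly in the engine directory:

```python
import json
from pathlib import Path

def read_engine_version(engine_dir):
    # Hypothetical sketch mirroring the failing open() in
    # tensorrt_llm.builder.get_engine_version: config.json is expected
    # directly inside the engine directory. Return None instead of raising
    # when the file is absent (e.g. incomplete build, wrong --engine_dir).
    cfg = Path(engine_dir) / "config.json"
    if not cfg.exists():
        return None
    with cfg.open() as f:
        return json.load(f).get("version")
```

With a guard like this, a missing config.json can be reported as "engine directory has no config.json; did the build finish?" rather than an unhandled traceback.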

@hello-11 (Collaborator) commented:

@riyaj8888 Do you still have the problem? If not, we will close it soon.

@hello-11 hello-11 added the stale label Nov 18, 2024
@nv-guomingz (Collaborator) commented:

Feel free to reopen it if needed.
