Describe the bug
When we start serve_reward_model.py and run annotation, the server goes down during processing. It crashes on specific samples, and these samples have a long context.
Attached log: error.log
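To narrow down which samples trigger the crash, it may help to rank the annotation input by length and replay only the longest entries. The following is a minimal sketch, not part of the repository: it assumes the input is a JSONL file, and the helper name `rank_by_length` and the `text` field are hypothetical. Whitespace word count is used only as a rough stand-in for the real token count.

```python
import json


def rank_by_length(jsonl_lines, text_key="text", top_k=5):
    """Return (line_index, word_count) pairs for the top_k longest samples.

    `text_key` is a hypothetical field name; adjust it to the real schema.
    """
    lengths = []
    for index, line in enumerate(jsonl_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Word count is only a rough proxy for the tokenized length.
        text = str(record.get(text_key, ""))
        lengths.append((index, len(text.split())))
    # Longest first; ties keep file order because sorted() is stable.
    return sorted(lengths, key=lambda pair: pair[1], reverse=True)[:top_k]


# In-memory demo; with a real file, pass the open file object instead.
sample_lines = [
    '{"text": "short prompt"}',
    '{"text": "a much longer prompt with many more words"}',
]
print(rank_by_length(sample_lines, top_k=1))
```

Sending only the top-ranked samples to the server should show whether context length alone is enough to reproduce the failure.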
What we did
- [...] nvidia/Llama2-13B-SteerLM-RM, but ran into the same issue.
- [...] nvcr.io/nvidia/nemo:24.05.01 (critic speedup #219 is the main difference).
- [...] (nvcr.io/nvidia/nemo:24.05.01) to 7 hours (nvcr.io/nvidia/nemo:24.07).
Steps/Code to reproduce bug
Before running attribute_annotate.py, you should apply #350.
Expected behavior
The process is completed without the server going down.
Environment overview (please complete the following information)
nvcr.io/nvidia/nemo:24.07