FST is going down abruptly. #1264

Closed
kbramhendra opened this issue Dec 9, 2023 · 7 comments

Comments

@kbramhendra

Hi, I am using the FST in a production-like setup. I built it from the #1218 branch with torch 1.14. The FST process goes down abruptly for no apparent reason: it is not caused by an OOM condition, and no particular utterance triggers it. @pkufool, can you please suggest any ways to mitigate this?

@danpovey
Collaborator

danpovey commented Dec 9, 2023

We would need much more information. Presumably a process is dying: what code is it running? How is it terminating, e.g. what signal? If it is Python, there should be ways to catch the signal with a try/except at the outer level of the code and report it before dying.
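A minimal sketch of the "report it before dying" idea, assuming a POSIX host and a Python entry point you control. The names `install_crash_reporting` and `run_guarded` are hypothetical helpers, not part of k2 or Triton. Only standard-library calls are used: `faulthandler` dumps Python tracebacks on fatal signals such as SIGSEGV, and `signal.signal` logs catchable termination signals. SIGKILL cannot be caught, so a kernel OOM kill would still leave no trace here.

```python
import faulthandler
import signal
import sys
import traceback


def install_crash_reporting(log=sys.stderr):
    """Best effort: make a dying process say why it is dying.

    `log` must be a real file object (it needs a file descriptor).
    Call this from the main thread; signal.signal() requires that.
    """
    # Dump Python tracebacks on fatal signals (SIGSEGV, SIGABRT, SIGBUS, ...).
    faulthandler.enable(file=log, all_threads=True)

    def on_signal(signum, frame):
        # Runs only for catchable signals; SIGKILL never reaches this handler.
        print(f"received {signal.Signals(signum).name}", file=log, flush=True)
        traceback.print_stack(frame, file=log)
        sys.exit(128 + signum)

    # SIGHUP is POSIX-only (assumption: the process runs on Linux).
    for sig in (signal.SIGTERM, signal.SIGHUP):
        signal.signal(sig, on_signal)


def run_guarded(main, log=sys.stderr):
    """Run main() and report any uncaught exception before returning 1."""
    try:
        return main()
    except SystemExit:
        raise  # deliberate exits pass through unchanged
    except BaseException as exc:
        print(f"fatal: {type(exc).__name__}: {exc}", file=log, flush=True)
        traceback.print_exc(file=log)
        return 1
```

A possible use is to call `install_crash_reporting()` once at startup and then `sys.exit(run_guarded(main))`, where `main` stands for whatever function starts the process.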

@kbramhendra
Author

kbramhendra commented Dec 9, 2023

Hi, thanks for replying. It is running on Triton Inference Server with Python, and the overall setup is deployed on Kubernetes. How can I stop this process from dying? There aren't any signals per se. Memory is fine on both GPU and CPU, and it is in an idle state.

@danpovey
Collaborator

danpovey commented Dec 9, 2023

Inference-server stuff would normally be a sherpa issue; did you build that with sherpa? If so, you should probably open an issue on the associated repo. IDK why you think this is specifically about the FST.
But when a process dies, either it exits or it dies by a signal. I'm not an expert on how to debug such things, and haven't used triton, but there should always be a way to track it down, e.g. get a stack trace. Perhaps some debug setting.
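The exit-versus-signal distinction can be checked from a parent process. The sketch below assumes the dying program can be launched as a child via Python's `subprocess` module on a POSIX system; `describe_exit` and `run_and_report` are hypothetical helper names. On POSIX, `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # POSIX: a negative code -N means the child was killed by signal N.
        return f"killed by signal {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited cleanly"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run a command to completion and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Example: a child that exits on its own with status 3.
    print(run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

When the process runs under a shell or a container runtime instead, the same information usually appears as an exit status of 128 plus the signal number, for example 137 for SIGKILL.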

@kbramhendra
Author

It's not built with sherpa. I have 3 processes running on the GPU: the encoder, CTC, and FST modules. The encoder and CTC are ONNX modules, and those processes keep running; only the FST process dies. All of these run in a Docker setup, so it is becoming difficult for me to track it down. It exits suddenly. Can we prevent this from happening?

@danpovey
Collaborator

danpovey commented Dec 9, 2023

Hmm, that's tricky, but in principle it should be possible to reproduce it without Docker for debugging purposes.

@kbramhendra
Author

Yeah, I have been trying to reproduce it but haven't succeeded. I will share logs if I find any. If you come across any such cases or a solution in the future, please let me know. Thank you.

@kbramhendra
Author

Hi,
The issue was found to be in Triton's memory management. Thanks for helping.
