FST is going down abruptly. #1264

kbramhendra · 2023-12-09T05:28:41Z

Hi, I am using FST in a production-like setup. I built FST from the #1218 branch with torch 1.14. The FST process goes down abruptly without any apparent reason. It's not an OOM issue, and no particular utterance triggers it. @pkufool, can you suggest any ways to mitigate this?

danpovey · 2023-12-09T06:00:56Z

I'd need much more information. Presumably a process is dying: what code is it running? How is it terminating, e.g. by what signal? If it's Python, there should be ways to catch the signal with a try/except at the outer level of the code and report it before dying.

kbramhendra · 2023-12-09T06:05:23Z

Hi, thanks for replying. It's running on Triton Inference Server with Python, and the overall setup runs on Kubernetes. How can I stop this process from dying? There aren't any signals per se. Memory is fine on both GPU and CPU, and the process is in an idle state.

danpovey · 2023-12-09T06:56:44Z

Inference-server stuff would normally be a sherpa issue; did you build that with sherpa? If so, you should probably open an issue on the associated repo. IDK why you think this is specifically about the FST.
But when a process dies, either it exits or it dies by a signal. I'm not an expert on how to debug such things, and haven't used triton, but there should always be a way to track it down, e.g. get a stack trace. Perhaps some debug setting.
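To tell those two cases apart from a parent process, here is a minimal sketch using `subprocess`, assuming a POSIX system where a negative `returncode` of -N means the child was killed by signal N. `run_and_report` is a hypothetical launcher for illustration, not part of Triton or k2.

```python
import signal
import subprocess
import sys


def describe_termination(returncode: int) -> str:
    """Explain how a child process ended, given subprocess's returncode."""
    if returncode < 0:
        # POSIX only: a negative returncode -N means the child was killed by signal N.
        signum = -returncode
        try:
            return f"killed by signal {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    # Otherwise the child exited normally with this status.
    return f"exited with status {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run a command (hypothetical worker launcher) and report how it terminated."""
    result = subprocess.run(cmd)
    message = describe_termination(result.returncode)
    print(f"{cmd[0]} {message}", file=sys.stderr)
    return message


if __name__ == "__main__":
    # Example: a child that kills itself with SIGKILL, as the kernel OOM killer would.
    child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(child))  # on Linux: killed by signal SIGKILL
```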

kbramhendra · 2023-12-09T07:04:02Z

It's not built with sherpa. I have 3 processes running on the GPU: the encoder, CTC, and FST modules. The encoder and CTC are ONNX modules, and those processes are still running; only the FST process dies. Everything runs in Docker, so it's difficult for me to track it down. It exits suddenly. Can we prevent this from happening?
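In a Docker/Kubernetes setup, the container's exit code is often the quickest clue; for example, `kubectl describe pod <pod-name>` shows the last terminated state of each container with its exit code. The sketch below assumes the common shell convention that a code of 128 + N means the main process was terminated by signal N (137 = SIGKILL, 139 = SIGSEGV). A program can also exit with such a value on its own, so this is a hint rather than proof.

```python
import signal


def decode_container_exit_code(code: int) -> str:
    """Interpret an exit code reported by Docker or Kubernetes.

    By shell convention, 128 + N usually means "terminated by signal N".
    """
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # 128 + N where N is not a known signal number
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 139, 143):
        print(code, decode_container_exit_code(code))
```

A SIGKILL (137) with no application log points outside the process (for example an OOM kill or an orchestrator restart), while a SIGSEGV (139) points at a crash in native code.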

danpovey · 2023-12-09T07:11:14Z

Hmm, that's tricky, but in principle it should be possible to reproduce it without Docker for debugging purposes.

kbramhendra · 2023-12-09T07:26:07Z

Yeah, I have been trying to reproduce it but haven't succeeded. I will try to share logs if I find any. If you come across any similar cases or a solution in the future, please let me know. Thank you.

kbramhendra · 2024-03-27T04:58:57Z

Hi,
The issue turned out to be in Triton's memory management. Thanks for helping.

kbramhendra closed this as completed Mar 27, 2024
