Description
I am encountering a memory leak when serving multiple MXNet models behind the same endpoint with multi-model-server.
I am using two Docker containers built from the multi-model-server Docker image, each serving four models.
Here are the relevant parts of my Docker Compose file:
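As a rough illustration of the setup described above (the exact compose file is not reproduced here), a multi-model layout might look like the sketch below. The image tag, service names, model names, archive paths, and ports are assumptions for illustration, not the actual configuration.

# Hypothetical sketch: two containers, each serving four models from one MMS instance
version: "3"
services:
  mms-group-1:
    image: awsdeeplearningteam/multi-model-server   # assumed public MMS image
    volumes:
      - ./models:/models                            # hypothetical host directory with the .mar archives
    command: >
      multi-model-server --start --model-store /models
      --models modelA=modelA.mar modelB=modelB.mar modelC=modelC.mar modelD=modelD.mar
    ports:
      - "8080:8080"
  mms-group-2:
    image: awsdeeplearningteam/multi-model-server
    volumes:
      - ./models:/models
    command: >
      multi-model-server --start --model-store /models
      --models modelE=modelE.mar modelF=modelF.mar modelG=modelG.mar modelH=modelH.mar
    ports:
      - "8081:8080"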
The issue
There is a massive memory leak. One would expect memory to be released after each inference, but usage keeps growing and growing until multi-model-server stops.
This issue does not occur when I use a separate container for each model, serving one model per container, like so:
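Again, the original snippet is not reproduced; a one-model-per-container layout along these lines is sketched below, using the same hypothetical image, model names, and ports as above.

# Hypothetical sketch: one MMS container per model
version: "3"
services:
  mms-modelA:
    image: awsdeeplearningteam/multi-model-server
    volumes:
      - ./models:/models
    command: multi-model-server --start --model-store /models --models modelA=modelA.mar
    ports:
      - "8080:8080"
  mms-modelB:
    image: awsdeeplearningteam/multi-model-server
    volumes:
      - ./models:/models
    command: multi-model-server --start --model-store /models --models modelB=modelB.mar
    ports:
      - "8081:8080"
  # ...one such service per model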
Only about 500 MB of memory is consumed per model in this case, and it does not increase at all across repeated inferences.
But when serving multiple models from the same container, each inference consumes additional memory that is never released, and multi-model-server eventually crashes once it runs out of memory.
Update: this seems to be an issue with multi-model-server itself; there was no memory leak when serving these models with a Flask server.
Hope this is fixed soon.