Jupyter kernel restart does not release the RAM used by the kernel running in a JEG Kubernetes cluster #1195
Comments
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
Hi @sharmasw - I'm not familiar with how/when resources are deallocated surrounding a pod's lifecycle, so I guess the information you provide is not too surprising. When kernels are restarted, we retain the namespace and give the new pod the same name as the previous one (because the kernel_id is also preserved). I imagine this might be why k8s defers the cleanup you observe, and it might transfer the resources to the new pod provided it's scheduled on the same node as the previous one.
Can you share your resource configuration in case others want to look into this? Are these specified as limits or requests, and via envs, or configured directly into the pod's launch script?
Does anyone else know how resources are deallocated in k8s? @lresende, @rahul26goyal If we can make that determination, we can possibly update things accordingly.
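For reference, one way to answer the limits-vs-requests question is to read them back from the pod spec with the Kubernetes Python client. This is just a sketch; the pod name and namespace below are placeholders for the values JEG generates for the kernel.

```python
# Sketch: read back the kernel pod's resource requests/limits with the
# Kubernetes Python client. The pod name and namespace are placeholders
# for the values JEG generates for the kernel.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="jovyan-<kernel-id>", namespace="<kernel-namespace>")
for c in pod.spec.containers:
    print(c.name, "requests:", c.resources.requests, "limits:", c.resources.limits)
```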
Hi @sharmasw - pod names are preserved across restarts. By default, they are composed of the kernel username and kernel id, both of which are static values in this context.
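In other words, the name is derived from two values that do not change for the lifetime of the kernel, roughly along these lines (illustrative only; the exact formatting EG uses may differ):

```python
# Illustrative only: both inputs are fixed for the kernel's lifetime, so the
# resulting pod name is identical before and after a restart.
kernel_username = "jovyan"                              # placeholder
kernel_id = "0f7af418-0000-0000-0000-000000000000"      # preserved across restarts
pod_name = f"{kernel_username}-{kernel_id}"
print(pod_name)
```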
Hi @kevin-bates, could you elaborate on what could actually be done to explicitly deallocate the resources? We looked into the Kubernetes Python library and did not find any documentation or function that talks about deallocating unused resources from a given pod.
Hi @sharmasw - well, I'm afraid you answered the question. If the API does not expose a means to deallocate resources sooner, I'm not sure there's much we can do. Had there been a way to address this via the API, we could have introduced those calls.
This behavior implies that resources may be indexed by pod name (and probably namespace) - which seems very odd. I just confirmed that the Docker container ID changes across restarts - so it's definitely a different instance. Since the pod name (and namespace) are the same, perhaps the resources are treated as high-water marks or something. (This is definitely the kind of thing that is difficult to locate without knowing the code or how the scheduler works, as it's probably not an ordinary use case.)
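For anyone who wants to reproduce that check, the running container's ID is visible on the pod status and can be compared before and after a restart. A small sketch (pod name and namespace are placeholders):

```python
# Sketch: capture the container ID before and after a kernel restart to
# confirm a new container instance is running behind the same-named pod.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
POD, NS = "jovyan-<kernel-id>", "<kernel-namespace>"   # placeholders

def container_ids():
    pod = v1.read_namespaced_pod(name=POD, namespace=NS)
    return {cs.name: cs.container_id for cs in pod.status.container_statuses}

before = container_ids()
input("Restart the kernel from the notebook UI, then press Enter... ")
after = container_ids()
print("container IDs changed:", before != after)
```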
Description
We have JEG running in a Kubernetes cluster. When we spawn a pod to execute a Jupyter notebook, everything works well, but when the user restarts the kernel, the RAM used by the kernel is not released by the pod immediately. We either have to wait an indefinite amount of time for it to be released, or, if we continue using the kernel, it eventually runs out of memory and Kubernetes kills the pod.
Screenshots / Logs
Start of the Kernel:
After executing some commands:
1st restart:
Immediate 2nd restart without executing any code:
3rd restart without executing any code:
Now, if we wait for some indefinite time (in this example it took about 4 minutes), the memory does get released:
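For reference, per-pod memory readings like the ones above can also be sampled directly from the metrics API. A minimal sketch, assuming metrics-server is installed in the cluster (this is what `kubectl top pod` reads from); the pod name and namespace are placeholders:

```python
# Sketch: periodically sample the kernel pod's memory usage via the
# Kubernetes metrics API (requires metrics-server).
import time
from kubernetes import client, config

config.load_kube_config()
metrics = client.CustomObjectsApi()
POD, NS = "jovyan-<kernel-id>", "<kernel-namespace>"   # placeholders

for _ in range(10):
    item = metrics.get_namespaced_custom_object(
        group="metrics.k8s.io", version="v1beta1",
        namespace=NS, plural="pods", name=POD)
    for c in item["containers"]:
        print(c["name"], c["usage"]["memory"])
    time.sleep(30)
```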
Any clue or suggestion as to why this behavior occurs? We just want all the RAM utilized to be released as soon as the restart action is performed.
Environment
Resource configuration
But we have other configs as well like