-
Notifications
You must be signed in to change notification settings - Fork 591
Fix async service cleanup in LocalBackend.close() #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@JonesAndrew could you take a look at this? I think you played with the closing logic at one point. |
|
I would also propose to finish wandb active runs inside |
|
@giladfrid009 makes sense, would you mind opening a PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pr!
src/art/local/backend.py
Outdated
| def __enter__(self): | ||
| return self | ||
|
|
||
| def __exit__(self, *excinfo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd like to make this an aexit if it needs to be async, just so no one tries to use the context manager in an event loop and get confused at why it doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll get it out shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradhilton Do you know if your close changes interact here at all? I might be a little sleepy right now, but this pr might not be needed anymore, but wondering if you have more context with the change you just made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, I think I addressed the issue
The close() method was calling service.close() without checking if it was async, causing resource leaks when services had async cleanup methods. This fix: - Makes LocalBackend.close() async to match the base Backend class - Properly awaits async service close methods - Maintains backward compatibility for sync close methods - Handles __exit__ context manager compatibility Prevents GPU memory leaks and zombie processes in production deployments.
…nforce async usage inside event loop\n\n- Implement __aenter__/__aexit__ to properly await cleanup\n- Make __exit__ raise if used under a running event loop, guiding to 'async with'\n- Keep sync 'with' working when no loop is running by calling asyncio.run(close())
… manager (__aenter__/__aexit__) for LocalBackend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check
Summary
LocalBackend.close()that was causing service close methods to not execute properly when they were asyncLocalBackend.close()async to properly handle async service cleanup and match the baseBackendclass interface__exit__method to handle both sync and async contexts appropriatelyProblem
The
close()method was synchronous but attempted to callclose()on services that may have async close methods. This meant:Solution
LocalBackend.close()async and added proper async/sync detection__exit__method to handle event loop contexts correctlyImpact
This fix prevents:
Critical for production stability and cost management in ML training environments.
Test Plan