-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow custom exit function through dependency injection #389
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a problem in this, but let's wait for @janbuchar opinion as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this initiative! In all honesty, I'm not sure if we should call sys.exit
at all, but this seems like a nice, non-breaking change. But there's stuff to discuss first.
src/apify/_actor.py
Outdated
@@ -231,17 +231,19 @@ async def exit( | |||
event_listeners_timeout: timedelta | None = EVENT_LISTENERS_TIMEOUT, | |||
status_message: str | None = None, | |||
cleanup_timeout: timedelta = timedelta(seconds=30), | |||
sys_exit: Callable = sys.exit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be passed in when the context manager is used (async with Actor
)? Otherwise, I'd be more inclined to just not calling sys.exit
from Actor.exit
. From my point of view, the proposed solution just lets you do Actor.exit(sys_exit=something)
instead of Actor.exit(); something()
.
if is_running_in_ipython(): | ||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in IPython') | ||
self.log.debug(f'Not calling sys_exit({exit_code}) because Actor is running in IPython') | ||
elif os.getenv('PYTEST_CURRENT_TEST', default=False): # noqa: PLW1508 | ||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in an unit test') | ||
self.log.debug(f'Not calling sys_exit({exit_code}) because Actor is running in an unit test') | ||
elif hasattr(asyncio, '_nest_patched'): | ||
self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in a nested event loop') | ||
self.log.debug(f'Not calling sys_exit({exit_code}) because Actor is running in a nested event loop') | ||
else: | ||
sys.exit(exit_code) | ||
sys_exit(exit_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this logic could be simplified a fair bit.
- we don't need to check if we're running under pytest - we can instead pass in a mock
sys_exit
- didn't @vdusek manage to remove the nesting?
- we probably still have to check for ipython 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the nest-asyncio
...
We managed to resolve the problem of executing the Actor & Spider together. Here is more context and the core result structure - scrapy/scrapy#6639 (reply in thread).
However, there's still one remaining challenge - executing asyncio coroutines from the synchronous Scheduler (and HTTP cache storage) code. This is in-progress.
Once that's addressed, I'll open a new PR removing nest-asyncio
completely.
Just for a context, I agree the library probably shouldn't call But I'm just a passing person who happened to need this in order to contribute to solving the nested loops 😄 I'm already deep in the mud over there. I can't work on two mud pools in parallel. Hence I've made this minimal change, with pincers and gloves, and I send it to you as a gift 💝 If I was a maintainer of the code, I'd take a bulldozer and yeah, sure, after a week of work and a few breaking changes, there would be no |
Co-authored-by: Jan Buchar <[email protected]>
And we will forever be grateful for that. If you don't have the resources to work on this further, we can totally take over and polish it, but we value any input that you can provide.
I am a maintainer and I'm still not sure we should remove it completely, so let's try and come up with a nice incremental upgrade for now. |
Mr. @vdusek, I see what you did there! 😂 ![]() If you're okay to institutionalize my use case like this, I'm happy to close this 😄 Regarding about what's right and wrong... my hunch would be that the methods do too many things. They finalize and close the actor's resources, and they take care of ending the program at the same time. In some contexts, it's handy that I don't have to deal with the exit code myself (well, is it? more on that later), but in other contexts, it's a problem that I cannot opt out. So I think those should be separate steps:
Hmm, and do we need ending the program at all? If Python program raises, it ends with non-zero exit code by default. Do we need to handle the program termination, or do we leave it up to the user of the SDK? I don't see a good reason to handle it at all, tbh. I struggle to see the added value, but I don't have a scraping platform and can't read user support and don't know about all the use cases. To me, ending the program as it is now feels like "we'll do this as a convenience to the user", but given the number of ifs for pytest, ipython... and the fact that raising an exception results in the same thing, I feel like in the end it's more of an inconvenience, which needs workarounds. |
FYI the JS implementation allows to override the exit code explicitly, as well as opt out of the exit call completely, we should have it here too. https://github.com/apify/apify-sdk-js/blob/master/packages/apify/src/actor.ts#L1978-L1981 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking into account the JS version actor.ts#L1978-L1981, I would say we should align the Py-SDK behavior with it. Which means add a flag for turning off the calling of sys.exit
, rather than adding an custom exit function.
In some contexts it is not desirable to call
sys.exit()
directly from the Actor object. This is already apparent from the surrounding code, which has to deal with such situations: IPython, pytest, or nested loop. I found yet another such situation: Twisted reactor.I think this leads to a refactoring where the
fail()
andexit()
functions are not responsible for ending the program, as explicit is better than implicit. However, such change would be a larger task I'm not up to, and it would possibly be a breaking change. I'll let the maintainers to discuss and decide the best solution.My change is minimal and allows an escape route well within the existing API. If we use dependency injection and move
sys.exit
from being hardcoded to being a parameter of bothfail()
andexit()
, the interface stays the same for existing code, but I'd get the ability to call these methods in contexts, where I don't want them to terminate the program:Currently, the above code won't print
Done!
, because the program ends and there is no straightforward way to prevent it. I invented a workaround like this:If this PR is accepted and a new version of SDK released, I could do this:
In tests, for example, it would be possible to pass a mock and check that the
sys_exit
function got called with a correct exit code, etc.