Use asyncio to manage workers #100

Open
AndreasHeger opened this issue Oct 26, 2018 · 2 comments

AndreasHeger (Collaborator) commented Oct 26, 2018

This will remove the dependency on gevent, but it will break Python 2.7 compatibility.

@jbarlow83 (Contributor)

So no real loss then? 😈

I briefly experimented with implementing a "ruffus-lite" based on asyncio, and then I tried again with either curio or trio. If you are serious, I think it is worth looking at those as well, for the reasons outlined in the article linked for trio. The biggest issue of all is that asyncio is not good for CPU-bound Python code: it is single-threaded, you can't escape with multithreading because of the GIL, and it doesn't have any equivalent to multiprocessing.

That was okay for me when I tried my experiment, because ocrmypdf (my application that uses ruffus, and the only open source dependent of ruffus I know of) is mostly "process.wait()-bound": it kicks off other programs and waits for results. But as it matures, it has been accumulating more in-house functionality, and it relies on ruffus+multiprocessing for performance.

You can kick off an asyncio child process that also runs Python, but it's not a pretty sight. https://docs.python.org/3/library/asyncio-protocol.html#subprocess-protocols
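
For illustration, here is a minimal sketch of starting a Python child process from asyncio. It uses the higher-level `asyncio.create_subprocess_exec` rather than the subprocess protocols linked above, and the helper name `run_python_child` and the `CHILD_CODE` snippet are made up for this example; nothing here comes from ruffus or ocrmypdf.

```python
import asyncio
import sys

# Hypothetical CPU-bound snippet for the child interpreter to execute.
CHILD_CODE = "print(sum(i * i for i in range(1000)))"


async def run_python_child(code: str) -> str:
    """Run `code` in a separate Python interpreter and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the child to exit without blocking the event loop.
    stdout, _ = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return stdout.decode().strip()


if __name__ == "__main__":
    print(asyncio.run(run_python_child(CHILD_CODE)))
```

The child gets its own interpreter and its own GIL, but arguments and results have to be passed as text or bytes over pipes, which is probably part of why this feels clumsy next to multiprocessing.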

concurrent.futures might also be worth investigating.

@AndreasHeger (Collaborator, Author)

Hi @jbarlow83, my thinking was to replace the cooperative multitasking pool in gevent with something based on asyncio, not to change the workflow part.
This is for the use case where tasks just fire off a job to a cluster, but we want to make sure that the workflow main process does not use more than a specified amount of CPU resources on the submit host.
Gevent is doing this just fine, but this is a reminder to see if there is a Python standard library alternative.
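
As a rough idea of what a standard-library version could look like, here is a sketch that caps the number of in-flight jobs with `asyncio.Semaphore`. The names `submit_and_wait` and `run_jobs` are hypothetical, and the `asyncio.sleep` call is only a placeholder for submitting a job to the cluster and polling for its result. Note that this limits how many tasks run concurrently, not CPU usage as such.

```python
import asyncio


async def submit_and_wait(job_id: int, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical task: hold one slot while the job is in flight."""
    async with limiter:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        try:
            # Placeholder for submitting to the cluster and polling for completion.
            await asyncio.sleep(0.01)
        finally:
            stats["running"] -= 1
    return f"job {job_id} finished"


async def run_jobs(n_jobs: int, max_concurrent: int):
    """Run n_jobs tasks with at most max_concurrent in flight at once."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(submit_and_wait(i, limiter, stats) for i in range(n_jobs))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_jobs(n_jobs=6, max_concurrent=2))
    print(results, "peak concurrency:", peak)
```

The `stats` dictionary is only there to make the limit observable; a real pool would drop it.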
