Rethink how Kedro can play a role in multiprocessing / performance boost #3713

Open
noklam opened this issue Mar 14, 2024 · 0 comments

noklam (Contributor) commented Mar 14, 2024

Description

ParallelRunner lets users run a pipeline with multiprocessing by passing one extra argument (`kedro run --runner=ParallelRunner`), with no code changes. In practice, it is rarely used. Should we continue developing it, or lower its priority? We have had many discussions about runners, so this issue exists mainly to facilitate and document them.

I recently posted a question in Slack to see what the community thinks: https://linen-slack.kedro.org/t/16663577/do-you-use-kedro-run-runner-parallerunner-to-speed-up-your-p#99abccb0-7970-4a65-8fad-85fd22681beb

The ecosystem has evolved, and libraries are solving multiprocessing in their own ways (I think pandas is still lagging behind, but polars has largely solved it).

  • ParallelRunner solves only a subset of multiprocessing use cases. To address them realistically, users need finer-grained control, which the current ParallelRunner does not provide. For example, a GPU-specific workflow cannot simply be spread across processes: you want the GPU training to happen in one specific process while other processes handle the non-GPU nodes. (This sounds a bit like the "group node" deployment problem, but it affects local development too.)
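To make the "finer-grained control" point concrete, here is a minimal sketch of routing GPU-tagged work to one dedicated worker while everything else goes to a general pool. It uses only the standard library and is not how ParallelRunner works today. `NODES`, `run_node` and `run_with_gpu_pinning` are hypothetical names made up for illustration, and the sketch ignores dependency ordering between nodes entirely.

```python
import os
import threading
from concurrent.futures import Executor, ProcessPoolExecutor

# Hypothetical (name, tags) pairs standing in for pipeline nodes.
# These are NOT Kedro Node objects.
NODES = [
    ("clean_data", {"cpu"}),
    ("build_features", {"cpu"}),
    ("train_model", {"gpu"}),
    ("evaluate_model", {"gpu"}),
]


def run_node(name: str) -> tuple[str, tuple[int, int]]:
    """Stand-in for executing a node: report which worker ran it."""
    return name, (os.getpid(), threading.get_ident())


def run_with_gpu_pinning(nodes, gpu_executor: Executor, cpu_executor: Executor) -> dict:
    """Send GPU-tagged nodes to gpu_executor and all other nodes to cpu_executor.

    Assumption: the nodes are independent, so no ordering is enforced here.
    """
    futures = [
        (gpu_executor if "gpu" in tags else cpu_executor).submit(run_node, name)
        for name, tags in nodes
    ]
    # Each result is (name, worker_id), so this builds {name: worker_id}.
    return dict(future.result() for future in futures)


if __name__ == "__main__":
    # A single-worker pool owns the GPU; a separate pool handles the rest.
    with ProcessPoolExecutor(max_workers=1) as gpu_pool, \
            ProcessPoolExecutor(max_workers=2) as cpu_pool:
        workers = run_with_gpu_pinning(NODES, gpu_pool, cpu_pool)
    for name, (pid, _thread_id) in workers.items():
        print(f"{name}: pid {pid}")
```

With a single-worker GPU pool, `train_model` and `evaluate_model` should report the same PID, and that PID should differ from the ones used for the CPU nodes. A real runner would also need to respect the pipeline's dependency graph, which is where most of the complexity lies.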

On the other hand:

  • async, `CachedDataset`, or kedro-accelerator seem to be more practical ways to speed up Kedro. I am not very up to date on async myself, but it may be worth putting more effort into these instead of fixing ParallelRunner.
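As a rough illustration of why these two ideas help, the sketch below shows them in plain Python: keep a loaded value in memory so that repeated loads skip the slow I/O, and load several inputs in threads so that their I/O waits overlap. This is not Kedro's implementation. `SlowDataset`, `CachingWrapper` and `load_concurrently` are hypothetical names, and the wrapper makes no attempt at thread safety.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowDataset:
    """Hypothetical dataset whose load() simulates slow I/O."""

    def __init__(self, value, delay=0.05):
        self.value = value
        self.delay = delay
        self.load_calls = 0

    def load(self):
        self.load_calls += 1
        time.sleep(self.delay)  # pretend to read from disk or network
        return self.value


class CachingWrapper:
    """Illustrative only: keep the first loaded value in memory and reuse it."""

    _MISSING = object()

    def __init__(self, dataset):
        self._dataset = dataset
        self._cached = self._MISSING

    def load(self):
        if self._cached is self._MISSING:
            self._cached = self._dataset.load()
        return self._cached


def load_concurrently(datasets):
    """Load several datasets in threads so that their I/O waits overlap."""
    with ThreadPoolExecutor(max_workers=max(1, len(datasets))) as pool:
        futures = {name: pool.submit(ds.load) for name, ds in datasets.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    raw = SlowDataset([1, 2, 3])
    cached = CachingWrapper(raw)
    cached.load()
    cached.load()
    print("underlying loads:", raw.load_calls)

    inputs = {"a": SlowDataset("A"), "b": SlowDataset("B")}
    print(load_concurrently(inputs))
```

Neither trick needs extra processes, which is part of why they look more practical: there is no pickling of datasets and no per-process setup cost.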

Development:
