Rethink how Kedro can play a role in multiprocessing / performance boost #3713

noklam · 2024-03-14T13:10:21Z

Description

ParallelRunner lets users run a pipeline with multiprocessing by passing one extra argument, with no extra code. In practice it is rarely used. Should we continue developing it or lower its priority? We have had many discussions about runners, so this issue exists mainly to facilitate and document them.

I recently posted a question in Slack to see what the community thinks about it: https://linen-slack.kedro.org/t/16663577/do-you-use-kedro-run-runner-parallerunner-to-speed-up-your-p#99abccb0-7970-4a65-8fad-85fd22681beb

The ecosystem has evolved, and libraries are solving multiprocessing in their own ways (I think pandas is still lagging behind, but polars has more or less solved it).

ParallelRunner solves only a subset of the multiprocessing problem. To solve it realistically, users need finer-grained control, which the current ParallelRunner does not provide. For example, a GPU-specific workflow cannot be spread across processes: you want GPU training to happen in one specific process while other processes handle the non-GPU nodes. (This sounds similar to the "group node" deployment problem, but it affects local development too.)
- Synthesis of research related to deployment of Kedro to modern MLOps platforms #3094
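
To make the "finer-grained control" point concrete, here is a minimal sketch of routing work by tag: anything tagged `gpu` goes to a dedicated single-worker executor, everything else goes to a shared pool. This is not how ParallelRunner works today and not a proposed Kedro API; `Task`, `run_routed`, and the `gpu` tag are hypothetical names, and the sketch assumes the tasks are independent of each other.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a Kedro node: a name, a callable, and tags."""

    name: str
    func: Callable[[], object]
    tags: frozenset = field(default_factory=frozenset)


def run_routed(tasks, gpu_executor: Executor, cpu_executor: Executor):
    """Submit 'gpu'-tagged tasks to one executor and the rest to another.

    Assumes the tasks are independent; a real runner would also have to
    respect the pipeline's dependency order.
    """
    futures = {}
    for task in tasks:
        executor = gpu_executor if "gpu" in task.tags else cpu_executor
        futures[task.name] = executor.submit(task.func)
    # Block until every task has finished and collect the results by name.
    return {name: future.result() for name, future in futures.items()}


def demo():
    tasks = [
        Task("clean", lambda: "cleaned"),
        Task("features", lambda: "features"),
        Task("train", lambda: "model", frozenset({"gpu"})),
    ]
    # A single worker guarantees that GPU tasks never run concurrently.
    # Threads keep the sketch portable; swapping in ProcessPoolExecutor would
    # give process isolation, provided the callables are picklable.
    with ThreadPoolExecutor(max_workers=1) as gpu, ThreadPoolExecutor(max_workers=4) as cpu:
        return run_routed(tasks, gpu, cpu)
```

The design choice here is that the caller, not the runner, decides which executor owns which kind of work, which is the control the current all-or-nothing runner choice does not give.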

On the other hand:

async, `CachedDataset`, or kedro-accelerator seem to be more practical ways to speed up Kedro. I am not very up to date on async myself; maybe it's worth putting more effort into these instead of fixing ParallelRunner.
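
As a rough illustration of why async I/O and caching can help without multiprocessing, the sketch below overlaps blocking loads using `asyncio.to_thread` and memoizes repeated loads in memory. It uses only the standard library and does not reflect Kedro's own implementation; `slow_load`, `load_all`, and `CachingLoader` are hypothetical names invented for this example.

```python
import asyncio
import time


def slow_load(name, delay=0.05):
    """Hypothetical blocking load, standing in for reading a dataset."""
    time.sleep(delay)
    return f"data:{name}"


async def load_all(names):
    """Run the blocking loads in worker threads so their waits overlap."""
    results = await asyncio.gather(
        *(asyncio.to_thread(slow_load, name) for name in names)
    )
    return dict(zip(names, results))


class CachingLoader:
    """Keep loaded data in memory so repeated requests skip the slow load."""

    def __init__(self, load):
        self._load = load
        self._cache = {}
        self.load_calls = 0  # counts how often the underlying load ran

    def get(self, name):
        if name not in self._cache:
            self.load_calls += 1
            self._cache[name] = self._load(name)
        return self._cache[name]
```

For example, `asyncio.run(load_all(["a", "b", "c"]))` loads the three datasets with their waits overlapping, and calling `CachingLoader(slow_load).get("a")` twice triggers only one underlying load.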

Development:

We have previously discussed rewriting the runners with async to simplify the codebase. It's not yet clear what extra benefit we would get, since we haven't discussed it in detail.
New runner? See "Add documentation for the soft-fail runner if/when it is released in a supported version. #2716". A new runner was created last year but hasn't been merged back into Kedro; it can be installed from PyPI: https://pypi.org/project/kedro-softfail-runner/.


noklam mentioned this issue Mar 15, 2024

Proposal for Partial/Custom node ordering for SequentialRunner #3717

Closed

noklam mentioned this issue Mar 25, 2024

Viz hook is broken with ParallelRunner [Blocked by Framework] kedro-org/kedro-viz#1801

Open

1 task

github-actions bot mentioned this issue Apr 1, 2024

Monthly issue metrics report #3764

Open

merelcht mentioned this issue Nov 1, 2024

Can we remove ParallelRunner? #4291

Open
