ParallelRunner lets users run their pipeline with multiprocessing by passing one extra argument and writing no extra code. In practice, it is rarely used. Should we continue developing it or lower its priority? We have had many discussions about runners, so this issue is created mainly to facilitate the discussion and document it.
The ecosystem has evolved, and libraries now solve multiprocessing in their own way (I think pandas is still lagging behind, but polars has more or less solved it).
ParallelRunner solves only a subset of multiprocessing use cases. To solve this realistically, users need finer-grained control, which the current ParallelRunner does not provide. For example, a GPU-specific workflow cannot be multi-process: you want GPU training to happen in one specific process while other processes handle the non-GPU nodes (this sounds a bit like the "group node" deployment problem, but it affects local development too).
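To make the "finer-grained control" point concrete, here is a minimal sketch using only the standard library. It is not Kedro's API: `preprocess`, `train_on_gpu`, and `run_pipeline` are hypothetical names, and the "GPU" step is simulated with plain arithmetic. The idea is that CPU-bound nodes are fanned out to a pool chosen by the caller, while the GPU-bound node always runs in the calling process.

```python
import os
from concurrent.futures import Executor


def preprocess(x: int) -> int:
    """Hypothetical CPU-bound node that is safe to run in any worker."""
    return x * 2


def train_on_gpu(values: list[int]) -> tuple[int, int]:
    """Hypothetical GPU-bound node (simulated). Returns the result and the PID it ran in."""
    return sum(values), os.getpid()


def run_pipeline(inputs: list[int], pool: Executor) -> tuple[int, int]:
    # Fan the CPU-bound nodes out to whichever pool the caller supplied.
    values = list(pool.map(preprocess, inputs))
    # Run the GPU-bound node in the calling process, never in the pool,
    # so that exactly one process ever touches the device.
    return train_on_gpu(values)
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `pool` gives real multiprocessing for the CPU-bound nodes (create it under an `if __name__ == "__main__":` guard), while the training step stays pinned to the parent process. This per-node placement is the kind of control that a single all-or-nothing runner flag does not express.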
`async` / `CacheDataset` or kedro-accelerator seem to be more practical ways to speed up Kedro. I am not very up to date on async myself; maybe it is worth putting more effort into these instead of fixing ParallelRunner.
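As a rough illustration of why caching can be a cheap speed-up, the sketch below wraps a slow load/save pair and keeps the last value in memory. It only shows the general idea and is not how Kedro implements its cached dataset; `CachingWrapper`, `load_fn`, and `save_fn` are hypothetical names.

```python
from typing import Any, Callable


class CachingWrapper:
    """Hypothetical wrapper that keeps the most recent value in memory."""

    _MISSING = object()  # sentinel so that None can be cached too

    def __init__(self, load_fn: Callable[[], Any], save_fn: Callable[[Any], None]):
        self._load_fn = load_fn
        self._save_fn = save_fn
        self._cached: Any = self._MISSING

    def load(self) -> Any:
        # Only hit the slow backend the first time the data is requested.
        if self._cached is self._MISSING:
            self._cached = self._load_fn()
        return self._cached

    def save(self, data: Any) -> None:
        # Write through to the backend and keep the value for later loads.
        self._save_fn(data)
        self._cached = data
```

With a wrapper like this, a node that reads an intermediate output produced earlier in the same run does not pay the I/O cost again, which in many pipelines matters more than running nodes in parallel.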
Development:
We have previously discussed using async to rewrite the Runners in order to simplify the codebase. It is not yet clear what extra benefit we would get, since we haven't discussed it in detail.
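For discussion purposes, an async-based runner could look roughly like the sketch below. This is an assumption about one possible design, not an existing Kedro runner: `run_async` and the `nodes` mapping (node name to a function and the names of the nodes it depends on) are hypothetical. Nodes whose dependencies are satisfied are awaited together, and each synchronous node function is pushed to a worker thread with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
from typing import Any, Callable

NodeSpec = tuple[Callable[..., Any], list[str]]  # (function, dependency names)


async def _run_node(name: str, func: Callable[..., Any], args: list[Any]) -> tuple[str, Any]:
    # Run the synchronous node function in a thread so the event loop stays free.
    return name, await asyncio.to_thread(func, *args)


async def run_async(nodes: dict[str, NodeSpec]) -> dict[str, Any]:
    outputs: dict[str, Any] = {}
    pending = dict(nodes)
    while pending:
        # A node is ready once every one of its dependencies has produced an output.
        ready = [n for n, (_, deps) in pending.items() if all(d in outputs for d in deps)]
        if not ready:
            raise ValueError("circular or missing dependency")
        results = await asyncio.gather(
            *(_run_node(n, pending[n][0], [outputs[d] for d in pending[n][1]]) for n in ready)
        )
        for name, value in results:
            outputs[name] = value
            del pending[name]
    return outputs
```

A call such as `asyncio.run(run_async(nodes))` runs independent nodes concurrently in batches. Threads mainly help I/O-bound nodes, so whether this brings any benefit over the existing runners for CPU-bound work is exactly the open question above.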
Description
I posted a question in Slack recently to see what the community thinks about it: https://linen-slack.kedro.org/t/16663577/do-you-use-kedro-run-runner-parallerunner-to-speed-up-your-p#99abccb0-7970-4a65-8fad-85fd22681beb
On the other hand:
`kedro`, it can be installed from PyPI: https://pypi.org/project/kedro-softfail-runner/.