Supporting parallelized workers in ArrayNode subNodes #4567

Merged
hamersaw merged 16 commits into master from performance/arraynode-parallel-workers
Dec 15, 2023

@hamersaw hamersaw commented Dec 9, 2023

Tracking issue

NA

Why are the changes needed?

Hooking maptask executions (i.e., the ArrayNode implementation) into the core Flyte workflow evaluator is heavier than the current maptask subNode evaluation. This often results in significant performance degradation of ArrayNode relative to the existing maptask implementation over similar workloads. To ensure the success and adoption of ArrayNode, its performance needs to be at least on par; with this PR it becomes significantly better.

What changes were proposed in this pull request?

This PR introduces a worker pool of goroutines to parallelize I/O-bound work during subNode evaluations. Foremost, this means k8s Pod creation and blobstore operations (e.g., validating outputs) can be done for multiple subNodes simultaneously. With just 10 workers, performance improves 4x over ArrayNode without parallelization and is often >3x better than the current maptask implementation (while including significantly more functionality). This work is done partly as a PoC to push for parallelizing all node evaluations in FlytePropeller, for more widespread performance improvements over large workflows.

How was this patch tested?

Scale tested on an EKS Flyte cluster; benchmarks to follow.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#4535

Docs link

NA

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 9, 2023
@hamersaw hamersaw requested a review from pvditt December 9, 2023 01:14
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>

codecov bot commented Dec 9, 2023

Codecov Report

Attention: 44 lines in your changes are missing coverage. Please review.

Comparison is base (1699094) 58.99% compared to head (2b59e98) 59.03%.

Files                                                   Patch %   Lines
...ytepropeller/pkg/controller/nodes/array/handler.go   77.86%    23 Missing and 6 partials ⚠️
...lytepropeller/pkg/controller/nodes/array/worker.go   63.41%    10 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4567      +/-   ##
==========================================
+ Coverage   58.99%   59.03%   +0.04%     
==========================================
  Files         621      622       +1     
  Lines       52568    52682     +114     
==========================================
+ Hits        31014    31103      +89     
- Misses      19080    19097      +17     
- Partials     2474     2482       +8     
Flag        Coverage Δ
unittests   59.03% <76.59%> (+0.04%) ⬆️

pingsutw previously approved these changes Dec 11, 2023
@pingsutw pingsutw left a comment

LGTM, looking forward to parallelization of all node evaluations

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 11, 2023
@hamersaw hamersaw merged commit 398e5cb into master Dec 15, 2023
45 checks passed
@hamersaw hamersaw deleted the performance/arraynode-parallel-workers branch December 15, 2023 17:02
4 participants