-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting parallized workers in ArrayNode subNodes #4567
Conversation
… on ArrayNode state Signed-off-by: Daniel Rammer <[email protected]>
…atency Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #4567 +/- ##
==========================================
+ Coverage 58.99% 59.03% +0.04%
==========================================
Files 621 622 +1
Lines 52568 52682 +114
==========================================
+ Hits 31014 31103 +89
- Misses 19080 19097 +17
- Partials 2474 2482 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, looking forward to parallelization of all node evaluations
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Tracking issue
NA
Why are the changes needed?
Hooking maptask executions (ie. ArrayNode implementation) into the core Flyte workflow evaluator is more heavy than current maptask subNode evaluations. This often results in significant performance degradation between the two implementations over similar workloads. To ensure the success and adoption of ArrayNode, performance needs to be at least on par; and with this PR it becomes significantly better.
What changes were proposed in this pull request?
This PR introduces a worker pool of go routines to parallelize I/O bound work during subNode evaluations. Foremost this means k8s Pod creation and blobstore operations (ex. validating outputs, etc) can be done for multiple subNodes simultaneously. The performance improvements with just 10 workers show a 4x improvement over ArrayNode without parallelization and often >3x over the current maptask implementation (while including significantly more functionality). This work is done partly as a PoC to push parallelization of all node evaluations in FlytePropeller for more widespread performance improvements over large workflows.
How was this patch tested?
Scale tested on EKS Flyte cluster, benchmarks to follow.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
#4535
Docs link
NA