Add driver and executor pod template fields to SparkJob #4179
Conversation
Signed-off-by: Andrew Dye <[email protected]>
@eapolinario I see that one +1 was enough to merge this PR. Should we continue to require +2 for idl? I think we should. Not sure if there was resolution to @hamersaw's comment.
As far as I can tell, CODEOWNERS does not support that feature (as per https://github.com/orgs/community/discussions/22522). I opened #4182 to track this.
Should we follow the same pattern as tensorflow.proto, like:

```proto
message SparkJob {
  ...
  sparkReplicaSpec driver = 11;
  sparkReplicaSpec worker = 12;
}

message sparkReplicaSpec {
  oneof PodValue {
    core.K8sPod podTemplate = 1;
  }
  string PodTemplateName = 2;
  string image = 3;
  ...
}
```
Maybe we should put ReplicaSpec in common.proto, so all other plugins can import this proto in the future, like Ray:

```proto
// common.proto
message ReplicaSpec {
  oneof PodValue {
    core.K8sPod podTemplate = 1;
  }
  string PodTemplateName = 2;
  string image = 3;
  ...
}
```
cc @eapolinario @wild-endeavor @hamersaw wdyt
Wanted to also cross-reference this comment thread from the original PR, proposing:

```python
@task(
    task_config=Spark(
        spark_conf={
            "spark.driver.memory": "4G",
            "spark.executor.memory": "8G",
        },
        # driver-only pod template overrides (takes precedence over top-level pod_template)
        driver_pod_template=...,
    ),
    # default pod template overrides for BOTH driver and executor
    pod_template=...,
)
```
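The precedence proposed in that thread (a role-specific template overriding the top-level `pod_template`) can be sketched with plain dataclasses. Note that `PodTemplate` and `resolve_pod_template` here are illustrative stand-ins, not flytekit's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PodTemplate:
    # Stand-in for core.K8sPod / a pod template; labels only, for illustration.
    labels: dict

def resolve_pod_template(task_level: Optional[PodTemplate],
                         role_level: Optional[PodTemplate]) -> Optional[PodTemplate]:
    """A role-specific template (e.g. driver_pod_template) wins over the task-level default."""
    return role_level if role_level is not None else task_level

default = PodTemplate(labels={"team": "ml"})
driver_override = PodTemplate(labels={"team": "ml", "role": "driver"})

# The driver gets its override; the executor falls back to the task-level default.
assert resolve_pod_template(default, driver_override) is driver_override
assert resolve_pod_template(default, None) is default
```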
Pulling together references from existing leader/worker patterns across various plugins: it does seem like a good idea to establish a pattern (and shared definitions, where appropriate).
Signed-off-by: Andrew Dye <[email protected]>
Codecov Report: all modified lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master    #4179      +/-   ##
===========================================
+ Coverage   59.44%   78.48%   +19.03%
===========================================
  Files         637       18      -619
  Lines       54262     1250    -53012
===========================================
- Hits        32254      981    -31273
+ Misses      19467      212    -19255
+ Partials     2541       57     -2484
```
Based on comments from the initial PR thread and above, I've moved pod and pod template name behind a RoleSpec.
+1, but is this the right repo, @eapolinario?
```proto
RoleSpec driverSpec = 10;
// The executor spec, used in place of the task's pod template
RoleSpec executorSpec = 11;
```
Is there ever a scenario where both of these differ from the values set in the task decorator?
Curious what you mean here. With this change we will fail

```python
@task(
    pod_template=...
)
```

in favor of specifying one or both values in

```python
@task(
    task_config=Spark(
        executor_pod_template=...,
        driver_pod_template=...,
    )
)
```

In theory I think this is a great idea, but in practice I would bet there will be corner-case plugins where not all options are supported.
> In theory I think this is a great idea. But in practice I would bet there will be corner case plugins where not all options are supported

Are you thinking of the initial fields in RoleSpec, or future ones? If we believe that K8sPod is the correct abstraction for overriding parts of the default pod template, then including it as part of the role spec seems appropriate. I could imagine, however, additional role-specific configuration desired over time that wouldn't fit into a common RoleSpec definition. Thoughts?
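One way to reconcile a shared definition with plugin-specific needs — sketched below with hypothetical Python dataclasses rather than actual IDL — is to embed the common RoleSpec inside a plugin-owned spec that is free to grow extra fields:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RoleSpec:
    # Mirrors the proposed common fields: an optional pod override or a named template.
    pod_template: Optional[dict] = None       # stand-in for core.K8sPod
    pod_template_name: Optional[str] = None

@dataclass
class SparkDriverSpec:
    # A plugin-specific spec embeds the shared RoleSpec and adds its own knobs.
    role: RoleSpec = field(default_factory=RoleSpec)
    cores: Optional[int] = None               # hypothetical Spark-only field

spec = SparkDriverSpec(role=RoleSpec(pod_template_name="gpu-template"), cores=4)
assert spec.role.pod_template_name == "gpu-template"
assert spec.cores == 4
```

Plugins that never need extra fields could use the common RoleSpec directly, while others extend it without forcing role-specific options into the shared message.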
Given some of the outstanding questions for Spark, the pattern for multi-role plugins more broadly, and the deprecation/migration story, let me close this out until we get alignment. For now, the Spark plugin can use the existing TaskTemplate K8sPod field to override tolerations for both driver and executor pods, which is complementary to the existing interface for node selectors.
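The interim approach described in that closing comment — one task-level K8sPod override applied to both roles — amounts to a merge along these lines (plain dicts stand in for K8sPod; the function name is illustrative, not the plugin's actual code):

```python
import copy

def apply_task_pod_overrides(base_pod: dict, overrides: dict) -> dict:
    """Apply task-level overrides (e.g. tolerations) onto a role's base pod spec."""
    merged = copy.deepcopy(base_pod)
    if "tolerations" in overrides:
        merged.setdefault("tolerations", []).extend(overrides["tolerations"])
    return merged

overrides = {"tolerations": [{"key": "spark", "operator": "Exists"}]}
driver_pod = apply_task_pod_overrides({"containers": ["driver"]}, overrides)
executor_pod = apply_task_pod_overrides({"containers": ["executor"]}, overrides)

# The same toleration lands on both driver and executor pods.
assert driver_pod["tolerations"] == executor_pod["tolerations"]
```

Because the override comes from a single task-level field, there is no way to give the driver and executor different tolerations — which is exactly the limitation per-role specs were meant to lift.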
Tracking issue
#4105
Describe your changes

This change defines a common pattern for per-role pod configuration and adds RoleSpec fields for driver and executor pods, accordingly:

- Adds RoleSpec to common.proto with initial fields for pod and pod_template_name. This mirrors the interface we have for container tasks, spread across TaskTemplate and TaskMetadata. Fields are optional.
- Adds driverSpec and executorSpec to SparkJob.

This PR was manually migrated from flyteorg/flyteidl#450. See initial comment discussions there.

How should a plugin handle TaskTemplate/TaskMetadata pod templates vs. these RoleSpecs?

- For Spark, the task-level pod template is not consumed today and can be treated as an error if set (not supported). The plugin can inspect optional per-role specs for configuration.
- For existing plugins that consume the task-level pod template (Dask, MPI, PyTorch, Ray), they can continue doing so. If/when we are ready to introduce per-role pod configuration, we can add RoleSpec fields to the plugin's config, deprecate the task-level pod template, and rely on a runtime check to validate that only the task-level pod template OR per-role pod templates are specified.
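The runtime check mentioned above (a task-level pod template OR per-role templates, never both) could look like the following sketch — `validate_pod_templates` and its dict arguments are hypothetical, not the plugin's actual API:

```python
from typing import Optional

def validate_pod_templates(task_level: Optional[dict],
                           driver: Optional[dict],
                           executor: Optional[dict]) -> None:
    """Reject configs mixing a task-level pod template with per-role ones.

    A hypothetical check; real validation would live in the plugin or flytekit.
    """
    if task_level is not None and (driver is not None or executor is not None):
        raise ValueError(
            "Specify either a task-level pod template OR per-role pod templates, not both."
        )

validate_pod_templates(None, {"role": "driver"}, None)  # ok: per-role only
validate_pod_templates({"k": "v"}, None, None)          # ok: task-level only
try:
    validate_pod_templates({"k": "v"}, {"role": "driver"}, None)
except ValueError:
    pass  # mixing both is rejected
else:
    raise AssertionError("expected ValueError")
```

A check like this keeps the deprecation path incremental: old configs with only a task-level template keep working, and the error only fires once a user starts adopting per-role specs.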