[Core feature] Refactor distributed job using common ReplicaSpec #4408
Comments
IMO there needs to be more discussion here. There are certainly many configuration parameters that will differ between
#take
Resolves: flyteorg#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
…uf changes Resolves: flyteorg#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
Resolves: flyteorg/flyte#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
…actor Resolves: flyteorg/flyte#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
…#5355)
* feat(proto): Define CommonReplicaSpec in common.proto Resolves: flyteorg#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
* chore(proto): Generate new clients corresponding to proto changes Resolves: flyteorg#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
* feat(replica-spec): Update corresponding golang files based on protobuf changes Resolves: flyteorg#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
---------
Signed-off-by: Chi-Sheng Liu <[email protected]>
Motivation: Why do you think this is important?
Right now, each type of distributed job (TensorFlow, PyTorch, ...) has its own ReplicaSpec. Following the discussion in #4179 (review), we can define a shared ReplicaSpec in common.proto so that all types of distributed jobs can use it.
Goal: What should the final outcome look like, ideally?
In common.proto, we define a single ReplicaSpec, and all types of distributed jobs (TensorFlow, PyTorch, Ray, ...) share it.
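To make the goal concrete, here is a minimal sketch of what a shared spec could look like. Only the file name `common.proto` and the message name `CommonReplicaSpec` come from this issue and its linked commits. The package name, the fields, the `RestartPolicy` enum, and the `ExampleTrainingTask` message are assumptions for illustration, not the definition that was merged.

```proto
syntax = "proto3";

// Hypothetical package name, chosen only for this sketch.
package example.plugins;

// Assumed enum: restart behavior is one setting replicas plausibly share.
enum RestartPolicy {
  RESTART_POLICY_NEVER = 0;
  RESTART_POLICY_ON_FAILURE = 1;
  RESTART_POLICY_ALWAYS = 2;
}

// Shared spec that every distributed job type would reuse.
// The field set is an assumption, not the merged definition.
message CommonReplicaSpec {
  // Number of replicas in this group.
  int32 replicas = 1;
  // Container image used by the replicas.
  string image = 2;
  // Restart policy applied to the replicas.
  RestartPolicy restart_policy = 3;
}

// Hypothetical job-specific message showing reuse by composition:
// each replica group embeds the shared spec instead of redefining it.
message ExampleTrainingTask {
  CommonReplicaSpec worker_replicas = 1;
  CommonReplicaSpec chief_replicas = 2;
}
```

Composition keeps room for per-framework settings: a job type whose replicas need extra parameters can wrap `CommonReplicaSpec` in its own message and add fields next to it.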
Describe alternatives you've considered
Keep what we have now. That is, each type of distributed job keeps its own ReplicaSpec.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?