feat: Update files with respect to common ReplicaSpec refactor #2424
This is a PR for you to test evaluator.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #2424       +/-   ##
===========================================
- Coverage   79.24%   58.31%   -20.94%
===========================================
  Files         196      250       +54
  Lines       19785    22092     +2307
  Branches     4008     4006        -2
===========================================
- Hits        15678    12882     -2796
- Misses       3407     8700     +5293
+ Partials      700      510      -190
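The coverage percentages in the report follow from the raw counts: coverage is hits divided by lines, and hits, misses, and partials sum to the line count. A minimal Python sketch of that arithmetic, using only the numbers from the report (the helper name `coverage_percent` is made up for illustration and is not part of Codecov):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage rounded to two decimals."""
    return round(100 * hits / lines, 2)


# Raw counts copied from the Codecov report: (hits, misses, partials, lines)
master = (15678, 3407, 700, 19785)
pr_2424 = (12882, 8700, 510, 22092)

for hits, misses, partials, lines in (master, pr_2424):
    # Every coverable line is counted as exactly one of hit, miss, or partial.
    assert hits + misses + partials == lines

master_cov = coverage_percent(master[0], master[3])
pr_cov = coverage_percent(pr_2424[0], pr_2424[3])
print(master_cov, pr_cov)
```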
@MortalHappiness, can you merge master to get rid of the dbt failures? In order to fix the CI failures for kf-mpi and kf-tensorflow you'll need to add a line similar to this to https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-kf-pytorch/dev-requirements.in and create the corresponding
…actor Resolves: flyteorg/flyte#4408 Signed-off-by: Chi-Sheng Liu <[email protected]>
common=plugins_common.CommonReplicaSpec(replicas=self.max_nodes),
# The following fields are deprecated. They are kept for backwards compatibility.
replicas=self.max_nodes,
Will the replica arg stay duplicated here?
What do you mean? To ensure backward compatibility, both replicas and common.replicas need to be sent to the backend.
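To make the duplication concrete, here is a minimal sketch of writing the replica count to both places. The two dataclasses are hypothetical stand-ins for the generated protobuf messages, not the real flyteidl classes, and `build_worker_spec` is an invented helper name:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonReplicaSpec:
    """Hypothetical stand-in for the shared replica message."""

    replicas: int


@dataclass
class WorkerReplicaSpec:
    """Hypothetical stand-in for a per-role replica message."""

    common: Optional[CommonReplicaSpec] = None  # new, shared field
    replicas: int = 0  # deprecated field, kept for older backends


def build_worker_spec(max_nodes: int) -> WorkerReplicaSpec:
    # Populate both locations so either generation of backend finds a value.
    return WorkerReplicaSpec(
        common=CommonReplicaSpec(replicas=max_nodes),
        replicas=max_nodes,
    )


spec = build_worker_spec(2)
print(spec)
```

Both fields are derived from the same `max_nodes` argument, so they cannot drift apart at construction time.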
In the respective backend plugin we already distinguish between taskTemplate.TaskTypeVersion == 0/1, see here. We'll have to add backwards compatibility for the refactoring in this PR there as well, right?
I wonder whether removing the duplication in the proto definitions is worth having to check for backwards compatibility in flytekit and flyteplugins. While it might have been better to share the replica spec from the beginning, maybe it's now better to leave as is?
Happy to be convinced otherwise!! 🙏
@pingsutw fyi, let's maybe discuss in the contrib sync?
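The backend-side compatibility check raised here could take several forms. The sketch below shows one of them in Python for illustration only: a field-level fallback that prefers `common.replicas` and otherwise reads the deprecated `replicas`. It is not the Go plugin code, it does not use the `TaskTypeVersion` switch mentioned above, and the dictionary shape and the name `resolve_replicas` are assumptions:

```python
from typing import Any, Mapping


def resolve_replicas(spec: Mapping[str, Any]) -> int:
    """Prefer the shared `common.replicas`; fall back to deprecated `replicas`."""
    common = spec.get("common") or {}
    # A zero or missing value is treated as "unset", mirroring how a proto3
    # integer field cannot distinguish an explicit 0 from an absent value.
    if common.get("replicas"):
        return common["replicas"]
    return spec.get("replicas", 0)


# Payload shapes an older and a newer client might send (hypothetical).
old_client_payload = {"replicas": 2}
new_client_payload = {"common": {"replicas": 2}, "replicas": 2}

print(resolve_replicas(old_client_payload), resolve_replicas(new_client_payload))
```

The cost discussed in the thread is visible here: every consumer of the spec needs some version of this branch for as long as both fields exist.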
@fg91 We have duplicated replica here to maintain backward compatibility. If people only upgrade flytekit and don't upgrade flyte backend, it should still work, right?
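The scenario described here (upgraded flytekit, older backend) can be simulated in a few lines. This is a sketch under the assumption that an older backend reads only the top-level `replicas` field; `old_backend_replicas` and the dictionary payloads are hypothetical, not the real plugin or wire format:

```python
def old_backend_replicas(payload: dict) -> int:
    # An older backend knows nothing about `common` and reads only `replicas`.
    return payload.get("replicas", 0)


# What an upgraded client sends when it keeps the deprecated field...
duplicated = {"common": {"replicas": 2}, "replicas": 2}
# ...and what it would send if the deprecated field were dropped.
common_only = {"common": {"replicas": 2}}

with_duplicate = old_backend_replicas(duplicated)
without_duplicate = old_backend_replicas(common_only)
print(with_duplicate, without_duplicate)
```

Under that assumption, dropping the deprecated field would make an older backend see no replica count at all, which is the case the duplication is meant to avoid.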
Did you test it? If it's duplicated, it should, but let's definitely test it.
My personal opinion is that maybe we should have used a shared replica in the first place, but duplicating the replica now feels more cluttered than leaving separate replicas.
Let's please explicitly discuss this before we decide to merge.
Signed-off-by: Kevin Su <[email protected]>
Tracking issue
Resolves: flyteorg/flyte#4408
Why are the changes needed?
flyteorg/flyte#5355 changes protobuf files, so we need to update the corresponding files in flytekit.
What changes were proposed in this pull request?
Update files with respect to common ReplicaSpec refactor.
How was this patch tested?
Setup process
- In the flyte repo:
  - make compile
  - flytectl demo start --dev
  - kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.7.0"
  - POD_NAMESPACE=flyte ./flyte start --config kubeflow.yaml (kubeflow.yaml is the config file for this run)
- In the parent folder of the flyte and flytekit repos:
  - Dockerfile
  - docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push .
- In an arbitrary folder:
  - kubeflow_tf_evaluator.py
  - pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc

Test backward compatibility
- Switch to the master branch in the flytekit repo.
- In the parent folder of the flyte and flytekit folders: docker buildx build -t localhost:30000/flytekit:dev --file Dockerfile --push .
- pyflyte run --remote --image localhost:30000/flytekit:dev kubeflow_tf_evaluator.py my_tensorflow_task --x 100 --y acc
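Both test passes use the same image build and remote run commands, so they could be generated from one place. The following is a rough sketch that only assembles and prints the command strings and executes nothing; `build_and_run_commands` is an invented helper, and the commands themselves are copied from the steps above:

```python
import shlex
from typing import List

IMAGE = "localhost:30000/flytekit:dev"


def build_and_run_commands(image: str = IMAGE) -> List[str]:
    """Return the image build and remote run commands as shell-ready strings."""
    build = [
        "docker", "buildx", "build",
        "-t", image,
        "--file", "Dockerfile",
        "--push", ".",
    ]
    run = [
        "pyflyte", "run", "--remote",
        "--image", image,
        "kubeflow_tf_evaluator.py", "my_tensorflow_task",
        "--x", "100", "--y", "acc",
    ]
    # shlex.join quotes arguments only where the shell would need it.
    return [shlex.join(build), shlex.join(run)]


for command in build_and_run_commands():
    print(command)
```

Running the printed commands once on the PR branch and once after switching flytekit to master reproduces the two passes described above.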
Screenshots
Note that the worker replica is 2.
Check all the applicable boxes
Related PRs
flyteorg/flyte#5355
Docs link