-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt ray flyteplugin to Kuberay 1.1.0 #5067
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5067 +/- ##
==========================================
+ Coverage 59.00% 59.11% +0.11%
==========================================
Files 645 645
Lines 55672 55546 -126
==========================================
- Hits 32847 32837 -10
+ Misses 20230 20120 -110
+ Partials 2595 2589 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
7c5f67f
to
5eb5086
Compare
@pingsutw @Yicheng-Lu-llll could you provide guidance on how to resolve the |
// RuntimeEnvYAML represents the runtime environment configuration | ||
// provided as a multi-line YAML string. | ||
string runtime_env_yaml = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// RuntimeEnvYAML represents the runtime environment configuration | |
// provided as a multi-line YAML string. | |
string runtime_env_yaml = 5; | |
// RuntimeEnvYAML represents the runtime environment configuration | |
// provided as a multi-line YAML string. | |
string runtime_env_yaml = 5; |
@@ -22,7 +22,6 @@ var ( | |||
IncludeDashboard: true, | |||
DashboardHost: "0.0.0.0", | |||
EnableUsageStats: false, | |||
KubeRayCrdVersion: "v1alpha1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ByronHsu after this MR is complete is there no going back to v1alpha1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure if Flyte has decided to do a hard switchover from v1alpha1 to v1. If not, then I would recommend with staying on v1alpha1 as the default value for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1alpha1 in kuberay 1.0.0 and kuberay 1.1.0 are different. That being said, if you upgrade flyte but keep kuberay at 1.0.0. The new v1alpha1 imported in flyte cannot be run on the old kuberay. Also, there is no purpose to keep both v1alpha1 and v1, so we decide to discard v1alpha1.
If users really want to stay at v1alpha1, they can feel free to use the old flyte and kuberay 1.0.0. But if they want to upgrade kuberay to 1.1.0, it is required for them to upgrade flyte as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there new API changes to the KubeRay v1 CRDs or are users required to update KubeRay to v1.1.0 instead of say v1.0.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the due diligence in the testing matrix. I've one additional question related to backward compatibility. Will this handle running jobs in a graceful way? or will it fail to read the old CRD completely? I imagine if somebody has Ray jobs running and an upgrade happened at the same time. cc @pingsutw |
cc @kevin85421 Will kuberay handle running jobs in a graceful way |
KubeRay does not provide a conversion webhook to handle CRD upgrades gracefully. I haven't tried upgrading the CRD while RayJob custom resources were running, but I suspect there will be some issues, especially if the running RayJob uses |
Even though there might be issues as @kevin85421 mentioned, we only use rayjob (which is a batch job) in flyte. it shall be tolerable for users to rerun the job if the old one failed due to incompatibility. |
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
f728fdf
to
d1a9e4e
Compare
Signed-off-by: Pin-Lun Hsu <[email protected]>
Signed-off-by: Pin-Lun Hsu <[email protected]>
Why are the changes needed?
From Kuberay 1.0.0 to 1.1.0, few changes is made in rayjob CRD, including:
What changes were proposed in this pull request?
In general, the change is to support kuberay 1.1.0 but ensure backward compatibility at the same time. I will first walk through the change i made, then enumerate user scenerio.
Flyteidl
Flytekit
Flyteplugin
How was this patch tested?
User scenario
runtime_env
runtime_env
I have run end-to-end tests on local sandbox with the following tests.
Related PRs
flyteorg/flytekit#2274
Notes
If they upgrade flyte, they also have to upgrade kuberay to 1.1.0. Have to add doc somewhere