[Feature] [kubectl-plugin] Expose setting shutdownAfterJobFinishes and ttlSecondsAfterFinished in ray job submit #3560

Closed
2 tasks done
ashrielbrian opened this issue May 7, 2025 · 12 comments · Fixed by #3627
Labels
cli kubectl plugin enhancement New feature or request

Comments

@ashrielbrian

ashrielbrian commented May 7, 2025

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Currently, kubectl ray job submit does not expose shutdownAfterJobFinishes and ttlSecondsAfterFinished, although both are available in the RayJob CRD. Ideally, both could be set as flags when using the plugin:

kubectl ray job submit --shutdown-after-job-finishes --ttl-seconds-after-finished 10 ...
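
To make the request concrete, here is a minimal sketch of how the two proposed flags could map onto the RayJob spec fields. It is written in Python with argparse purely for illustration; the actual plugin is written in Go and this is not its implementation. The function names and the `submit-sketch` program name are hypothetical. The check that a TTL requires shutdown reflects my understanding that ttlSecondsAfterFinished only takes effect when shutdownAfterJobFinishes is true; treat that validation as an assumption about desired behavior.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two flags proposed in this issue (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="submit-sketch")
    parser.add_argument("--shutdown-after-job-finishes", action="store_true")
    parser.add_argument("--ttl-seconds-after-finished", type=int, default=0)
    return parser


def cleanup_spec_fields(argv: list[str]) -> dict:
    """Translate the CLI flags into the matching RayJob spec fields."""
    args = build_parser().parse_args(argv)
    ttl = args.ttl_seconds_after_finished
    if ttl < 0:
        raise ValueError("--ttl-seconds-after-finished must be >= 0")
    # Assumption: a TTL without shutdown would have no effect, so reject it early.
    if ttl > 0 and not args.shutdown_after_job_finishes:
        raise ValueError(
            "--ttl-seconds-after-finished requires --shutdown-after-job-finishes"
        )
    return {
        "shutdownAfterJobFinishes": args.shutdown_after_job_finishes,
        "ttlSecondsAfterFinished": ttl,
    }
```

With the flags from the example command above, `cleanup_spec_fields(["--shutdown-after-job-finishes", "--ttl-seconds-after-finished", "10"])` yields the two spec fields set to `True` and `10`; with no flags, both keep their defaults.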

Use case

I want ray jobs created by users to be automatically cleaned up, rather than be left around. This can be set if the user were to create and manage their own CRD yaml files, but not when using the plugin CLI.

In particular, if a Ray job is submitted and completes, and another Ray job with the same name is then submitted, the second submission returns an error because the previous RayJob resource was never cleaned up.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@ashrielbrian ashrielbrian added enhancement New feature or request triage labels May 7, 2025
@kevin85421 kevin85421 added cli kubectl plugin and removed triage labels May 11, 2025
@kevin85421
Member

> I want ray jobs created by users to be automatically cleaned up, rather than be left around. This can be set if the user were to create and manage their own CRD yaml files, but not when using the plugin CLI.

Hi @ashrielbrian, thank you for your feedback. Would you mind sharing more details about how you use kubectl ray job submit? Is the person who creates the RayJob YAML the same as the one who runs the kubectl ray job submit command? Or do the infra/platform teams create the RayJob YAML while MLEs or data scientists run the CLI command?

@kevin85421
Member

also cc @MortalHappiness @davidxia for input

@kevin85421
Member

@CheyuWu are you interested in taking this issue if this request makes sense to Chi-Sheng and David?

@CheyuWu
Contributor

CheyuWu commented May 11, 2025

Hi @kevin85421, sure, no problem

@davidxia
Contributor

Makes sense to me

@MortalHappiness
Member

MortalHappiness commented May 12, 2025

> I want ray jobs created by users to be automatically cleaned up, rather than be left around. This can be set if the user were to create and manage their own CRD yaml files, but not when using the plugin CLI.
>
> Hi @ashrielbrian, thank you for your feedback. Would you mind sharing more details about how you use kubectl ray job submit? Is the person who creates the RayJob YAML the same as the one who runs the kubectl ray job submit command? Or do the infra/platform teams create the RayJob YAML while MLEs or data scientists run the CLI command?

Let's wait for a response on this. As for me, I don't think we should support every field of a regular RayJob in the kubectl ray job submit command, because the kubectl plugin is designed primarily for data scientists, not for the infrastructure team. We already support reading from a YAML file in the kubectl ray job submit command, so technically all fields can be configured through that. Therefore, unless it's absolutely necessary for data scientists, we won't add more flags to support additional RayJob fields.

@kevin85421
Member

@MortalHappiness this feature request makes sense to me if the following two assumptions are correct:

  • Data scientists should not touch the RayJob CR YAML.
  • Infra/Platform teams don't own the workloads, and different workloads may have different requirements for shutdownAfterJobFinishes and ttlSecondsAfterFinished.

@davidxia has relevant infra/platform team experience. Would you mind sharing your thoughts on these two assumptions?

@davidxia
Contributor

Yes, on my team both assumptions hold for most users. Sometimes we make users write their own CR YAML when a config is infrequently requested, but when we think many users might need it, we turn it into a CLI flag in our internal tools.

@kevin85421
Member

@davidxia Thank you! @MortalHappiness does this make sense to you?

@ashrielbrian
Author

ashrielbrian commented May 13, 2025

Hi @kevin85421,

Originally, our data scientists had to craft their own RayJob CR YAML files, but this very quickly became a painful UX for them, and we also wanted to remove the need to run kubectl apply on any Kubernetes resource, custom or otherwise. We (platform) started experimenting with the plugin and then integrated it into our internal CLI tool as a light wrapper around kubectl ray job submit .... So our current flow discourages data scientists from editing RayJob CRs directly: using the CLI is the primary and recommended way for our DS users to run Ray jobs.

An alternative we are considering is to have our platform team manage a templated RayJob CR and have our internal tool expose CLI flags for the specific parameters we'd like to expose, but this would mean no longer using the plugin as a wrapper.
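
For illustration, the templated alternative could look roughly like the sketch below: a platform-owned base manifest plus a small function that fills in the per-job values. This is a hedged sketch, not our actual tool. `BASE_TEMPLATE` and `render_rayjob` are hypothetical names, and a real template would also carry the cluster configuration (for example `rayClusterSpec`), which is left out here. The output is JSON, which kubectl also accepts as a manifest format.

```python
import copy
import json

# Hypothetical platform-owned template. A real one would also include the
# cluster configuration (e.g. rayClusterSpec); it is omitted in this sketch.
BASE_TEMPLATE = {
    "apiVersion": "ray.io/v1",
    "kind": "RayJob",
    "metadata": {"name": ""},
    "spec": {
        "entrypoint": "",
        "shutdownAfterJobFinishes": False,
        "ttlSecondsAfterFinished": 0,
    },
}


def render_rayjob(name, entrypoint, shutdown_after_job_finishes=False,
                  ttl_seconds_after_finished=0):
    """Return a copy of the template with the per-job values filled in."""
    manifest = copy.deepcopy(BASE_TEMPLATE)  # never mutate the shared template
    manifest["metadata"]["name"] = name
    spec = manifest["spec"]
    spec["entrypoint"] = entrypoint
    spec["shutdownAfterJobFinishes"] = shutdown_after_job_finishes
    spec["ttlSecondsAfterFinished"] = ttl_seconds_after_finished
    return manifest


if __name__ == "__main__":
    # Example values only; the job name and entrypoint are made up.
    rendered = render_rayjob("demo-job", "python train.py", True, 10)
    print(json.dumps(rendered, indent=2))
```

The internal tool would then own both the flag parsing and the apply step, which is exactly the duplication of plugin functionality we would prefer to avoid.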

@kevin85421
Member

Thanks @ashrielbrian! This feature request makes sense to me. @CheyuWu you can start to work on it.

@MortalHappiness
Member

> @davidxia Thank you! @MortalHappiness does this make sense to you?

Yes, this makes sense to me.

5 participants