[Optimization][dinky-getaway] After a task fails, it is automatically deployed #3953
Comments
I have two solutions to fix this issue:
I believe both mechanisms should be implemented. The first one should promptly delete services that fail at startup, while the second should delete services that fail during runtime. Which approach do you think is more appropriate?
Automatically deleting failed tasks is inappropriate. When something goes wrong with a task, the user has to go to the k8s cluster and check the logs to troubleshoot the error. If the pod has already been deleted, that is a very bad experience, because k8s keeps nothing once it is gone.
Then I can meet the requirements by deleting the corresponding service before submitting the task, right?
I think it is OK not to delete the pod after the task fails, because the user needs to see the log. However, if the task is run again after being modified, it will fail because the pod already exists. So I think you can check whether the pod already exists when the task starts, and remove it if its state is unhealthy.
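The check-before-submit idea above can be sketched roughly as follows. This is only an illustration, not Dinky's actual gateway code: `FakeCluster`, `submit_job`, and `UNHEALTHY_PHASES` are hypothetical names, the cluster is an in-memory stand-in rather than a real k8s client, and treating the `Failed` and `Unknown` pod phases as "unhealthy" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption for this sketch: these pod phases count as "unhealthy".
UNHEALTHY_PHASES = {"Failed", "Unknown"}


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a k8s namespace (name -> pod phase)."""
    pods: Dict[str, str] = field(default_factory=dict)

    def get_phase(self, name: str) -> Optional[str]:
        # Returns None when no pod with this name exists.
        return self.pods.get(name)

    def delete(self, name: str) -> None:
        self.pods.pop(name, None)

    def create(self, name: str) -> None:
        # A freshly created pod starts in the Pending phase.
        self.pods[name] = "Pending"


def submit_job(cluster: FakeCluster, name: str) -> str:
    """Submit a job, replacing a leftover pod only if it is unhealthy."""
    phase = cluster.get_phase(name)
    if phase is None:
        cluster.create(name)
        return "created"
    if phase in UNHEALTHY_PHASES:
        # The failed pod was kept until now so its logs could be inspected;
        # it is removed only at the moment the job is resubmitted.
        cluster.delete(name)
        cluster.create(name)
        return "replaced"
    # A healthy pod with the same name is left untouched.
    return "skipped"
```

With this flow, a failed pod stays in the cluster for troubleshooting and is only cleaned up when the same job is started again, so the "already exists" error does not occur on resubmission.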
@Jam804 yes.
Are you interested in doing this work? I can assign this issue to you, @Jam804.
Search before asking
Description
When I start a task in k8s application mode and an error in the task causes the program to fail at startup, the k8s pod is not cleaned up automatically. The next time I start the task, I get an "already exists" error for that pod. I would like the deployment of the current job to be cleared after an abnormal start, so that the job can be restarted without hitting the "already exists" problem.
Are you willing to submit a PR?
Code of Conduct