-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduled rollout #727
base: main
Are you sure you want to change the base?
Scheduled rollout #727
Conversation
Co-authored-by: Christian Artin <[email protected]>
Co-authored-by: Christian Artin <[email protected]>
Co-authored-by: Christian Artin <[email protected]>
@ibrokethecloud @nickgerace FYI, tagging you because you were involved in the previous PR. |
Co-authored-by: Christian Artin <[email protected]>
f659912
to
86d28e7
Compare
I've raised this with the Fleet team, they're aware and will start working through community PRs soon. |
Tagging @ibrokethecloud - will you please review this as it brings up a two instances not covered in yours and advise whether yours should be closed in favor of this one? |
@SheilaghM Any news? We are now in May and we still have no feedback after 2 months. |
Hi @Athosone thanks a lot for your PR. @SheilaghM can we please have some clarity on whether cluster scoped schedule supersedes the bundle scoped one or are we happy to have the bundle schedule have priority. |
Hi :)! Sure no problem let me know I'll fix it ! |
Hey :)! Did you have the time to think about this PR? |
@Athosone - We are still discussing the Fleet use cases with Product Management. We will act on this one way or another as soon as we have clarity. |
@@ -53,6 +53,17 @@ defaultNamespace: default | |||
# Default: "" | |||
namespace: default | |||
|
|||
# Specify a deployment schedule to deploy bundle during that window. | |||
# In this example we will deploy bundles every monday from 4pm to 5pm. | |||
# If a helm timeout is specified in the helm structure below, it will be considered in the schedule evaluation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with the helm code. I'm a bit concerned this will not be obvious to users, maybe we need to log that the "job will not be executed, because its timeout is too large for the schedule window".
But is it really necessary to take the helm timeout into account? As far as I understand helm tasks are never cancelled by, so we rely on the helm library to honor the timeout value. If timeout is 0, wait
is not used, are we certain helm will finish immediately in that case? I feel like it must be possible to make it take very long, by using hooks or lookups maybe?
In any case the actual helm deployment going on in the cluster might take longer. So even when you take the tiemout into consideration, there is no guarantee the cluster is in a ready state after the schedule window?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi !
The way we approached it is, if, as a user I don't specify a timeout, it means I do not care how long the installation take (and helm default it to 5 min).
It could also means that you know that there is no hook/lookups and it will be installed instantly. It would not be accurate to reserve a five minutes block for that case.
On the other hand, a user that wants to have some kind of mechanism to prevent an installation from taking too much time, could specify a timeout in order for helm to cancel the installation (and maybe rollback if atomic is specified).
As you said, we cannot guarantee the cluster will be in a ready state after the schedule window, it is a best effort way of doing it.
I tested it with a simple chart including hooks. The post-install hook includes a job that sleep for 305 seconds.
I launched the install by using:
helm install httpbin .
The result is:
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
the job:
apiVersion: batch/v1
kind: Job
metadata:
name: sleep
annotations:
helm.sh/hook: post-install
spec:
template:
spec:
containers:
- name: sleep
image: busybox:1.28
command: ["sh", "-c", "echo Hello ! && sleep 350"]
restartPolicy: Never
backoffLimit: 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, okay. Best effort is fine with me.
However, I was looking at the helm code:
fleet/pkg/helmdeployer/deployer.go
Lines 328 to 333 in 5a141a6
u.Timeout = timeout | |
u.DryRun = dryRun | |
u.PostRenderer = pr | |
if u.Timeout > 0 { | |
u.Wait = true | |
} |
So, if you tested your hook chart without a timeout, I think it would run for 350s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I see what you mean, specifying --timeout=0 run until it completes. It looks like fleet is setting 0 implicitly instead of the default 300 sec of helm, thus it will run forever and ever if the user do not specify a timeout
Any news on this? |
We could rebase it and try to push it again if the fleet team is still interested in the contrib |
Feature description
This feature allows you to plan a deployment at a given period.
A period is composed of a cron and a duration.
Both of these properties represents a window during which the application can be deployed.
The schedule can be applied to either the cluster itself and/or the bundle.
If schedules are set on both the cluster and the bundle then the bundle schedule takes priority over the cluster schedule.
We chose to prioritise the bundle over the cluster as we think that the user may want to specify exceptions to the global rule of a cluster schedule.
Usage Example
fleet.yaml
cluster
Motivation for this PR
We noticed there was an existing PR #450 for this feature related to #383, but saw that there was no activity on it.
Also there were two implementation details that were not right in our opinion:
nextRun
was evaluated using theScheduledAt
and not computed using thecron
andduration
. This would result in the bundle being scheduled for thenextRun
even if thecron
andduration
did not match theScheduledAt
anymore.