decision: labels from libs #211
base: main
Conversation
The issue stems from the fact that StoredState is scoped to the pod and not the charm. Even if `use_juju_for_storage` is
set to True, this issue will still persist. The usual solution would be to use a peer relation instead of StoredState
but since this is a library, we can't really impose this on the charm (or can we?)
we could. Make it a mandatory arg in the requirer constructor and force the charm to specify a peer relation we can use for persisting data. I think there's a precedent for this but I can't remember where.
It is however one extra step for the charm owner.
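For concreteness, a minimal sketch of what that mandatory argument could look like, assuming a hypothetical `ServiceMeshRequirer` that persists its state in the charm-provided peer relation's application databag (all names and the `managed-labels` key are illustrative, not from this PR or the library):

```python
import json

from ops.charm import CharmBase
from ops.framework import Object


class ServiceMeshRequirer(Object):
    """Illustrative requirer that persists its state in a charm-provided peer relation."""

    def __init__(self, charm: CharmBase, peer_relation_name: str):
        # `peer_relation_name` is mandatory: the charm must hand us a peer
        # relation we can use for persisting data across pod restarts.
        super().__init__(charm, "service-mesh")
        self._charm = charm
        self._peer_relation_name = peer_relation_name

    def _save_state(self, state: dict) -> None:
        # Only the leader may write to the application databag.
        relation = self._charm.model.get_relation(self._peer_relation_name)
        if relation is not None and self._charm.unit.is_leader():
            relation.data[self._charm.app]["managed-labels"] = json.dumps(state)

    def _load_state(self) -> dict:
        relation = self._charm.model.get_relation(self._peer_relation_name)
        if relation is None:
            return {}
        return json.loads(relation.data[self._charm.app].get("managed-labels", "{}"))
```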
another (likely bad) solution could be to create a secret and use it for persistent storage :D
We considered the first solution and decided it is better not to require a peer relation. The second idea is basically the same as the accepted solution. The data isn't secret so we use a config map instead.
Signed-off-by: Leon <[email protected]>
The issue stems from the fact that StoredState is scoped to the pod and not the charm. Even if `use_juju_for_storage` is
set to True, this issue will still persist. The usual solution would be to use a peer relation instead of StoredState
Why does this issue persist if `use_juju_for_storage=True`? (I just fear stored state enough to avoid it now, but...) I thought that put the stored state in the controller.
## Accepted Solution

We should create a ConfigMap to be used as a way to store state. Thus we will have state which lives for the lifetime...
I like this as a way to make things work, but I dislike it as a case where we're basically implementing our own version of a feature juju/ops ostensibly has. If the existing mechanism misses an edge case, we should try to add that edge case first. Especially since Istio isn't a GA product and has some time to be patient
If we do this, it would be nice to implement it as a general "persistent storage" mechanism that we could use in any charm instead of the charm's StoredState.
1. Should we be editing Pod labels or the StatefulSet's Pod template's labels?
2. Are we protecting ourselves against the case where Juju sees labels that are not what it expects and thus "fixes" them back to what it expects? Any time Juju does that, it'll induce a restart of all charm pods (and, if they have pebble services running, their workloads too).

(2) feels like a really big one. I think by doing this we're asking other people to add something that'll restart their workloads with undefined frequency, which sounds like a huge issue.
1. Labels must match relation data after a leader change
2. Labels must match relation data after a pod restart
3. Labels must match relation data when relation data changes
4. Labels applied by anything outside the library should remain untouched
I believe the problem is simpler than we first thought.
Each unit of a charm runs in its own pod. If we put our labels on each Pod (not the StatefulSet's Pod template)¹, then each unit of a charm can be responsible for labeling itself regardless of whether it is the leader or not. Given that, we don't need to handle the leader change event at all (1).
If we've only labeled the Pod, Pod restarts (2) will always remove our custom labels (because the Pod will be restarted to the spec defined in the StatefulSet). Thus we don't need to look at any StoredState in this case, because we know we cannot own any of the current labels. We must note, though, that a Pod restart does not trigger any relation events (I tried deleting a Pod just now and saw the following events: [stop, upgrade_charm, config_changed, start]). So to handle a restart, we have to watch at least one of those events.
Based on this, I'm proposing a new solution below in a separate thread that can work solely with StoredState.
Footnotes

1. Manipulating a Pod's labels does not induce a restart. So if a charm adds a label to its own Pod, no restart is triggered. Alternatively, we could control the labels by editing the StatefulSet's Pod template (`StatefulSet.spec.template.metadata.labels`), adding Istio's labels there. In this case we'd want only the leader to do this, so different units wouldn't fight over it. Doing this, however, triggers a rolling restart, so all charm units will be restarted by this change. See the rejected solution proposed below for more details on the negatives of editing the StatefulSet's template. ↩
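To illustrate the difference the footnote describes, here is a minimal sketch using lightkube; the function names and the pod-name derivation from the unit name are assumptions, not code from the library:

```python
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet
from lightkube.resources.core_v1 import Pod


def label_own_pod(client: Client, unit_name: str, namespace: str, labels: dict) -> None:
    """Patch this unit's own Pod. A metadata-only patch does not restart the Pod."""
    pod_name = unit_name.replace("/", "-")  # e.g. "myapp/0" -> "myapp-0"
    client.patch(Pod, name=pod_name, namespace=namespace,
                 obj={"metadata": {"labels": labels}})


def label_statefulset_template(client: Client, app_name: str, namespace: str, labels: dict) -> None:
    """Patch the StatefulSet's Pod template instead. This triggers a rolling
    restart of every unit, which is why only the leader should do it."""
    client.patch(StatefulSet, name=app_name, namespace=namespace,
                 obj={"spec": {"template": {"metadata": {"labels": labels}}}})
```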
We should create a ConfigMap to be used as a way to store state. Thus we will have state which lives for the lifetime
of the charm and does not rely on controller storage (`use_juju_for_storage=True`). It also asks nothing extra from the charm developer.

The naming schema, to guarantee uniqueness, is: `juju-service-mesh-{self._charm.app.name}-labels`.
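As an illustration of the accepted solution, the ConfigMap could be read and written with lightkube roughly like this; this is a sketch under assumptions (function names and error handling are illustrative, not the merged implementation):

```python
import json

from lightkube import Client
from lightkube.core.exceptions import ApiError
from lightkube.models.meta_v1 import ObjectMeta
from lightkube.resources.core_v1 import ConfigMap


def _cm_name(app_name: str) -> str:
    # Naming schema from the decision, to guarantee uniqueness per application.
    return f"juju-service-mesh-{app_name}-labels"


def save_labels(client: Client, app_name: str, namespace: str, labels: dict) -> None:
    """Create or update the state ConfigMap with the labels we currently manage."""
    cm = ConfigMap(
        metadata=ObjectMeta(name=_cm_name(app_name), namespace=namespace),
        data={"labels": json.dumps(labels)},
    )
    try:
        client.create(cm)
    except ApiError as e:
        if e.status.code == 409:  # already exists -> replace it
            client.replace(cm)
        else:
            raise


def load_labels(client: Client, app_name: str, namespace: str) -> dict:
    """Return previously stored labels, or {} if the ConfigMap does not exist yet."""
    try:
        cm = client.get(ConfigMap, name=_cm_name(app_name), namespace=namespace)
    except ApiError as e:
        if e.status.code == 404:
            return {}
        raise
    return json.loads((cm.data or {}).get("labels", "{}"))
```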
Based on the discussion here, if we assert that using the service-mesh library will always create (or modify, if they already exist) the given labels (`istio.io/dataplane-mode`, etc.), then I suggest we use this solution:
Each charm unit should manage the labels on its own Pod, and keep track of what it has been managing using StoredState. The library implementing this would follow the procedure:

1. Get `_service_mesh_cache` from StoredState (schema shown below). If it doesn't exist, the default value is `{managed_labels: {}}`.

       StoredState: {
         _service_mesh_cache: {
           managed_labels: {
             label1: value1,  # We only need labels here. Could omit the values
             ...
           }
         }
       }

2. Get `suggested_labels` from the service mesh relation.
3. For each key in `_service_mesh_cache` that is not in `suggested_labels`, delete that label from the Pod and delete it from the cache.
4. For each key in `suggested_labels`, put those labels on this charm's Pod and save them to `_service_mesh_cache`.

This procedure should execute on the following events:

* `service-mesh-relation-changed` (so that we catch any new/changed labels): run the procedure as outlined above.
* `start` (so that we are guaranteed to run after a Pod restart, as that does not create a relation-changed event): do the procedure above, except since Pod restarts wipe our labels **we know we haven't set any managed labels yet**. This means we don't need to track what happened before the restart and can always reset `_service_mesh_cache={managed_labels: {}}`. Doing this reset is important so we don't accidentally inherit some stale cache from a previous execution.
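A minimal sketch of that procedure, assuming each unit labels its own Pod via lightkube; the class name, the `labels` relation-data key, and the pod-name derivation are illustrative assumptions, not the library's actual code:

```python
import json

from lightkube import Client
from lightkube.resources.core_v1 import Pod
from ops.charm import CharmBase
from ops.framework import Object, StoredState


class PodLabeler(Object):
    """Illustrative implementation of steps 1-4 above."""

    _stored = StoredState()

    def __init__(self, charm: CharmBase, relation_name: str = "service-mesh"):
        super().__init__(charm, relation_name)
        self._charm = charm
        self._relation_name = relation_name
        self._client = Client()
        self._stored.set_default(managed_labels={})  # step 1 default
        self.framework.observe(charm.on[relation_name].relation_changed, self._reconcile)
        # A Pod restart emits no relation events, so also run on `start`.
        self.framework.observe(charm.on.start, self._on_start)

    def _on_start(self, event) -> None:
        # After a restart our labels are gone, so the cache cannot be valid: reset it.
        self._stored.managed_labels = {}
        self._reconcile(event)

    def _reconcile(self, _event) -> None:
        suggested = self._suggested_labels()                 # step 2
        cached = dict(self._stored.managed_labels)

        # Step 3: a strategic-merge patch with a null value removes that label.
        removals = {key: None for key in cached if key not in suggested}
        # Step 4: (re)apply every suggested label to this unit's own Pod.
        self._client.patch(
            Pod,
            name=self._charm.unit.name.replace("/", "-"),    # assumed pod-name scheme
            namespace=self._charm.model.name,
            obj={"metadata": {"labels": {**removals, **suggested}}},
        )
        self._stored.managed_labels = dict(suggested)

    def _suggested_labels(self) -> dict:
        relation = self._charm.model.get_relation(self._relation_name)
        if relation is None or relation.app is None:
            return {}
        # "labels" is an assumed relation-data key, not the library's real schema.
        return json.loads(relation.data[relation.app].get("labels", "{}"))
```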
To me, this is a good compromise between complexity and robustness. It's possible we clobber someone's labels, but it's unlikely (are they seriously setting their own Istio labels?) and we warned them.
If we do not want to make this assertion and instead want to avoid clobbering their labels, I think there's a solution that's very similar to the above but manages to respect the labels. It's more complex, but I believe it exists. I haven't written it out because it is more complicated and I think not worth the effort, but if we want that solution instead I can write it up.