decision: labels from libs #211

Open · dstathis wants to merge 4 commits into main

Conversation

@dstathis (Contributor) commented Oct 9, 2024

No description provided.

@dstathis dstathis requested a review from a team as a code owner October 9, 2024 12:31

The issue stems from the fact that StoredState is scoped to the pod and not the charm. Even if `use_juju_for_storage` is
set to True, this issue will still persist. The usual solution would be to use a peer relation instead of StoredState
but since this is a library, we can't really impose this on the charm (or can we?)
Contributor

We could: make it a mandatory arg in the requirer constructor and force the charm to specify a peer relation we can use for persisting data. I think there's a precedent for this, but I can't remember where.
It is, however, one extra step for the charm owner.
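
For illustration only, a minimal sketch of what a mandatory peer-relation argument could look like in an ops-based requirer; the class and method names here are hypothetical, not the library's actual API:

```python
from ops.charm import CharmBase
from ops.framework import Object


class ServiceMeshRequirer(Object):
    """Hypothetical requirer that persists its state in a charm-provided peer relation."""

    def __init__(self, charm: CharmBase, relation_name: str, *, peer_relation_name: str):
        super().__init__(charm, relation_name)
        self._charm = charm
        # The charm must declare this peer relation in its metadata; the library only consumes it.
        self._peer_relation_name = peer_relation_name

    def _store(self, key: str, value: str) -> None:
        """Persist a value in the peer relation's app data so it survives pod restarts."""
        peer = self._charm.model.get_relation(self._peer_relation_name)
        if peer is not None and self._charm.unit.is_leader():
            peer.data[self._charm.app][key] = value
```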

Contributor

Another (likely bad) solution could be to create a secret and use it for persistent storage :D

Contributor (Author)

We considered the first solution and decided it is better not to require a peer relation. The second idea is basically the same as the accepted solution. The data isn't secret, so we use a ConfigMap instead.

Comment on lines +33 to +34
The issue stems from the fact that StoredState is scoped to the pod and not the charm. Even if `use_juju_for_storage` is
set to True, this issue will still persist. The usual solution would be to use a peer relation instead of StoredState
Contributor

Why does this issue persist if `use_juju_for_storage=True`? (I fear StoredState enough that I just avoid it now, but...) I thought that put the stored state in the controller.


## Accepted Solution

We should create a ConfigMap to be used as a way to store state. Thus we will have state which lives for the lifetime
Contributor

I like this as a way to make things work, but I dislike it as a case where we're basically implementing our own version of a feature juju/ops ostensibly has. If the existing mechanism misses an edge case, we should try to add that edge case first, especially since Istio isn't a GA product and has some time to be patient.

If we do this, it would be nice to implement it as a general "persistent storage" mechanism that we could use in any charm instead of the charm's StoredState.
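
As a sketch of what such a general mechanism could look like, assuming lightkube is available to the charm; the class name is illustrative and the ConfigMap name loosely follows the naming schema proposed in this ADR:

```python
import json

from lightkube import Client
from lightkube.core.exceptions import ApiError
from lightkube.models.meta_v1 import ObjectMeta
from lightkube.resources.core_v1 import ConfigMap


class ConfigMapStore:
    """Persist small amounts of charm state in a namespaced ConfigMap."""

    def __init__(self, app_name: str, namespace: str):
        self._client = Client()
        self._name = f"juju-service-mesh-{app_name}-labels"
        self._namespace = namespace

    def save(self, state: dict) -> None:
        body = ConfigMap(
            metadata=ObjectMeta(name=self._name, namespace=self._namespace),
            data={"state": json.dumps(state)},
        )
        try:
            self._client.create(body)
        except ApiError as e:
            if e.status.code != 409:  # anything other than "already exists"
                raise
            # ConfigMap already exists; overwrite the stored state.
            self._client.patch(ConfigMap, self._name, body, namespace=self._namespace)

    def load(self) -> dict:
        try:
            cm = self._client.get(ConfigMap, self._name, namespace=self._namespace)
        except ApiError:
            return {}
        return json.loads((cm.data or {}).get("state", "{}"))
```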

@ca-scribner (Contributor) commented Nov 15, 2024

I have some partly formed thoughts on this that I'm still articulating. I'll flesh them out more and write them in their own comments, but putting this here now just as a placeholder. The broad thoughts are:
  1. should we be editing the Pod's labels or the StatefulSet's Pod template's labels?
  2. we are protecting ourselves against the case where Juju sees labels are not what it expects and thus it "fixes" them back to what it expects. Any time Juju does that, it'll induce a restart of all charm pods (and, if they have pebble services running, their workloads too).

(2) feels like a really big one. I think by doing this we're asking other people to add something that'll restart their workloads with undefined frequency, which sounds like a huge issue.

Comment on lines +22 to +25
1. Labels must match relation data after a leader change
2. Labels must match relation data after a pod restart
3. Labels must match relation data when relation data changes
4. Labels applied by anything outside the library should remain untouched
@ca-scribner (Contributor) commented Nov 15, 2024

I believe the problem is simpler than we first thought.

Each unit of a charm runs in its own pod. If we put our labels on each Pod (not the StatefulSet's Pod template)1, then each unit of a charm can be responsible for labeling itself regardless of whether it is the leader or not. Given that, we don't need to handle the leader change event at all (1).

If we've only labeled the Pod, Pod restarts (2) will always remove our custom labels (because the Pod will be restarted to the spec defined in the StatefulSet). Thus we don't need to look at any StoredState in this case because we know we cannot own any of the current labels. We must note though that a Pod restart does not trigger any relation events (I tried deleting a Pod just now and saw the following events: [stop, upgrade_charm, config_changed, start]). So to handle a restart, we have to watch at least one of those events.

Based on this, I'm proposing a new solution below in a separate thread that can work solely with StoredState.

Footnotes

  1. Manipulating a Pod's labels does not induce a restart, so if a charm adds a label to its own Pod, no restart is triggered. Alternatively, we could control the labels by editing the StatefulSet's Pod template (`StatefulSet.spec.template.metadata.labels`), adding Istio's labels there. In that case we'd want only the leader to do this, so different units wouldn't fight over it. Doing this, however, triggers a rolling restart, so all charm units will be restarted by the change. See the rejected solution proposed below for more details on the downsides of editing the StatefulSet's template.
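
To make the two options in the footnote concrete, here is a hedged lightkube sketch of each patch; the resource names and the label value are placeholders, not taken from the library:

```python
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet
from lightkube.resources.core_v1 import Pod

client = Client()

# Option A: label this unit's own Pod. This does not restart anything, but the
# labels are lost whenever the Pod is recreated from the StatefulSet template.
client.patch(
    Pod,
    "myapp-0",  # this unit's pod name (placeholder)
    {"metadata": {"labels": {"istio.io/dataplane-mode": "ambient"}}},
    namespace="mymodel",
)

# Option B: label the StatefulSet's Pod template (leader-only). The labels then
# survive Pod recreation, but patching the template triggers a rolling restart
# of every unit.
client.patch(
    StatefulSet,
    "myapp",
    {"spec": {"template": {"metadata": {"labels": {"istio.io/dataplane-mode": "ambient"}}}}},
    namespace="mymodel",
)
```

Option A matches per-unit labeling with no restarts; Option B centralizes the labels but pays the rolling-restart cost described above.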

Comment on lines +40 to +43
We should create a ConfigMap to be used as a way to store state. Thus we will have state which lives for the lifetime
of the charm and does not rely on controller storage (`use_juju_for_storage=True`). It also asks nothing extra from the charm developer.

The naming schema, to guarantee uniqueness, is: `juju-service-mesh-{self._charm.app.name}-labels`.
Contributor

Based on the discussion here, if we assert that using the service-mesh library will always create the given labels (`istio.io/dataplane-mode`, etc.), or modify them if they already exist, then I suggest we use this solution:

Suggested change
We should create a ConfigMap to be used as a way to store state. Thus we will have state which lives for the lifetime
of the charm and does not rely on controller storage (`use_juju_for_storage=True`). It also asks nothing extra from the charm developer.
The naming schema, to guarantee uniqueness, is: `juju-service-mesh-{self._charm.app.name}-labels`.
Each charm unit should manage the labels on its own Pod, and keep track of what it has been managing using StoredState. The library implementing this would follow the procedure:
1. get `_service_mesh_cache` from StoredState (schema shown below). If it doesn't exist, default value is `{managed_labels: {}}`.
StoredState: {
  _service_mesh_cache: {
    managed_labels: {
      label1: value1,  # We only need labels here. Could omit the values
      ...
    }
  }
}
2. get `suggested_labels` from service mesh relation
3. for each key in `_service_mesh_cache` that is not in `suggested_labels`, delete that label from the Pod and delete it from the cache
4. for each key in `suggested_labels`, put those labels on this charm's Pod and save them to `_service_mesh_cache`
This procedure should execute on the following events:
* `service-mesh-relation-changed` (so that we catch any new/changed labels): run the procedure as outlined above
* `start` (so that we are guaranteed to run after a Pod restart, as that does not create a relation-changed event): do the procedure above, except since Pod restarts wipe our labels **we know we haven't set any managed labels yet**. This means we don't need to track what happened before the restart and can always reset `_service_mesh_cache={managed_labels: {}}`. Doing this reset is important so we don't accidentally inherit some stale cache from a previous execution.

To me, this is a good compromise between complexity and robustness. It's possible we clobber someone's labels, but it's unlikely (are they seriously setting their own Istio labels?) and we warned them.

If we do not want to make this assertion and instead want to avoid clobbering their labels, I think there's a solution that's very similar to the above but manages to respect the labels. It's more complex, but I believe it exists. I haven't written it out because it is more complicated and I think not worth the effort, but if we want that solution instead, I can write it up.
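
A minimal sketch of the reconcile procedure outlined in the suggestion above, assuming lightkube for the Pod patch and ops `StoredState` for the cache; the class name, helper names, and the relation-data encoding are assumptions, not the library's actual implementation:

```python
from lightkube import Client
from lightkube.resources.core_v1 import Pod
from ops.charm import CharmBase
from ops.framework import Object, StoredState


class PodLabeler(Object):
    """Hypothetical helper: each unit labels its own Pod and tracks what it set."""

    _stored = StoredState()

    def __init__(self, charm: CharmBase, relation_name: str = "service-mesh"):
        super().__init__(charm, relation_name)
        self._charm = charm
        self._relation_name = relation_name
        self._stored.set_default(managed_labels={})  # step 1: default cache
        self._client = Client()
        self.framework.observe(charm.on[relation_name].relation_changed, self._reconcile)
        self.framework.observe(charm.on.start, self._on_start)

    def _on_start(self, event):
        # A Pod restart wipes our labels, so any cached state is known to be stale.
        self._stored.managed_labels = {}
        self._reconcile(event)

    def _reconcile(self, _event):
        suggested = self._suggested_labels()
        patch = {}
        # Step 3: drop labels we set previously that are no longer suggested
        # (a null value deletes the key in a merge patch).
        for key in set(self._stored.managed_labels) - set(suggested):
            patch[key] = None
        # Step 4: apply (or refresh) every suggested label.
        patch.update(suggested)
        pod_name = self._charm.unit.name.replace("/", "-")  # e.g. myapp/0 -> myapp-0
        self._client.patch(
            Pod, pod_name, {"metadata": {"labels": patch}}, namespace=self._charm.model.name
        )
        self._stored.managed_labels = dict(suggested)

    def _suggested_labels(self) -> dict:
        # Step 2: read the labels suggested over the service-mesh relation.
        # How they are encoded in relation data is an assumption here.
        relation = self._charm.model.get_relation(self._relation_name)
        if relation is None or relation.app is None:
            return {}
        return dict(relation.data[relation.app])
```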

@lucabello lucabello changed the title adr for labels from libs adr: labels from libs Nov 25, 2024
@simskij simskij changed the title adr: labels from libs decision: labels from libs Dec 6, 2024