Crash due to missing relation #65
Mitigation fix 1

```diff
@@ -279,7 +280,15 @@ class AlertmanagerProvider(RelationManagerBase):
         # a single consumer charm's unit may be related to multiple providers
         if self.name in self.charm.model.relations:
             for relation in self.charm.model.relations[self.name]:
-                relation.data[self.charm.unit].update(self._generate_relation_data(relation))
+                # Sometimes there is a dangling relation, for which we get the following error:
+                # ops.model.ModelError: b'ERROR relation 17 not found (not found)\n'
+                # when trying to `network-get alerting`. Suppressing the ModelError in this
+                # case, with the expectation that Juju would resolve the dangling relation
+                # eventually.
+                with contextlib.suppress(ModelError):
+                    relation.data[self.charm.unit].update(
+                        self._generate_relation_data(relation)
+                    )
```

Mitigation fix 2

The following Provider code (alertmanager side) updates relation data with prometheus, so prometheus knows the (public) IP address of alertmanager:

alertmanager-k8s-operator/lib/charms/alertmanager_k8s/v0/alertmanager_dispatch.py Lines 259 to 264 in 0dd0cbb
IIRC, in that case I could use the (private) HOWEVER, the

Root cause

Could this be a refcount issue due to the
Mitigation: Don't use

If anything, this could be a regression in that Juju commit. Either way, using

(Thanks @rbarry82, will follow up shortly after the following.) After suppressing `ModelError` in alertmanager, I get something similar in prom:

Mitigation: Don't use
Not much sense in updating relation data if the relation is gone.

```python
try:
    relation.data[self.charm.unit].update(
        self._generate_relation_data(relation)
    )
except ModelError:
    relation.data[self.charm.unit].pop("public_address")
```

which gives the good old
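A minimal sketch of the alternative this comment argues for: when writing relation data fails with `ModelError`, skip that relation entirely instead of touching its databag again in the `except` branch. `ModelError`, `FakeRelation`, and `publish_addresses` below are hypothetical stand-ins so the snippet runs without `ops` or a Juju controller (in a real charm, `ModelError` comes from `ops.model`). It also assumes, as the error above suggests, that any access to a dangling relation's databag raises.

```python
from dataclasses import dataclass, field


class ModelError(Exception):
    """Hypothetical stand-in for ops.model.ModelError."""


@dataclass
class FakeRelation:
    """Hypothetical relation whose databag raises once the relation is gone."""
    id: int
    dangling: bool = False
    data: dict = field(default_factory=dict)

    def update(self, values: dict) -> None:
        # Mimic Juju reporting "relation N not found" for a dangling relation.
        if self.dangling:
            raise ModelError(f"ERROR relation {self.id} not found (not found)")
        self.data.update(values)


def publish_addresses(relations, address: str) -> list:
    """Write the address to every live relation; return the ids that were skipped."""
    skipped = []
    for relation in relations:
        try:
            relation.update({"public_address": address})
        except ModelError:
            # The relation is gone: don't try to clean up its data (e.g. pop a key),
            # since that touches the same missing relation. Record it and move on.
            skipped.append(relation.id)
    return skipped


relations = [FakeRelation(id=16), FakeRelation(id=17, dangling=True)]
print(publish_addresses(relations, "10.1.2.3"))  # [17]
print(relations[0].data)  # {'public_address': '10.1.2.3'}
```

Functionally this is close to wrapping the write in `contextlib.suppress(ModelError)`; the explicit `try`/`except` just makes it easy to log or count the skipped relations.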
I guess I should clarify:
It has a limited number of "real" use cases. From what we've seen: the libjuju websocket case, where an unexpected or unsynchronized shutdown may remove the socket out from under you (the same can happen with Pebble); or a truly stateless application that is unaware of whether it holds leadership while other instances may attempt to write. A

Same question:

```python
address = relation.data[unit].get("public_address")
if address:
    alertmanagers.append(address)
```

If you inverted it to
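A minimal sketch of the consumer-side pattern quoted above, assuming each remote unit's databag behaves like a plain read-only mapping (standing in for `relation.data[unit]`). `collect_alertmanagers` and the sample databags are hypothetical, for illustration only.

```python
from typing import Iterable, List, Mapping


def collect_alertmanagers(unit_databags: Iterable[Mapping[str, str]]) -> List[str]:
    """Gather the addresses published by remote units, skipping units without one."""
    alertmanagers = []
    for databag in unit_databags:
        # .get() returns None when the key is absent, so a unit that has not
        # (yet) published its address is skipped instead of raising KeyError.
        address = databag.get("public_address")
        if address:  # also skips an empty string
            alertmanagers.append(address)
    return alertmanagers


print(collect_alertmanagers([{"public_address": "10.1.2.3"}, {}, {"public_address": ""}]))
# ['10.1.2.3']
```

Note that `.get()` only guards against a missing key; on its own it would not catch a `ModelError` raised while reading the databag of a relation that no longer exists.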
Bug Description
In the scenario where Alertmanager is related to something (in this example, Prometheus) and the remote charm is removed, the next `update-status` event causes Alertmanager to go into an error state.

To Reproduce
Environment
Relevant log output
Additional context
No response