Relating several applications only stores dashboards and alerts from first relation #158

Open
zmraul opened this issue Mar 28, 2023 · 4 comments

Comments

@zmraul

zmraul commented Mar 28, 2023

Bug Description

Relating several applications to grafana-agent partially fails: only the dashboards and alerts from the first relation are sent over to COS.
Logs and metrics seem fine; I can see them from both applications.

To Reproduce

juju relate grafana-agent kafka
juju relate grafana-agent zookeeper
# This will lead to dashboards and alerts only from kafka
-------
juju relate grafana-agent zookeeper
juju relate grafana-agent kafka
# This will lead to dashboards and alerts only from zookeeper

Environment

grafana-agent: channel edge, revision 4
zookeeper: https://github.com/canonical/zookeeper-operator/tree/feature/grafana_agent_integration
kafka: https://github.com/deusebio/kafka-operator/tree/wip-logs-integration

Relevant log output

-

Additional context

No response

@zmraul zmraul changed the title from "Relating several applications will only pick up dashboards and alerts from first relation" to "Relating several applications only stores dashboards and alerts from first relation" Mar 28, 2023
@PietroPasotti
Contributor

I think the issue is this: the first application you relate to gagent will deploy gagent, and that first gagent unit becomes leader. In the screenshot above, it was probably zookeeper: you ran juju relate grafana-agent zookeeper first, so the grafana-agent unit that gets assigned to zookeeper (#15) becomes leader.
gagent integrates with COS Lite over the application databag, which means only the leader can send data. Consequently, data owned by non-leader units, such as kafka's dashboards and rules in this case, never makes it across.
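
To make the explanation above concrete, here is a minimal sketch in plain Python. It is a toy model, not grafana-agent or ops code: the `Unit` class, the `publish_to_cos` function and the dashboard file names are all hypothetical and exist only for illustration. It assumes, as described above, that each subordinate unit only holds the data of its own principal and that only the leader can write to the application databag.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Toy stand-in for a grafana-agent unit (hypothetical, not the ops API)."""
    name: str
    principal: str  # application this subordinate unit is attached to
    is_leader: bool = False
    dashboards: list = field(default_factory=list)  # data received from the principal


def publish_to_cos(units):
    """Mimic app-databag semantics: only the leader's data is written."""
    app_databag = {"dashboards": []}
    for unit in units:
        if not unit.is_leader:
            continue  # non-leaders cannot write to the application databag
        app_databag["dashboards"].extend(unit.dashboards)
    return app_databag


# kafka was related first, so its grafana-agent unit is the leader.
units = [
    Unit("grafana-agent/0", "kafka", is_leader=True, dashboards=["kafka.json"]),
    Unit("grafana-agent/1", "zookeeper", dashboards=["zookeeper.json"]),
]
result = publish_to_cos(units)
print(result)  # {'dashboards': ['kafka.json']}
```

In this model the zookeeper dashboards are dropped simply because their owning unit is not the leader, which matches the behaviour reported in the bug description.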

@PietroPasotti
Contributor

A similar effort for other pieces of the data was done in https://github.com/canonical/grafana-agent-k8s-operator/pull/142/files.
At that stage we didn't consider that dashboards and alerts also needed to be unit-based.

@sed-i
Contributor

sed-i commented Mar 28, 2023

Here's a summary of my findings re relation data accessibility.

Deployment

Model                        Controller    Cloud/Region         Version  SLA          Timestamp
test-machine-juju-info-q8py  machineworld  localhost/localhost  2.9.38   unsupported  12:09:08-04:00

App        Version  Status   Scale  Charm          Channel        Rev  Exposed  Message
agent               blocked      3  grafana-agent  edge             4  no       Missing relation: 'logging-consumer'
another    22.04    active       1  ubuntu         edge            21  no       
principal  22.04    active       2  ubuntu         latest/stable   21  no       

Unit          Workload  Agent  Machine  Public address  Ports  Message
another/0*    active    idle   2        10.30.254.139          
  agent/2     blocked   idle            10.30.254.139          Missing relation: 'send-remote-write'
principal/0   active    idle   0        10.30.254.231          
  agent/1     blocked   idle            10.30.254.231          Missing relation: 'logging-consumer'
principal/1*  active    idle   1        10.30.254.121          
  agent/0*    blocked   idle            10.30.254.121          Missing relation: 'logging-consumer'

Machine  State    Address        Inst id        Series  AZ  Message
0        started  10.30.254.231  juju-08a5d7-0  jammy       Running
1        started  10.30.254.121  juju-08a5d7-1  jammy       Running
2        started  10.30.254.139  juju-08a5d7-2  jammy       Running

Relation provider    Requirer         Interface  Type         Message
another:juju-info    agent:juju-info  juju-info  subordinate  
principal:juju-info  agent:juju-info  juju-info  subordinate  

Relation view

Technically, relations are between charms, not between units (same as in k8s).

graph LR
principal ---|" juju-info:0  "| agent
another ---|" juju-info:3 "| agent

Machine view

  • There could be a different principal charm for every subordinate unit.
  • The subordinate leader is not necessarily on the same machine as the principal leader.
graph TD

subgraph machine/0
principal/0
agent/1
end

subgraph machine/1
principal/1
agent/0*
end

subgraph machine/2
another/0*
agent/2
end

Can a subordinate unit access another subordinate unit's data?

Hook tools

relation-ids

relation-ids only runs against a unit (running it against an app simply loops over the units):

$ juju exec --unit agent/0 -- relation-ids juju-info
juju-info:0

$ juju exec --unit agent/1 -- relation-ids juju-info
juju-info:0

$ juju exec --unit agent/2 -- relation-ids juju-info
juju-info:3

relation-list

Unlike in k8s, where a unit sees all the remote units on a given relation, with subordinate relations a unit only sees the one principal unit it is subordinated to:

$ juju exec --unit agent/0 -- relation-list -r juju-info:0
principal/1

$ juju exec --unit agent/1 -- relation-list -r juju-info:0
principal/0

$ juju exec --unit agent/2 -- relation-list -r juju-info:3
another/0
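
The relation-ids and relation-list results so far can be summarised with a small lookup. This is only a sketch in plain Python: the two dictionaries are copied from the juju status and hook tool output in this comment, and `visible_relation` is a hypothetical helper name, not a Juju or ops API.

```python
# Which principal unit each subordinate agent unit is attached to
# (taken from the `juju status` output in this comment).
ATTACHED_TO = {
    "agent/0": "principal/1",
    "agent/1": "principal/0",
    "agent/2": "another/0",
}

# Relation ID per principal application, as reported by `relation-ids`.
RELATION_ID = {"principal": "juju-info:0", "another": "juju-info:3"}


def visible_relation(agent_unit):
    """Return (relation id, remote units) as seen from one subordinate unit."""
    principal_unit = ATTACHED_TO[agent_unit]
    app = principal_unit.split("/")[0]
    # A subordinate only sees the relation to its own principal application,
    # and within it only the single principal unit it is attached to.
    return RELATION_ID[app], [principal_unit]


for unit in sorted(ATTACHED_TO):
    print(unit, *visible_relation(unit))
```

No agent unit, leader or not, gets a view that spans both juju-info:0 and juju-info:3.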

relation-get

This hook tool seems to provide the most convincing evidence that a subordinate (leader) unit cannot see other units' relation data (unlike in k8s):

$ juju exec --unit agent/0 -- relation-get -r juju-info:0 - agent/0
egress-subnets: 10.30.254.121/32
ingress-address: 10.30.254.121
private-address: 10.30.254.121

$ juju exec --unit agent/0 -- relation-get -r juju-info:0 - agent/1
ERROR cannot read settings for unit "agent/1" in relation "agent:juju-info principal:juju-info": unit "agent/1": settings not found

$ juju exec --unit agent/0 -- relation-get -r juju-info:0 - agent/2
ERROR cannot read settings for unit "agent/2" in relation "agent:juju-info principal:juju-info": unit "agent/2": settings not found

Having a peers: section in the subordinate's metadata does not change the above.
However, reading each other's peer data is possible:

$ juju exec --unit agent/2 -- relation-get -r cluster:4 - agent/0
egress-subnets: 10.30.254.121/32
ingress-address: 10.30.254.121
private-address: 10.30.254.121

model.relations

I added a few prints to grafana-agent's __init__:

if rels := self.model.relations.get("juju-info"):
    logger.info("DBG juju-info rels: %s", rels)
    for rel in rels:
        logger.info("DBG juju-info units: %s", rel.units)

and in the log, every grafana-agent unit only sees one relation (the juju-info relation to its own principal), and that one relation has only one unit (the principal unit):

unit.agent/0.juju-log DBG juju-info rels: [<ops.model.Relation juju-info:0>]
unit.agent/0.juju-log DBG juju-info units: {<ops.model.Unit principal/1>}

unit.agent/1.juju-log DBG juju-info rels: [<ops.model.Relation juju-info:0>]
unit.agent/1.juju-log DBG juju-info units: {<ops.model.Unit principal/0>}

unit.agent/2.juju-log DBG juju-info rels: [<ops.model.Relation juju-info:3>]
unit.agent/2.juju-log DBG juju-info units: {<ops.model.Unit another/0>}

juju show-unit

Here, unlike with the hook tools, juju lists all relations (both juju-info:0 and :3) under a single unit. We get a very similar output for all three agent units.
That is probably because juju itself has "full" visibility, but that does not help us much from within charm code, per previous sections.

$ juju show-unit agent/0
agent/0:
  machine: "1"
  opened-ports: []
  public-address: 10.30.254.121
  charm: local:jammy/grafana-agent-0
  leader: true
  life: alive
  relation-info:
  - relation-id: 0
    endpoint: juju-info
    related-endpoint: juju-info
    application-data: {}
    related-units:
      principal/0:
        in-scope: true
        data:
          egress-subnets: 10.30.254.231/32
          ingress-address: 10.30.254.231
          private-address: 10.30.254.231
      principal/1:
        in-scope: true
        data:
          egress-subnets: 10.30.254.121/32
          ingress-address: 10.30.254.121
          private-address: 10.30.254.121
  - relation-id: 3
    endpoint: juju-info
    related-endpoint: juju-info
    application-data: {}
    related-units:
      another/0:
        in-scope: true
        data:
          egress-subnets: 10.30.254.139/32
          ingress-address: 10.30.254.139
          private-address: 10.30.254.139

@sed-i
Contributor

sed-i commented Mar 28, 2023

Per my previous comment, @PietroPasotti is right: a subordinate unit (leader or not) does not see relation data from all other units.

Seems like we need to use a peer relation.
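
A rough sketch of what that could look like, in plain Python rather than charm code. This is an assumption about the approach, not the implemented fix: `publish_own_data`, `aggregate_as_leader` and the dashboard file names are hypothetical, and the dict stands in for the per-unit peer databags, which the relation-get experiment above showed are readable across units.

```python
import json


def publish_own_data(peer_databags, unit_name, dashboards):
    """Each unit writes what it got from its principal into its own peer databag."""
    # Databag values are strings in Juju, so structured data is serialised to JSON.
    peer_databags[unit_name] = {"dashboards": json.dumps(sorted(dashboards))}


def aggregate_as_leader(peer_databags):
    """The leader merges every unit's peer data before forwarding it to COS."""
    merged = set()
    for databag in peer_databags.values():
        merged.update(json.loads(databag.get("dashboards", "[]")))
    return sorted(merged)


peer_databags = {}  # unit name -> that unit's peer-relation databag
publish_own_data(peer_databags, "agent/0", ["principal.json"])
publish_own_data(peer_databags, "agent/1", ["principal.json"])
publish_own_data(peer_databags, "agent/2", ["another.json"])

all_dashboards = aggregate_as_leader(peer_databags)
print(all_dashboards)  # ['another.json', 'principal.json']
```

Merging into a set deduplicates data that several units receive from the same principal application, so the leader forwards each dashboard once.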


3 participants