Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable remote_ip dimension in rtr_clients #120

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

netixx
Copy link
Contributor

@netixx netixx commented Mar 11, 2024

This makes the rtr_clients track the number of connections per remote IP, enabling us to see if a client flaps, and what client it is.

I am not 100% sure that this is the right implementation, and ready to discuss better ideas :)

This makes the rtr_clients track the number of connections per
remote IP, enabling us to see if a client flaps, and what client it is.
@benjojo
Copy link
Collaborator

benjojo commented Mar 11, 2024

This is something that should be enabled behind a CLI flag, since there will be users who will not want the cardinality in their prometheus setups that this would cause. Overall I can see the use case for it though, just we should be mindful of large users of stayrtr and the impact that a change like this might have on their existing setups

@netixx
Copy link
Contributor Author

netixx commented Mar 11, 2024

For cardinality issues, unless it's a public server, it should more or less be fine, since the IPs of the routers will not change too often. There is also a possibility to do label_replace/drop on prometheus server. Nonetheless, maybe a flag is the way to go to make sure we avoid breaking people stuff :)

Also, I was thinking that a Gauge may not be the best approach here.
I think two counters, one "connects" and one for "disconnects" would be better, since it would allow us to see flaps that occur between 2 prometheus polling.
Those counters could be registered/Inc by a feature flag.

Could also be a single "flap" counters, that increments whenever a router connects or disconnects.

What are your thoughts on that ?

@sumkincpp
Copy link

Another +1 for that.

Tracking remote ips is really useful for operational needs: different client implementations behave differently, for example how many sessions they hold active.

I've ended up using customly written connections_exporter for all connections - when it is a private server, new connections, flaps and restarts are rare. Yet when they happen, it may of course flood prometheus database.

For cardinality issues one should consider dropping labels in prometheus as mentioned above.

Nevertheless something that is possible but not trivial to implement within prometheus - tracking of flapping/resets from remote clients.

  • "Flapping" consists of changing of remote port but persisting remote ip, when router decides to initiate a new connection. This may be covered with a new continuously increasing Counter metric rtr_client_flaps{remote_ip=XXX} 1
  • "Resets" are something similar, but they track the Reset PDU. IIRC some older Juniper implementations may often send resets. This may be also covered with a new continuously increasing Counter metric rtr_reset_pdus{remote_ip=XXX} 1

@netixx
Copy link
Contributor Author

netixx commented Mar 12, 2024

for resets, there is already

PDUsRecv = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "rtr_pdus",
			Help: "PDU received.",
		},
		[]string{"type"},
	)

which is counter, but doesn't have the remote_ip label to it. I guess I could add it there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants