-
Notifications
You must be signed in to change notification settings - Fork 577
Per-database incoming and outgoing queue length metrics #2773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per-database incoming and outgoing queue length metrics #2773
Conversation
It's a metric that tracks the size of the incoming per-client `message_queue`. We've been concerned about not having visibility into this queue's length, as currently we only have the length of the per-database reducer queue, but each client can only have a single reducer in that queue, and may have additional messages waiting in its per-client queue. The new metric, `spacetime_client_connection_incoming_queue_length`, is an `IntGaugeVec` with the labels: `db: Identity, client_identity: Identity, connection_id: ConnectionId`. My theory is that in our viewer we can inspect the average and the sum per database, and it also may be interesting to be able to look at individual clients, as e.g. the BitCraft mob monitor may be a notable outlier. This is not the same pattern as most of our other metrics, though, which tend to only offer per-database granularity. It's possible that this new metric should also have the last two labels removed, and be labeled only on `db: Identity`.
Like the previous metric added in this branch, it's per-client, so we'll use it for testing, but likely not merge it into master. I'll follow up in a separate PR with a version that's per-database instead.
This commit alters the message queue length metrics introduced by #2754 to be per-database, rather than per-client. This should limit the cardinality of these metrics, and better lines up with the labels of our other metrics. Because the metrics are now per-database rather than per-client, it's no longer correct to just drop the label when the client disconnects. Instead, care must be taken to decrement the metric by the number of messages which were waiting in the queue at the time of the disconnection. I've added comments to call attention to this complexity.
…ngth Incl. resolving conflicts in client_connection.rs, moving the new metric into the `ClientConnectionMetrics` struct.
Do I need to |
I'm guessing in |
When a database is deleted, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gefjon we should be removing these label values when a database is deleted, but currently we don't do that for any of our metrics, so I think it's fine to just leave this a TODO for now. Ultimately the logic will have to be duplicated in delete_database
for both standalone and cloud.
Can you create a ticket to track?
Opened #2807 . |
This reverts commit ac18790.
This reverts commit ac18790.
Description of Changes
Add two new metrics,
spacetime_total_incoming_queue_length
andspacetime_total_outgoing_queue_length
. These are similar to the metrics added in #2754 , except that these are per-database, not per-client. This will reduce cardinality at the expense of less granular reporting.Because the metrics are per-database rather than per-client, there's some tricky code related to cleaning up when a client disconnects.
API and ABI breaking changes
N/a
Expected complexity level and risk
1 - new metrics with low cardinality and infrequent
with_label_values
calls.Testing
None - I am not sure how to test metrics.