Skip to content

Per-database incoming and outgoing queue length metrics #2773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 28, 2025

Conversation

gefjon
Copy link
Contributor

@gefjon gefjon commented May 22, 2025

Description of Changes

Add two new metrics, spacetime_total_incoming_queue_length and spacetime_total_outgoing_queue_length. These are similar to the metrics added in #2754 , except that these are per-database, not per-client. This will reduce cardinality at the expense of less granular reporting.

Because the metrics are per-database rather than per-client, there's some tricky code related to cleaning up when a client disconnects.

API and ABI breaking changes

N/a

Expected complexity level and risk

1 - new metrics with low cardinality and infrequent with_label_values calls.

Testing

None - I am not sure how to test metrics.

gefjon added 4 commits May 19, 2025 10:55
It's a metric that tracks the size of the incoming per-client `message_queue`.
We've been concerned about not having visibility into this queue's length,
as currently we only have the length of the per-database reducer queue,
but each client can only have a single reducer in that queue,
and may have additional messages waiting in its per-client queue.

The new metric, `spacetime_client_connection_incoming_queue_length`,
is an `IntGaugeVec` with the labels:
`db: Identity, client_identity: Identity, connection_id: ConnectionId`.
My theory is that in our viewer we can inspect the average and the sum per database,
and it also may be interesting to be able to look at individual clients,
as e.g. the BitCraft mob monitor may be a notable outlier.
This is not the same pattern as most of our other metrics, though,
which tend to only offer per-database granularity.
It's possible that this new metric should also have the last two labels removed,
and be labeled only on `db: Identity`.
Like the previous metric added in this branch, it's per-client,
so we'll use it for testing, but likely not merge it into master.
I'll follow up in a separate PR with a version that's per-database instead.
This commit alters the message queue length metrics
introduced by #2754
to be per-database, rather than per-client.
This should limit the cardinality of these metrics,
and better lines up with the labels of our other metrics.

Because the metrics are now per-database rather than per-client,
it's no longer correct to just drop the label when the client disconnects.
Instead, care must be taken to decrement the metric
by the number of messages which were waiting in the queue
at the time of the disconnection.
I've added comments to call attention to this complexity.
…ngth

Incl. resolving conflicts in client_connection.rs,
moving the new metric into the `ClientConnectionMetrics` struct.
@gefjon gefjon requested review from joshua-spacetime and jsdt May 22, 2025 15:52
@gefjon
Copy link
Contributor Author

gefjon commented May 22, 2025

Do I need to remove_label_values when the database is deleted somehow? Where would I do that?

@joshua-spacetime joshua-spacetime linked an issue May 23, 2025 that may be closed by this pull request
@bfops bfops added the release-any To be landed in any release window label May 27, 2025
@joshua-spacetime
Copy link
Collaborator

Do I need to remove_label_values when the database is deleted somehow? Where would I do that?

I'm guessing in delete_database although I'm not sure if anything extra is needed for cloud. Perhaps @kim would know.

@kim
Copy link
Contributor

kim commented May 27, 2025

When a database is deleted, the ws_client_actor loop should exit (eventually), so I would think that nothing more needs to be done?

Copy link
Collaborator

@joshua-spacetime joshua-spacetime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gefjon we should be removing these label values when a database is deleted, but currently we don't do that for any of our metrics, so I think it's fine to just leave this a TODO for now. Ultimately the logic will have to be duplicated in delete_database for both standalone and cloud.

Can you create a ticket to track?

@gefjon
Copy link
Contributor Author

gefjon commented May 28, 2025

@gefjon we should be removing these label values when a database is deleted, but currently we don't do that for any of our metrics, so I think it's fine to just leave this a TODO for now. Ultimately the logic will have to be duplicated in delete_database for both standalone and cloud.

Can you create a ticket to track?

Opened #2807 .

@gefjon gefjon added this pull request to the merge queue May 28, 2025
Merged via the queue into master with commit ac18790 May 28, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-any To be landed in any release window
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track the size of a client's send_message queue
4 participants