Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oximeter collector needs a way to lookup ClickHouse more often #6407

Closed
bnaecker opened this issue Aug 21, 2024 · 2 comments · Fixed by #6745
Closed

Oximeter collector needs a way to lookup ClickHouse more often #6407

bnaecker opened this issue Aug 21, 2024 · 2 comments · Fixed by #6745
Assignees
Labels
oximeter qorb Connection Pooling
Milestone

Comments

@bnaecker
Copy link
Collaborator

Today, the oximeter collector looks up ClickHouse in DNS on startup, and uses that address throughout:

let db_address = if let Some(address) = db_config.address {
address
} else if replicated {
SocketAddr::V6(
resolver
.lookup_socket_v6(ServiceName::ClickhouseServer)
.await?,
)
} else {
SocketAddr::V6(
resolver.lookup_socket_v6(ServiceName::Clickhouse).await?,
)
};

That means the insertion task will always use the same address, even if we move the ClickHouse zones around. We should pass the resolver into that task, or have another way for it to know that it should re-resolve ClickHouse again. It's also important to note this is low priority. We can work around it pretty easily by just restarting the oximeter SMF service, which will cause it to resolve ClickHouse again. That may lose a small amount of data buffered in the collector, but it's pretty minor.

@jgallagher
Copy link
Contributor

Can we use qorb for this once it lands? (This sounds quite similar to #3763 which led to it.)

@bnaecker
Copy link
Collaborator Author

I started on this today, and I think it'll need a bit of a more invasive change than I was hoping. I wanted to keep the existing reqwest::Client for now, and set the DNS resolver it uses to the one from the internal_dns crate. Unfortunately, I don't think that works directly, because reqwest does not use the port numbers returned from an SRV record to make requests -- it seems to basically throw the port away, resolve the AAAA that the SRV record points to, and then use the default for the URI schema (HTTP, in this case). I'm not sure why it does that.

To use qorb, we need two things: a resolver, which knows how to lookup backends, and a connector, that knows how to connect to them. The resolver is very easy, that's basically just a small wrapper around the existing Resolver in the internal_dns crate. But the connector is less clear. There's no way I can see to pass reqwest::Client an actual TCP stream to use. Even if there were, then we're obviously not using any of the existing connection pooling in that crate at all, which seems wasteful. We could instead switch to using a lower-level HTTP library like hyper, which is very low-level and does directly operate over a TCP socket. That might be the best approach, but it's certainly a lot of additional complexity over reqwest.

An alternative is to eschew the connection pooling for the HTTP interface at all, and instead move all the internals over to the new TCP connection. Integrating qorb with that new connection type is much more straightforward, since it's really just some framing over a TCP stream. That does put resolving this PR behind all that work though. That's not terrible, but is more than I originally anticipated.

@smklein smklein self-assigned this Oct 1, 2024
@davepacheco davepacheco added this to the 12 milestone Oct 4, 2024
@smklein smklein added the qorb Connection Pooling label Oct 8, 2024
smklein added a commit that referenced this issue Oct 16, 2024
Uses [qorb](https://crates.io/crates/qorb) to dynamically discover the
location of Clickhouse and Nexus within the Oximeter Collector.

Fixes #6407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
oximeter qorb Connection Pooling
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants