Skip to content

Add SubnetAddressTranslator to translate Cassandra node IPs from private network based on its subnet mask #2013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: 4.x
Choose a base branch
from

Conversation

jahstreet
Copy link
Contributor

@jahstreet jahstreet commented Feb 10, 2025

When running Cassandra in a private network and accessing it from outside of that private network via some kind of proxy, we have an option to use FixedHostNameAddressTranslator. But when we want to set it up in a HA way and have more control over latencies in multi-datacenter deployments, that is not enough.

This PR proposes a SubnetAddressTranslator, which translates Cassandra node IP addresses based on the match to the configured subnet IP range (CIDR notation). The assumption is that each Cassandra datacenter nodes belong to different subnets not having intersecting IP ranges, which is the usual configuration for multi-DC Kubernetes and K8ssandra, for example.

Copy link
Contributor Author

@jahstreet jahstreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any additional documentation I should update with this change?

Copy link
Contributor

@tolbertam tolbertam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work @jahstreet, thank you! Have some suggestions, but I'm +1 either way.

# default-address = "cassandra.datacenter1.com:9042"
# Whether to resolve the addresses once on initialization (if true) or on each node (re-)connection (if false).
# If not configured, defaults to false.
# resolve-addresses = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 while resolve-contact-points defaults to true, I was thinking about whether we should do the same here, but I think you probably will generally use a DNS name that might change its backing IPs from time to time, so makes sense for this to be false. Was that what you were thinking as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%, I expect proxy to have at least 2 replicas eligible for "periodic" restarts, that was the intuition to decide here.

@jahstreet
Copy link
Contributor Author

Excellent work @jahstreet, thank you! Have some suggestions, but I'm +1 either way.

Thanks! I will push a commit with annotations till Monday morning.

Copy link
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall idea, just a few things I think need to be tweaked here. Moving the DriverOptions around shouldn't be too bad but I am a bit concerned about adding a new dependency just for this. I'm also not sure I love the additional exception handling code that's been added now... but I can be convinced on either point.

core/pom.xml Outdated
<groupId>com.github.seancfoley</groupId>
<artifactId>ipaddress</artifactId>
<optional>true</optional>
</dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't love the inclusion of another dependency here, even if it's an optional one. It's only used in one class (near as I can tell)... is there really no way to get the functionality we need without adding this in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First thing that comes to mind is implementing it ourselves (or in other words copying it over from the library). Lemme evaluate how much of the util code is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need at least the following functionality to work with subnets here:

  • Validate subnet string is in a prefix block format
  • Check if subnet contains IP address
  • All for IPv4 and IPv6

The library is quite big, so copying over its parts is an overkill.
Then the alternative is to implement these functions ourselves.
Looking into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of vibe-coding and we can have it with around 100 lines of code. Will work on integrating a change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitted the change and dropped the dependency, PTAL 🙏 .

addresses = AddressUtils.extract(spec, resolve);
} catch (RuntimeException e) {
LOG.warn("Ignoring invalid contact point {} ({})", spec, e.getMessage(), e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just continue here to next iteration of the outer for loop. You know addresses is the empty set at this point so there's no point iterating over it below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I look at this now it feels like we had to make this a bit more complicated because AddressUtils.extract() now throws exceptions in most cases rather than just logging errors and returning an empty set. Was there a particular reason for this change? It's not immediately clear the exceptions buy you much here.

Copy link
Contributor Author

@jahstreet jahstreet Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, #extract code was used only in this class and we logged errors together with reasons of these errors. Now the info about reasons is moved to util method, which is called from multiple places. In this class, I aimed to keep logging (as well as other functionality) as close to the origin as seemed possible to avoid opinionated refactoring, so I needed a way to get reasons of errors from the utility #extract to log them together with the context logs.
Happy to agree on the way it should look like and change accordingly.

"Contact point {} resolves to multiple addresses, will use them all ({})",
spec,
addresses);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this log message offer us much useful information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, it was there so I kept it as is.
As for me, this log is a good additional info when debugging failed to connect issues. Like, one could be surprised to see the client failed to connect logs where contact points do not match the configured ones.
What is your opinion on the need of it?

@jahstreet jahstreet force-pushed the add-subnet-address-translator branch from cf06929 to 85d5931 Compare March 23, 2025 11:55
@absurdfarce
Copy link
Contributor

Apologies, this is on my list but I haven't made it back to reconsider the updated comments in this review. I appreciate your patience @jahstreet!

FWIW I have added this to the 4.19.1 release planning doc under the working assumption that we'll almost certainly get this in in some form we can all agree on.

@jahstreet jahstreet requested a review from absurdfarce May 16, 2025 11:19
@tolbertam tolbertam self-requested a review June 2, 2025 17:40
@aratno aratno self-requested a review June 2, 2025 17:41
Copy link
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @jahstreet! I very much like the idea of bringing the subnet management logic into code in this PR; I appreciate the efforts to avoid the introduction of an extra external dependency.

I want to run this through a full set of unit and integration tests but it looks like there's a few steps that need to happen before we can get this to build with Maven. The changes I'm asking for in this PR should get you to a spot where we can build and run tests... well, these changes plus a "mvn com.coveo:fmt-maven-plugin:format" but you get my point. :)

@jahstreet
Copy link
Contributor Author

jahstreet commented Jun 7, 2025

Deal, thx for the feedback.
Will address it asap.

UPD: done.

@jahstreet
Copy link
Contributor Author

Suggested commit message:

"Add SubnetAddressTranslator to translate Cassandra node IPs from private network based on its subnet mask"

@jahstreet jahstreet force-pushed the add-subnet-address-translator branch from 49a2ebf to 0f00215 Compare June 10, 2025 19:37
@absurdfarce
Copy link
Contributor

Confirmed that the build looks good locally with the recent changes from @jahstreet . Kicking off a DataStax Jenkins run now to confirm that we haven't had any unexpected regressions.

@absurdfarce
Copy link
Contributor

Bah, Jenkins failed with complaints like the following:

[2025-06-10T22:25:35.767Z] [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ java-driver-core ---
[2025-06-10T22:25:35.767Z] [INFO] Compiling 796 source files to /home/jenkins/workspace/drivers_java_oss_PR-2013/core/target/classes
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------
[2025-06-10T22:25:38.559Z] [ERROR] COMPILATION ERROR : 
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------
[2025-06-10T22:25:38.559Z] [ERROR] /home/jenkins/workspace/drivers_java_oss_PR-2013/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java:[21,30] package com.google.common.base does not exist
[2025-06-10T22:25:38.559Z] [INFO] 1 error
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------

Weird thing was that local builds were just fine. This made absolutely no sense to me; I spent the afternoon trying various combinations of Maven versions and POM settings to see what might account for the difference.

Naturally the answer was pretty much right in front of me the whole time. This PR was branched off from a point in 4.x before this commit went in. And that commit fixes precisely this behaviour. Once I included this change in my local checkout of the PR branch I was able to easily reproduce the failure above in my local build.

@jahstreet can you merge 4.x into your PR branch so that we can get this fix on your branch as well? I think such a merge + the other work you've already done should enable us to run a full build on our Jenkins server.

Thanks!

@absurdfarce
Copy link
Contributor

Oh, almost forgot... the reason the build now fails with that change in place is because you're using the normal Guava packages here (and potentially other places in your PR... I admit I haven't checked yet). This should be changed to use the shaded packages for Guava along the lines of the VisibleForTesting import just above your addition.

@jahstreet
Copy link
Contributor Author

Sorry to make you spend much time time on it. Rebase is obviously a good thing to have always. On it.

@absurdfarce
Copy link
Contributor

No worries @jahstreet ... I'm just happy we figured it out and have a clear path now to get this in!

@jahstreet jahstreet changed the title Add SubnetAddressTranslator Add SubnetAddressTranslator to translate Cassandra node IPs from private network based on its subnet mask Jun 11, 2025
@jahstreet
Copy link
Contributor Author

Did the visual check and corrected guava references. Hope now it is in a good shape 🤞 .

@absurdfarce how (with which maven args) do you build/compile locally? I wasn't able to reproduce the dependency errors with basic mvn clean package -DskipTests

@jahstreet jahstreet force-pushed the add-subnet-address-translator branch from 0f00215 to c72413e Compare June 11, 2025 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants