Session Synchronization #385

Closed
suffieldacademy opened this issue Aug 26, 2022 · 3 comments
Labels
Support User needs help

@suffieldacademy

Hello,

We're busy setting up a multi-node multi-instance NAT64 cluster. Most of the pieces are in place, but we're having trouble with the session synchronization. I believe we've followed the instructions, but we are not seeing the session states from one node showing up on the other. Additionally, we're seeing session traffic from multiple instances even though we're only creating sessions in a single instance. I'm wondering if there might be an issue with per-instance sessions.

Quick background on the setup:

Both nodes have the following instances defined:

  • nat64-wkp-lower
  • nat64-wkp-upper
  • siit-dmz

Both NAT64 instances have joold configured to run session synchronization:

	"ss-enabled": true,
	"ss-flush-asap": false,
	"ss-flush-deadline": 2000,
	"ss-max-payload": 1446

The instances use the same multicast destination, but different ports:

// instance nat64-wkp-lower
 "multicast address": "ff08::db8:64:64",
 "multicast port": "6240",
 "in interface": "eno3",
 "out interface": "eno3",
 "reuseaddr": 1
// instance nat64-wkp-upper
 "multicast address": "ff08::db8:64:64",
 "multicast port": "6241",
 "in interface": "eno3",
 "out interface": "eno3",
 "reuseaddr": 1
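
Because both instances share one multicast group, the port is the only thing keeping their sync streams apart. A minimal sketch of a sanity check for that is below. The dictionary simply restates the two netsocket configurations quoted above; `NETSOCKETS` and `find_collisions` are hypothetical names used for illustration and are not part of Jool.

```python
# Values copied from the two netsocket configurations quoted above.
NETSOCKETS = {
    "nat64-wkp-lower": {
        "multicast address": "ff08::db8:64:64",
        "multicast port": "6240",
        "in interface": "eno3",
        "out interface": "eno3",
        "reuseaddr": 1,
    },
    "nat64-wkp-upper": {
        "multicast address": "ff08::db8:64:64",
        "multicast port": "6241",
        "in interface": "eno3",
        "out interface": "eno3",
        "reuseaddr": 1,
    },
}


def find_collisions(netsockets):
    """Return lists of instance names that share a (group, port) pair."""
    seen = {}
    for name, cfg in netsockets.items():
        # The port is stored as a string in the config, so normalize it.
        key = (cfg["multicast address"], int(cfg["multicast port"]))
        seen.setdefault(key, []).append(name)
    return [names for names in seen.values() if len(names) > 1]


if __name__ == "__main__":
    collisions = find_collisions(NETSOCKETS)
    if collisions:
        for names in collisions:
            print("same group and port:", ", ".join(names))
    else:
        print("every instance has its own group/port pair")
```

With the values above, no two instances collide, so duplicate ports are unlikely to be the cause of the cross-instance traffic described later.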

The two nodes are directly connected to each other via Ethernet. We have not assigned any addresses; the interfaces only have their automatically assigned link-local IPv6 addresses.

When we start the instances and generate traffic, the translation is occurring. On the node that is translating the traffic, we see session entries being generated:

Expires in 0:00:58.376
Remote: dns.google#38007	...#42020
Local: ...#38007	64:ff9b::808:808#42020

Additionally, on the failover host we see the multicast packets arriving on the interface with the correct multicast destination and port number.
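
One way to confirm, independently of joold, that the datagrams are deliverable to a userspace socket on the failover host is to join the group by hand. The following is a rough sketch using only the Python standard library on Linux. The group, port, and interface are the ones from this report; `build_mreq` and `listen` are hypothetical helper names. It assumes that setting `SO_REUSEADDR` lets it share the port with a running joold that was configured with `"reuseaddr": 1`; if the bind fails, stop joold first.

```python
import socket
import struct

GROUP = "ff08::db8:64:64"  # multicast address from the netsocket config
PORT = 6240                # port of instance nat64-wkp-lower
IFACE = "eno3"             # interface connecting the two nodes


def build_mreq(group, ifindex):
    """Pack a struct ipv6_mreq: 16-byte group address + interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def listen(group=GROUP, port=PORT, iface=IFACE):
    """Join the group on one interface and print the size of each datagram."""
    ifindex = socket.if_nametoindex(iface)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("::", port))
    sock.setsockopt(
        socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, build_mreq(group, ifindex)
    )
    while True:
        data, addr = sock.recvfrom(65535)
        print(f"{len(data)} bytes from {addr[0]}")

# To run on the failover host (inside the same netns as joold): listen()
```

If this prints datagrams while the session table stays empty, the loss is happening after the network, somewhere between joold and the kernel module.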

However, we are not seeing any sessions being created in the failover host (the session table is empty). Is there any debugging or other information we can enable to try to find where the packets might be getting lost?

One other oddity that we noticed is that even though our test traffic is only going through a single instance on the primary box, BOTH instances are generating session sync traffic. This happens even if we set ss-enabled=false on one of the instances (traffic is still generated to both ports). I'm wondering if perhaps joold is receiving session updates for all instances and forwarding them, rather than only propagating changes for a particular instance.

However, even if that were the case, I'm not sure why the other instances aren't seeing any sessions arrive (I would instead expect to see too many if all instances were generating duplicate traffic).

@suffieldacademy
Author

Brief update. We've peeled back as much of the configuration as possible, down to a single netfilter (not iptables) instance named "default", so the setup is now as simple as we can make it.

We are now seeing syslog entries from the primary host:

joold: Received a packet from kernelspace.
joold: Sending 280 bytes to the network...
joold: Sent 280 bytes to the network.

However, the failover machine fails to process them and logs:

joold: Received 280 bytes from the network.
joold: Error receiving packet from kernelspace: Invalid input data or parameter
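
To compare the two hosts at a glance, the joold syslog lines can be tallied. This is a small sketch that matches only the message formats quoted in this report; `summarize` is a hypothetical helper, and other joold versions may word their messages differently.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this report.
SENT = re.compile(r"joold: Sent (\d+) bytes to the network\.")
RECEIVED = re.compile(r"joold: Received (\d+) bytes from the network\.")
ERROR = re.compile(r"joold: Error receiving packet from kernelspace: (.+)")


def summarize(lines):
    """Count sent/received packets and bytes, and group error messages."""
    stats = Counter()
    errors = Counter()
    for line in lines:
        if m := SENT.search(line):
            stats["sent_packets"] += 1
            stats["sent_bytes"] += int(m.group(1))
        elif m := RECEIVED.search(line):
            stats["received_packets"] += 1
            stats["received_bytes"] += int(m.group(1))
        elif m := ERROR.search(line):
            stats["errors"] += 1
            errors[m.group(1).strip()] += 1
    return stats, errors


SAMPLE = [
    "joold: Received a packet from kernelspace.",
    "joold: Sending 280 bytes to the network...",
    "joold: Sent 280 bytes to the network.",
    "joold: Received 280 bytes from the network.",
    "joold: Error receiving packet from kernelspace: Invalid input data or parameter",
]

if __name__ == "__main__":
    stats, errors = summarize(SAMPLE)
    print(dict(stats))
    print(dict(errors))
```

Matching byte counts on both sides combined with a nonzero error count on the receiver is the pattern seen here: the packets cross the wire intact but are rejected when handed to the kernel.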

I'm not much of a C programmer. Searching the source for that error leads me to usr/joold/modsocket.c, but I can't figure out much more from there.

We are isolating the forwarding interfaces, joold, etc., all inside a network namespace (netns) as shown in the documentation. Is there any known odd behavior with netns and joold? Otherwise, I'm not sure why the packets aren't being processed.

@suffieldacademy
Author

Now that I found that more specific error, I see this is referenced in #362. I am running 4.1.5 on Debian stable, so I will try to upgrade to a more recent version and see if I can unravel this further.

@suffieldacademy
Author

OK, I reconstituted the full multi-instance setup under v4.1.8 and am having much better luck. Apologies for not starting with the most recent release, but I usually try to stick to the Debian repos.

Sorry for the noise, but sometimes typing it all out helps me work through it!

@ydahhrk ydahhrk added the Support User needs help label Jan 25, 2023