[BUG] Lose Update to 0.23.0 (lost mtu) #2250

Open
3 of 4 tasks
pstvasko opened this issue Nov 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@pstvasko

pstvasko commented Nov 21, 2024

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The tunnel breaks on gate1.

Expected Behavior

Traffic should not be lost.

Steps To Reproduce

Hi. After updating Headscale to version 0.23.0, there is an issue in the Tailscale network.
I have a complex network connecting two Tailscale installations:
100.64.0.0 - headscale1 - gate1 - gate2 - headscale2 - 100.80.0.0

When I download between 100.64.0.0 and 100.80.0.0, the speed reaches a maximum of 2 Gbps and then problems start: packets stop flowing on the segment 100.64.0.0 - headscale1 (although if I reduce the MTU to 932, pings work).
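As a rough aid for reasoning about that MTU value, here is a minimal sketch of the packet-size arithmetic. It assumes IPv4 without IP options (20-byte header) and a standard 8-byte ICMP echo header; the function names are hypothetical and only for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming IPv4 with no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def icmp_packet_size(payload: int) -> int:
    """Size of the IPv4 packet produced by `ping -s <payload>`."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(932))   # 904
print(max_ping_payload(1500))  # 1472
```

Under these assumptions, an MTU of 932 leaves room for at most a 904-byte ping payload in a single packet.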

There are about 500 clients in the network. Could you advise in which direction I should investigate?

Environment

- OS: AlmaLinux8
- Headscale version: 0.23.0
- Tailscale version: 1.76.6

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

Headscale:

```
2024-11-21T21:31:33Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:35Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:37Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:38Z ERR Failed to fetch node from the database with node key: nodekey:10715a5defd407c11146b436449e3fdc771d8e4adc68b8dac0077e5e3d64d370 handler=NoisePollNetMap
2024-11-21T21:31:39Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:40Z INF home/runner/work/headscale/headscale/hscontrol/auth_noise.go:44 > unsupported client connected client_version=58 min_version=61
2024-11-21T21:31:41Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:41Z INF home/runner/work/headscale/headscale/hscontrol/auth_noise.go:44 > unsupported client connected client_version=58 min_version=61
2024-11-21T21:31:42Z INF home/runner/work/headscale/headscale/hscontrol/auth.go:28 > Successfully sent auth url: https://headscale.*****/oidc/register/mkey:ad30ca2d2f62ca426624930d6455211e40554add598c1a99420ffc8e6a2d8c0c expiry=-62135596800 followup=https://headscale.*****/oidc/register/mkey:ad30ca2d2f62ca426624930d6455211e40554add598c1a99420ffc8e6a2d8c0c machine_key=[rTDKL] node=vm-po4 node_key=[QxxVm] node_key_old=[bYjMr]
2024-11-21T21:31:43Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:45Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
```
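
With about 500 clients, it may help to see which nodes the "update not sent" errors concentrate on. The following is a minimal sketch that tallies those errors per `node.id`, assuming the log line layout shown in the excerpt above (timestamp, three-letter level, message). The function name and regular expressions are illustrative, not part of Headscale.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]{3})\s+(?P<rest>.*)$")
NODE_RE = re.compile(r"node\.id=(\d+)")


def tally_unsent_updates(lines):
    """Count 'update not sent' ERR lines per node id."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERR":
            continue
        rest = match.group("rest")
        if "update not sent" not in rest:
            continue
        node = NODE_RE.search(rest)
        if node:
            counts[int(node.group(1))] += 1
    return counts


sample = [
    '2024-11-21T21:31:33Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771',
    "2024-11-21T21:31:40Z INF unsupported client connected client_version=58 min_version=61",
    '2024-11-21T21:31:41Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771',
]
print(tally_unsent_updates(sample))  # Counter({771: 2})
```

In the excerpt itself, all seven "update not sent" errors refer to node.id=771.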

tailscale:

```
22 00:31:10 tailscaled[1378041]: wgengine: idle peer [Jpvng] now active, reconfiguring WireGuard
22 00:31:10 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 70/459 peers)
22 00:31:20 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 69/459 peers)
22 00:31:35 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:44438 => 100.80.0.2:9188) got RST by peer
22 00:31:38 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:55990 => 100.80.0.2:9187) got RST by peer
22 00:31:38 tailscaled[1378041]: control: NetInfo: NetInfo{varies=false hairpin= ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link="" firewallmode="ipt-default"}
22 00:31:53 tailscaled[1378041]: wgengine: idle peer [Qif6f] now active, reconfiguring WireGuard
22 00:31:53 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 69/459 peers)
22 00:32:05 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:36450 => 100.80.0.2:9188) got RST by peer
22 00:32:08 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:43332 => 100.80.0.2:9187) got RST by peer
22 00:32:10 tailscaled[1378041]: wgengine: idle peer [qoqFH] now active, reconfiguring WireGuard
22 00:32:10 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 70/459 peers)
```

pstvasko added the bug label Nov 21, 2024

@kradalby
Collaborator

Have you changed the Tailscale version between these nodes recently? I am not ruling out that some parameter changed in Headscale, but I would be surprised if such a change had any real impact on the client side. If I had to guess, a change on the client would be the more likely cause.

Can you try with multiple Tailscale versions vs multiple Headscale versions?

@pstvasko
Author

I tested all Tailscale client versions supported by Headscale 0.23.
Additionally, the connection is restored after reconnecting to the network.

I’m having issues with Tailscale. Specifically, after reaching a speed of 2 Gbps, the connection drops after about a minute and doesn’t recover until I restart Tailscale itself.

At the same time, I can ping the node like this:
ping -s 900 100.64.0.1

But not like this:
ping -s 1400 100.64.0.1
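
To narrow down where between 900 and 1400 bytes the path stops passing packets, the largest working payload can be found by binary search. This is a minimal sketch, assuming a Linux iputils `ping` (the `-M do` flag sets the don't-fragment bit, `-W 1` waits one second for a reply) and assuming that if a given size passes, all smaller sizes pass too. `ping_probe` and `largest_working_payload` are hypothetical helper names.

```python
import subprocess
from typing import Callable


def ping_probe(host: str) -> Callable[[int], bool]:
    """Return a probe that sends one don't-fragment ping of a given payload size."""
    def probe(size: int) -> bool:
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


def largest_working_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload in [low, high] for which probe succeeds.

    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

For example, `largest_working_payload(ping_probe("100.64.0.1"))` would probe the node from the commands above. Adding 28 bytes (IPv4 plus ICMP headers, without IP options) to the result gives the largest IP packet that passes unfragmented.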

I have around 500 clients and a complex network between data centers.

