Bug: dns over tls timing out on latest image (TLS handshake) #2533
Comments
@qdm12 is more or less the only maintainer of this project and works on it in his free time.
|
I have a similar (and probably related) bug. Also using Surfshark. For me though, explicitly setting it to version 3.39.1 works but setting it to the latest seems to make it break.
|
I have the same issue too |
Is it Surfshark for you too? |
|
Same issue for me with Surfshark/WireGuard. But when I run the latest, my log says |
This comment was marked as off-topic.
@screamjojo you can try to solve the problem by yourself for now. Try a specific version tag instead of the latest tag. I don't think you check your logs often, otherwise those warnings would have caught your attention and been solved already too:
So, you could solve your problem by changing the version and wait for @qdm12 to solve the problem in a later update. kr., |
Hello there, thanks @frepke for the help! By the way @frepke are you using surfshark as well? Does it work for both v3.39.1 and the latest image? The v3.39.1 should closely work the same as v3.39.0, but the latest image has substantial changes especially the dns server/forwarder is completely changed, so that could be a reason? Maybe try with Regarding
This happens when the last commits are not triggering an image build, for example documentation or development setup commits. I could eventually fix it, but it rarely happens 😉 PS: I also just rechecked, and it works fine on my side with Mullvad WireGuard, for the sake of narrowing this down |
Yeah, still using Surfshark (unfortunately AdguardVPN isn't working with Gluetun 😔)
If I have to check/test something, let me know 😉 |
I'm having the same issue with Surfshark - v3.39 tag works fine, beyond does not and I get the same. Using Wireguard as the protocol. |
I can say that I also see this behavior |
Reading all this all over again, there seem to be 2 issues, most likely unrelated: These two errors
Despite the VPN connection actually working to get the public IP address and the TCP dial to cloudflare.com (aka health check):
I've seen this behavior, and it's most likely due to your MTU, so either try:
Also please double check if you can make it work with the image tag @epic0421 @haitham506 @frepke @the-jeffski (and more to come likely): It looks like your error is really just/mostly Now a few things on this:
PS: what you can try is the following to see if it works outside the custom DNS forwarder code:
This would run a DNS over TLS query to cloudflare (1.1.1.1) to resolve github.com: does this work when gluetun fails to resolve things? |
For me, with DOT=on with v3.39.1, it's not possible to set up a connection at all. |
For me, v3.39.1 works fine (DOT on/off). |
Actually now that I am testing it further, the connection does get established and is initially healthy, but becomes unhealthy very quickly, and then becomes healthy about a minute later. That error message keeps getting spammed though.
At the end, it does this and then the error messages stop. It then starts doing it again, making the container unhealthy and the cycle repeats.
|
The connection gets established (healthy) and then becomes unhealthy after a few seconds. It restarted 6 times; after that it stayed connected, but the DNS errors keep showing up, just not spammed. :latest
|
Thanks for the reply. My homelab is currently out of order because of some infrastructure changes I'm making here at home. Once it's back in action in a couple of days, I will do what you propose |
@Dreadwolf91 in my case lowering WIREGUARD_MTU from the default 1400 to 1320 fixed it. For OpenVPN, you could try OPENVPN_MSSFIX=1320 I think (not exactly the same as WIREGUARD_MTU, but it should work). I'm also running over Wi-Fi right now, so it may be related to that. Now, I also noticed the error came up in v3.39.x releases; it's just that a failed block list update would be logged as a warning and not treated as a "failed to setup the dns server" error, unlike in the latest image. Before it was just an (obscure) warning logged:
And now it's
Plus an attempt to re-setup the DNS server completely. Others: please try lowering your MTU (WIREGUARD_MTU or OPENVPN_MSSFIX) to see if it helps?? |
With WIREGUARD_MTU=1320 the latest version is working for me |
WIREGUARD_MTU=1320 also works for me on latest. I was able to raise it to 1370 without any issues. |
That's a pretty strange fix, given it was working fine with an MTU of 1400 (for wireguard) with Unbound. Plaintext DNS (aka DOT=off) most likely works fine because it uses a lot less data (just UDP traffic without all the TLS stuff). |
Maybe this is nonsense (if so, @qdm12, please delete this comment), but is it possible to make an automatic MTU adjuster?

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// findOptimalMTU binary-searches for the largest MTU that reaches host
// without fragmentation. It returns 1199 if no probe succeeds at all.
func findOptimalMTU(host string) int {
	minMTU, maxMTU := 1200, 1500 // Typical VPN MTU range; adjust as needed
	for minMTU <= maxMTU {
		midMTU := (minMTU + maxMTU) / 2
		if isMTUSupported(host, midMTU) {
			minMTU = midMTU + 1 // Try larger MTU
		} else {
			maxMTU = midMTU - 1 // Try smaller MTU
		}
	}
	return maxMTU
}

// isMTUSupported runs a single ping with the "don't fragment" flag set.
// The flags are those of Linux iputils ping; adjust the command for your
// system if necessary.
func isMTUSupported(host string, mtu int) bool {
	// The ICMP payload is the MTU minus the IPv4 (20) and ICMP (8) headers.
	payload := strconv.Itoa(mtu - 28)
	cmd := exec.Command("ping", "-c", "1", "-M", "do", "-s", payload, host)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return false
	}
	return strings.Contains(string(output), "1 packets transmitted, 1 received")
}

func dialWithOptimalMTU(ctx context.Context, serverAddress, serverName string) (*tls.Conn, error) {
	// Step 1: Find optimal MTU. ping needs a bare host, not host:port.
	host, _, err := net.SplitHostPort(serverAddress)
	if err != nil {
		return nil, err
	}
	optimalMTU := findOptimalMTU(host)
	fmt.Printf("Optimal MTU found: %d\n", optimalMTU)

	// Step 2: Configure network dialer with MTU if necessary
	// This example doesn't apply MTU directly to the connection, as Go's net package does not support direct MTU settings
	// Alternative libraries may be required for true MTU control on dialed connections
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	conn, err := dialer.DialContext(ctx, "tcp", serverAddress)
	if err != nil {
		return nil, err
	}

	// Step 3: Wrap connection with TLS and complete the handshake,
	// so that a handshake timeout is reported here and not on first use
	tlsConf := &tls.Config{
		MinVersion: tls.VersionTLS12,
		ServerName: serverName,
	}
	tlsConn := tls.Client(conn, tlsConf)
	if err := tlsConn.HandshakeContext(ctx); err != nil {
		conn.Close()
		return nil, err
	}
	return tlsConn, nil
}

func main() {
	ctx := context.Background()
	serverAddress := "example.com:443" // Replace with actual server address
	serverName := "example.com"        // Replace with actual server name
	conn, err := dialWithOptimalMTU(ctx, serverAddress, serverName)
	if err != nil {
		fmt.Println("Failed to connect:", err)
		return
	}
	defer conn.Close()
	fmt.Println("Connection successful with optimal MTU")
}
|
@frepke I thought about it like 10 minutes ago 😄 That would be a nice addition, even without the bug we are facing. We could do this as soon as the VPN is up and restart the VPN (with the exact same settings, only the MTU changed). That would be cool but would require quite a bit of code changes. Anyway, before jumping into this (btw nice code!), I would prefer (ideally, if possible at all) to understand why Unbound was communicating over DNS over TLS fine but the new Go code (really just a TCP dial with TLS 🤷) doesn't make it, both with the same MTU. Since I cannot reproduce the exact error you have (the
To see if it works (and also how long it takes??? - the read timeout is now set to 2 seconds, maybe that's too low) |
Thanks for the code compliment, but all credit belongs to ChatGPT 😔 |
TLDR: Please try running the latest image with More details: Root causes found at least on my side: Go 1.23
and
Running Gluetun with the environment variable

I went down this rabbit hole because I noticed https (not just dns over tls) would fail downloading files (like in the original issue logs) with tls handshake timeout errors. That was hinting the DNS over TLS implementation might be okay, it's just that TLS was not behaving right. Reverting back to Gluetun v3.39 still worked fine regarding https, with the same MTU (1400) setting.

So I went to look into the changes since Gluetun v3.39.0 and noticed Go was upgraded from 1.22 to 1.23; then went to check the Go 1.23 release notes; then ran with those few GODEBUG options to check which ones were necessary to make Gluetun great again 😄 And to my surprise, it worked out (at least on my side)!! 🎉

Now if this actually solves the problem with an MTU of 1400, I think the best course of action would be to:
Reasons being:
|
Also, the latest image now default MTU is 1320 instead of 1400. Before closing this issue, I'll implement a "best MTU" mechanism with icmp pings as @frepke suggested though, since it seems like a great feature and would remove a lot of potential issues. |
It is not reliable? See qdm12/gluetun#2533 maybe.
Just for a small update: |
Not 100% sure if the issue should already be fixed by "latest image now default MTU is 1320", but I get a lot of warnings in regard to DoT with the current latest. The log message is not exactly the same as it also includes
docker-compose.yml:

services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    environment:
      - VPN_SERVICE_PROVIDER=nordvpn
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=#####
      - SERVER_COUNTRIES=#####
|
|
Also seeing this issue with airvpn. MTU on tun0 is 1320, for all three of my gluetun containers. |
Cloudflare does
I am not fully sure what you mean by "When this happens". I assume it's in regard to the phases with the
I also had some 15 minute phases where there were a lot of healthcheck issues with
Looks like that is the case:
|
Not sure if this helps as I'm using k3s, but here is my output of the same command. Please note I'm using Surfshark as a provider.
Let me know if there is anything else I can provide to help further. |
I'm still having the same issue as @epic0421 with :latest. ProtonVPN + Wireguard + 1320 WIREGUARD_MTU. |
I am also getting: Did you ever find anything out? I am using NordVPN with WireGuard, and I have MTU 1320 |
@neal421 can you provide the output of |
@floriegl Sure thing, this is what I got.
|
@qdm12 I assume that the |
This version (2024-12-27T20:18:46.989Z (commit 61b053f)) finally fixed all the chaos, warnings, and errors in my logs. EDIT: Oops, spoke too soon. It's still happening (the following repeats). This is with "MSS Fix: 1320" in the openvpn settings:
Although the log reports MSS Fix as 1320, tun0 seems to be MTU 1500:
I tried reverting to v3.39.1, but the same thing happens, keeps restarting. |
I can also still replicate |
I'm having these same AAAA error messages as soon as I spin up anything that indexes torrent files. My synology slows to a crawl. Any suggestions? |
Is this urgent?
No
Host OS
Ubuntu 64-bit
CPU arch
x86_64
VPN service provider
Surfshark
What are you using to run the container
docker-compose
What is the version of Gluetun
v3.39.1
What's the problem 🤔
When using the latest image I get no internet connection. I don't know what the exact problem is, but when I use, for example, v3.39.0, everything works fine.
Share your logs (at least 10 lines)
Share your configuration