HAOS reverse DNS takes 7.5s before it fails, making lots of things painfully slow #3768

Open
Ten0 opened this issue Dec 29, 2024 · 0 comments

Describe the issue you are experiencing

Logging in to the HAOS host (https://developers.home-assistant.io/docs/operating-system/debugging/), I can see that:

  1. Depending on the network I'm coming from, it takes forever to log in. (This might be because the SSH server (dropbear?) performs reverse DNS lookups.)
  2. iptables -L is extremely slow for the whole Chain DOCKER section (the same 7.5s delay for every line of that section). Running it without reverse DNS lookups is fast: iptables -L -n
  3. It is systemd-resolved that is slow: resolvectl query 172.30.33.1 takes 7.5s to return.
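The comparison above can be reproduced with a small timing helper. This is a minimal sketch: `time_cmd` is a hypothetical name, and it assumes a POSIX shell with `date +%s`, so it only measures whole seconds.

```shell
# Hypothetical helper: print how many whole seconds a command takes.
# Output of the command itself is discarded; failures are ignored.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Usage on the HAOS host (not executed by this snippet):
#   time_cmd resolvectl query 172.30.33.1   # the slow reverse lookup
#   time_cmd iptables -L                    # slow, one lookup per rule
#   time_cmd iptables -L -n                 # numeric output, no lookups
```

Comparing the last two numbers shows how much of the `iptables -L` time is spent on name resolution.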

/etc/nsswitch.conf (the starting point for hostname resolution) delegates to systemd-resolved via the resolve entry here:

hosts:          resolve [!UNAVAIL=return] files myhostname dns

Removing resolve [!UNAVAIL=return] from that line stops systemd-resolved from being used, and iptables becomes fast again. (I had to mount --bind the file to test this, because the filesystem is otherwise read-only.)
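A sketch of that bind-mount test, for anyone wanting to repeat it. The function names and the `/tmp/nsswitch.conf` scratch path are assumptions made for illustration; the change is temporary and disappears on reboot or after `umount /etc/nsswitch.conf`.

```shell
# Emit nsswitch.conf content with the `resolve [!UNAVAIL=return]` entry
# dropped from the hosts line; every other line passes through unchanged.
strip_resolve() {
    sed -e '/^hosts:/s/resolve[[:space:]]*\[!UNAVAIL=return][[:space:]]*//'
}

# Overlay the edited copy on the read-only original.
# /tmp/nsswitch.conf is an arbitrary scratch location (assumption).
apply_nsswitch_override() {
    strip_resolve < /etc/nsswitch.conf > /tmp/nsswitch.conf
    mount --bind /tmp/nsswitch.conf /etc/nsswitch.conf
}

# Usage on the HAOS host, as root (not executed by this snippet):
#   apply_nsswitch_override
```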

/etc/systemd/resolved.conf contains:

[Resolve]
#DNS=
#FallbackDNS=1.1.1.1 8.8.8.8 1.0.0.1 8.8.4.4 2606:4700:4700::1111 2001:4860:4860::8888 2606:4700:4700::1001 2001:4860:4860::8844
#Domains=
DNSSEC=no
DNSOverTLS=no
#MulticastDNS=yes
#LLMNR=yes
#Cache=yes
DNSStubListener=no
#ReadEtcHosts=yes
#ResolveUnicastSingleLabel=no

Adding the following to this configuration file (via a bind mount again), then running systemctl restart systemd-resolved:

MulticastDNS=no
LLMNR=no

solves the issue: it's very fast again.
Specifically, LLMNR seems to be responsible for 0.5s of the delay (with it left enabled, I still get a painful 0.5s delay per iptables line), while mDNS seems to be responsible for 7s of it.
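The same bind-mount approach for resolved.conf could look roughly like this. It is a sketch under assumptions: the function names and the `/tmp/resolved.conf` path are made up for illustration, and appending at the end of the file only works because [Resolve] is the file's only section.

```shell
# Copy resolved.conf from stdin and append the two overrides.
# Assumes the input ends with a newline and [Resolve] is the last section.
patch_resolved_conf() {
    cat
    printf 'MulticastDNS=no\nLLMNR=no\n'
}

# Overlay the patched copy on the read-only original and restart the service.
# /tmp/resolved.conf is an arbitrary scratch location (assumption).
apply_resolved_override() {
    patch_resolved_conf < /etc/systemd/resolved.conf > /tmp/resolved.conf
    mount --bind /tmp/resolved.conf /etc/systemd/resolved.conf
    systemctl restart systemd-resolved
}

# Usage on the HAOS host, as root (not executed by this snippet):
#   apply_resolved_override
```

Leaving only one of the two lines in `patch_resolved_conf` is a way to measure the mDNS and LLMNR contributions separately.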

Notably, the startup log of systemd-resolved shows:

homeassistant systemd-resolved[1478909]: mDNS-IPv4: There appears to be another mDNS responder running, or previously systemd-resolved crashed with some outstanding transfers.
homeassistant systemd-resolved[1478909]: mDNS-IPv6: There appears to be another mDNS responder running, or previously systemd-resolved crashed with some outstanding transfers.

Running the following command:

systemd-resolve --set-mdns=no --set-llmnr=no --interface=hassio

disables mDNS and LLMNR for the hassio interface, avoiding the issue. I don't expect this to survive a reboot.
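The per-interface setting can also be written with the `resolvectl mdns` and `resolvectl llmnr` subcommands, which should be equivalent to the `--set-mdns` and `--set-llmnr` options on a systemd that ships resolvectl. The helper below is a hypothetical sketch: it only prints the commands so they can be reviewed before being run, and the result is still runtime-only.

```shell
# Print the resolvectl commands that turn off mDNS and LLMNR for each
# link named on the command line. Nothing is executed here.
multicast_off_cmds() {
    for link in "$@"; do
        printf 'resolvectl mdns %s no\n' "$link"
        printf 'resolvectl llmnr %s no\n' "$link"
    done
}

# Usage on the HAOS host, as root (not executed by this snippet):
#   multicast_off_cmds hassio        # review the commands
#   multicast_off_cmds hassio | sh   # apply them
```

Only hassio was tested in this report; passing other link names such as docker0 is untested.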

It seems we could work around this by disabling LLMNR and mDNS on that interface as shown above. That said, there may be a better fix, which would come from understanding why both are so slow on this interface when they normally are not.

What operating system image do you use?

13.2 for rockpi-4a-plus https://github.com/citruz/haos-rockpi/releases/tag/13.2%2B20241104

What version of Home Assistant Operating System is installed?

Home Assistant OS 13.2.dev20241104

Did the problem occur after upgrading the Operating System?

No, but my installation is recent so I wouldn't know.

Hardware details

I'm running on a rockpi-4a-plus via https://github.com/citruz/haos-rockpi

Steps to reproduce the issue

  1. SSH to the HAOS host (https://developers.home-assistant.io/docs/operating-system/debugging/)
  2. resolvectl query 172.30.33.1 -> See how slow that is (7.5s)
  3. systemd-resolve --set-mdns=no --set-llmnr=no --interface=hassio -> See how running step 2 again is no longer slow

Anything in the Supervisor logs that might be useful for us?

no

Anything in the Host logs that might be useful for us?

2024-12-29 15:44:32.301 homeassistant systemd[1]: Stopping Network Name Resolution...
2024-12-29 15:44:32.312 homeassistant systemd[1]: systemd-resolved.service: Deactivated successfully.
2024-12-29 15:44:32.313 homeassistant systemd[1]: Stopped Network Name Resolution.
2024-12-29 15:44:32.331 homeassistant kernel: audit: type=1334 audit(1735487072.317:7235): prog-id=2108 op=LOAD
2024-12-29 15:44:32.376 homeassistant systemd[1]: Starting Network Name Resolution...
2024-12-29 15:44:32.381 homeassistant kernel: audit: type=1334 audit(1735487072.367:7236): prog-id=2107 op=UNLOAD
2024-12-29 15:44:32.730 homeassistant systemd-resolved[1478909]: Positive Trust Anchors:
2024-12-29 15:44:32.731 homeassistant systemd-resolved[1478909]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
2024-12-29 15:44:32.731 homeassistant systemd-resolved[1478909]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 170.0.0.192.in-addr.arpa 171.0.0.192.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa ipv4only.arpa resolver.arpa corp home internal intranet lan local private test
2024-12-29 15:44:32.736 homeassistant systemd-resolved[1478909]: Using system hostname 'homeassistant'.
2024-12-29 15:44:32.737 homeassistant systemd-resolved[1478909]: mDNS-IPv4: There appears to be another mDNS responder running, or previously systemd-resolved crashed with some outstanding transfers.
2024-12-29 15:44:32.738 homeassistant systemd-resolved[1478909]: mDNS-IPv6: There appears to be another mDNS responder running, or previously systemd-resolved crashed with some outstanding transfers.
2024-12-29 15:44:32.743 homeassistant systemd[1]: Started Network Name Resolution.

System information

version core-2024.12.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.13.0
os_name Linux
os_version 6.6.54-haos
arch aarch64
timezone Europe/Paris
config_dir /config

Home Assistant Cloud

logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor

host_os Home Assistant OS 13.2.dev20241104
update_channel stable
supervisor_version supervisor-2024.12.0
agent_version 1.6.0
docker_version 27.2.0
disk_total 13.5 GB
disk_used 5.5 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
board rockpi-4a-plus
supervisor_api ok
version_api ok
installed_addons Advanced SSH & Web Terminal (19.0.0), File editor (5.8.0), Home Assistant Google Drive Backup (0.112.1), Mosquitto broker (6.4.1), Zigbee2MQTT (1.42.0-2), AppDaemon (0.16.7), WireGuard (0.10.2), Duck DNS (1.18.0), NGINX Home Assistant SSL proxy (3.11.1)

Dashboards

dashboards 3
resources 0
views 1
mode storage

Recorder

oldest_recorder_run December 13, 2024 at 15:28
current_recorder_run December 28, 2024 at 03:21
estimated_db_size 24.19 MiB
database_engine sqlite
database_version 3.45.3

Additional information

Output of resolvectl status:

Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 192.168.1.1
         DNS Servers: 192.168.1.1 2a01:cb04:5e4:2700:861e:a3ff:fef6:3b70
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 8.8.8.8#dns.google 1.0.0.1#cloudflare-dns.com 8.8.4.4#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2001:4860:4860::8888#dns.google 2606:4700:4700::1001#cloudflare-dns.com
                      2001:4860:4860::8844#dns.google
          DNS Domain: home

Link 2 (end0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1 2a01:cb04:5e4:2700:861e:a3ff:fef6:3b70 fe80::861e:a3ff:fef6:3b70
        DNS Domain: home

Link 3 (hassio)
    Current Scopes: LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 4 (docker0)
    Current Scopes: LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 6 (veth6788378)
    Current Scopes: LLMNR/IPv6 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 12 (veth2e9348c)
    Current Scopes: LLMNR/IPv6 mDNS/IPv6
         Protocols: -DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported

... and a bunch of other similar veth interfaces.

After disabling mDNS and LLMNR, we get:

Global
           Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 192.168.1.1
         DNS Servers: 192.168.1.1 2a01:cb04:5e4:2700:861e:a3ff:fef6:3b70
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 8.8.8.8#dns.google 1.0.0.1#cloudflare-dns.com 8.8.4.4#dns.google 2606:4700:4700::1111#cloudflare-dns.com 2001:4860:4860::8888#dns.google 2606:4700:4700::1001#cloudflare-dns.com 2001:4860:4860::8844#dns.google
          DNS Domain: home

Link 2 (end0)
    Current Scopes: DNS
         Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
       DNS Servers: 192.168.1.1 2a01:cb04:5e4:2700:861e:a3ff:fef6:3b70 fe80::861e:a3ff:fef6:3b70
        DNS Domain: home

Link 3 (hassio)
    Current Scopes: none
         Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 4 (docker0)
    Current Scopes: none
         Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 6 (veth6788378)
    Current Scopes: none
         Protocols: -DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
@Ten0 Ten0 added the bug label Dec 29, 2024
sairon added a commit that referenced this issue Jan 3, 2025
With "cgroup: Use kernel command line to disable memory cgroup" merged to RPi
kernel as 86099de [1], the device tree now contains "cgroup_disable=memory"
parameter. The parameters are parsed in the order defined in the cmdline and
with the previous order, the memory CG ends up disabled. Switching the order
fixes that and makes the order similar to what we get with standard bootloader
and parameters in cmdline.txt only.

The possible downside is that it won't be possible to override parameters from
hardcoded bootargs_hassos using cmdline.txt anymore, however, it's unlikely any
of these parameters will need to be adjusted by users.

Fixes #3768

[1] raspberrypi/linux@86099de