NodeLocal DNS container hung on SIGTERM #394

Closed
rtheis opened this issue Aug 10, 2020 · 26 comments

rtheis commented Aug 10, 2020

We've seen the NodeLocal DNS container hang when receiving a SIGTERM. This causes the container to eventually be terminated via SIGKILL, thus leaving iptables rules behind. Here is an example of the container logs captured during this hang.

2020/08/10 13:36:53 [INFO] Using Corefile /etc/Corefile
2020/08/10 13:36:53 [INFO] Updated Corefile with 0 custom stubdomains and upstream servers /etc/resolv.conf
2020/08/10 13:36:53 [INFO] Using config file:
cluster.local:53 {
    errors
    cache {
            success 9984 30
            denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    health 169.254.20.10:8080
    }
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . /etc/resolv.conf {
            force_tcp
    }
    prometheus :9253
    }
2020/08/10 13:36:53 [INFO] Updated Corefile with 0 custom stubdomains and upstream servers /etc/resolv.conf
2020/08/10 13:36:53 [INFO] Using config file:
cluster.local:53 {
    errors
    cache {
            success 9984 30
            denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    health 169.254.20.10:8080
    }
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . /etc/resolv.conf {
            force_tcp
    }
    prometheus :9253
    }
2020/08/10 13:36:53 [INFO] Added interface - nodelocaldns
in-addr.arpa.:53 on 169.254.20.10
ip6.arpa.:53 on 169.254.20.10
.:53 on 169.254.20.10
cluster.local.:53 on 169.254.20.10
ip6.arpa.:53 on 172.21.0.10
.:53 on 172.21.0.10
cluster.local.:53 on 172.21.0.10
in-addr.arpa.:53 on 172.21.0.10
[INFO] plugin/reload: Running configuration MD5 = e86dda246de588881b4e3ac178781999
CoreDNS-1.6.7
linux/amd64, go1.11.13, 
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p udp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p tcp -d 169.254.20.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p udp -d 169.254.20.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -s 169.254.20.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p tcp -s 169.254.20.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p udp -s 169.254.20.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 169.254.20.10 --dport 8080 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 8080 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p tcp -d 172.21.0.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p udp -d 172.21.0.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p tcp -d 172.21.0.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p udp -d 172.21.0.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 172.21.0.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -s 172.21.0.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p tcp -s 172.21.0.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p udp -s 172.21.0.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 172.21.0.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -d 172.21.0.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 172.21.0.10 --dport 8080 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 172.21.0.10 --sport 8080 -j NOTRACK]}
[INFO] Updated Corefile with 0 custom stubdomains and upstream servers 8.8.8.8 8.8.4.4
[INFO] Using config file:
cluster.local:53 {
    errors
    cache {
            success 9984 30
            denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    health 169.254.20.10:8080
    }
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 172.21.122.96 {
            force_tcp
    }
    prometheus :9253
    }
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.21.0.10
    forward . 8.8.8.8 8.8.4.4
    prometheus :9253
    }
[INFO] Reloading
[INFO] SIGTERM: Shutting down servers then terminating

I believe the hang is occurring when SIGTERM is received during a reload of the config.

prameshj commented Aug 18, 2020

What image version is this on?

rtheis commented Aug 19, 2020

Image version 1.15.13

@prameshj

How are you updating the nodelocaldns config file? Is it by editing the configmap/Corefile directly?

I tried updating the kube-dns configmap (which triggers an update to the Corefile to pick up the new upstream servers), and that reloads fine. I also sent SIGTERM via Docker to the node-local-dns pod, and that was handled fine too. I tried with the 1.15.13 image as well.

docker kill --signal=SIGTERM <podID>

Logs:

[INFO] SIGTERM: Shutting down servers then terminating
[INFO] Tearing down
[WARNING] Exiting iptables/interface check goroutine

Do the rules get cleaned up when deleting the pod via kubectl? What env is this being run on?

rtheis commented Aug 19, 2020

@prameshj I updated the configmap via kubectl apply. Both the config update and SIGTERM are handled fine by themselves. This appears to be a timing bug where a SIGTERM received during the config update causes the hang.

@prameshj

Is there a way to reproduce it?

rtheis commented Aug 19, 2020

Yes, but you'll likely have to run the test many times. Basically, update the configmap and terminate the pod at about the same time.
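
A rough sketch of that sequence (the namespace, label, and manifest file name here are assumptions; adjust for your install):

# Apply a ConfigMap change and delete the pod at roughly the same time, then
# watch whether the old pod hangs in Terminating until the kubelet sends SIGKILL.
kubectl -n kube-system apply -f node-local-dns-configmap.yaml &
kubectl -n kube-system delete pod -l k8s-app=node-local-dns --wait=false &
wait
kubectl -n kube-system get pods -l k8s-app=node-local-dns -o wide -w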

@prameshj

OK, a few more questions:

How is it manifesting in your environment? Is the config being changed frequently or via an automated mechanism?
Are any other pods running in hostNetwork mode that could be modifying iptables rules? It could be contention for the iptables lock.
Why does the pod get killed with SIGKILL? Is that manual?
Does it happen on all node-local-dns pods in the cluster, or just on one node at random?

rtheis commented Aug 20, 2020

@prameshj We see the failure in our automated testing. We run Calico, which uses hostNetwork and modifies iptables rules. The SIGKILL happens because the Kubernetes termination grace period expires (the default is 30 seconds; we raised it to 15 minutes and still see the hang). The hang normally happens on only one node during the test.
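
For reference, a minimal sketch of raising the grace period on the DaemonSet (the DaemonSet name and namespace are assumptions):

# Hypothetical patch setting the termination grace period to 15 minutes.
kubectl -n kube-system patch daemonset node-local-dns --type=merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":900}}}}'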

rtheis commented Aug 27, 2020

Update: This may not be related to the config reload. I've seen a couple of instances of this problem occur when there was no reload in progress. It seems to me that this hang may be related to iptables lock contention rather than the config reload.

@prameshj

It seems flannel-io/flannel#988 might be related. Is there any "resource unavailable" error in any of the hostNetwork pods or in journalctl/syslog on the node?
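
A generic way to look for such errors on the node (illustrative commands, not specific to this report; the pod name is a placeholder):

# Search the node journal and a hostNetwork pod's logs for the error string.
journalctl --since "1 hour ago" | grep -i "resource temporarily unavailable"
kubectl -n kube-system logs <some-hostNetwork-pod> | grep -i "resource temporarily unavailable"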

rtheis commented Aug 28, 2020

@prameshj I'll check the next time I see the problem.

rtheis commented Sep 10, 2020

@prameshj I didn't see any "resource unavailable" errors during the latest recreate of this problem.

raul-te commented Oct 23, 2020

I'm experiencing a similar issue. I can confirm that this does not happen on config reloads. From a Kubernetes point of view, the pod is marked as Completed and it does not get recreated:

Status:               Running
Controlled By:  DaemonSet/node-local-dns
Containers:
  node-cache:
    Container ID:  docker://99dcfc63ff748bb5e9aacfa415d70a3754523a6ddd6d8ac67f923fc0c5f169c0
    Image:         k8s.gcr.io/dns/k8s-dns-node-cache:1.15.14
    Image ID:      docker-pullable://k8s.gcr.io/dns/k8s-dns-node-cache@sha256:140f63dddfcdeb898cd3ae4a505988bdca29f79cd43d08f506936db1c94cadab
    Ports:         53/UDP, 53/TCP, 9253/TCP
    Host Ports:    53/UDP, 53/TCP, 9253/TCP
    Args:
      -localip
      169.254.20.10
      -conf
      /etc/Corefile
      -upstreamsvc
      coredns
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Oct 2020 17:24:10 -0700
      Finished:     Sun, 18 Oct 2020 17:00:08 -0700
    Ready:          False

You can see that it finished on Oct 18th, and my last config change was on the 13th.

kube-system   node-local-dns-t8xs8                                                 0/1     Completed           0          14d
 docker ps | grep node-cache
99dcfc63ff74        d57f36ff0fbd                                                                         "/node-cache -locali…"   2 weeks ago         Up 2 weeks                              k8s_node-cache_node-local-dns-t8xs8_kube-system_1f62dea8-2ee2-41a9-ab4f-221e9fdc21bb_0

Extract from docker inspect:

       "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-10-08T00:24:10.190189725Z",
            "FinishedAt": "2020-10-19T00:00:08.063288024Z"
        },

It's interesting that Docker says the container's been up for 2 weeks, but it also says the status is exited.

docker logs 99dcfc63ff74 --tail 15
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 8080 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw PREROUTING [-p udp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p tcp -d 169.254.20.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter INPUT [-p udp -d 169.254.20.10 --dport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -s 169.254.20.10 --sport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p tcp -s 169.254.20.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {filter OUTPUT [-p udp -s 169.254.20.10 --sport 53 -j ACCEPT]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p udp -d 169.254.20.10 --dport 53 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -d 169.254.20.10 --dport 8080 -j NOTRACK]}
[INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 8080 -j NOTRACK]}
[INFO] Error checking/adding iptables rule {raw PREROUTING [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}, due to xtables lock in use, retrying in 60ns
[INFO] SIGTERM: Shutting down servers then terminating

The logs don't contain a timestamp, but I checked on Kibana and managed to extract them:

October 16th 2020, 19:05:10.295  [INFO] Added back nodelocaldns rule - {raw OUTPUT [-p tcp -s 169.254.20.10 --sport 8080 -j NOTRACK]}
October 18th 2020, 14:11:12.259  [INFO] Error checking/adding iptables rule {raw PREROUTING [-p tcp -d 169.254.20.10 --dport 53 -j NOTRACK]}, due to xtables lock in use, retrying in 60ns
October 18th 2020, 17:00:07.627  [INFO] SIGTERM: Shutting down servers then terminating

Based on the time difference, I think those messages regarding the xtables lock being in use are unrelated.
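
If xtables lock contention were the cause, a couple of generic checks on the node around that time could show who holds the lock (assuming the default lock path /run/xtables.lock):

# Point-in-time check: list processes holding the xtables lock file and any
# iptables invocations currently in flight.
ls -l /run/xtables.lock
fuser -v /run/xtables.lock
ps aux | grep -E '[i]p6?tables'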

At this point it's unclear to me what's sending the SIGTERM.

This server is running on AWS, but not EKS. I'm running k8s 1.16.13. I don't see any "resource unavailable" logs.

I have this node cordoned, with the pod in the Completed state. @prameshj please let me know if you want me to get any extra information before I restart the pod.

raul-te commented Oct 23, 2020

I've found out that the node became NotReady shortly before the node-local-dns-cache container received the SIGTERM. This happened on both nodes where we experienced this issue.

It's interesting to note that the pod that received the SIGTERM has the node.kubernetes.io/not-ready toleration, given that it belongs to a DaemonSet.
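
For reference, the tolerations on the DaemonSet pod template can be confirmed with something like this (the DaemonSet name and namespace are assumptions):

# Print the tolerations on the node-local-dns pod template.
kubectl -n kube-system get daemonset node-local-dns \
  -o jsonpath='{.spec.template.spec.tolerations}'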

@prameshj

That's interesting. Did the node become Ready again? Is the node-local-dns pod still stuck while terminating?
If node-local-dns got SIGKILL'ed, it should have started up again on the same node and taken care of cleaning up the extra iptables rules?
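
A generic way to check for leftover rules on the node, using the addresses that appear in the logs earlier in this thread, could be:

# List any nodelocaldns-related rules still present (substitute the node-local
# and cluster DNS IPs used in your deployment).
iptables-save | grep -E '169\.254\.20\.10|172\.21\.0\.10'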

raul-te commented Oct 26, 2020

Yes, the node became Ready again and all the pods that needed DNS were in CrashLoopBackOff. As shown in the logs I shared, the container was not running (there was no process for it), but according to docker ps the container had been up for 2 weeks.

There's no trace of any SIGKILL in my case; only the SIGTERM I shared.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 24, 2021

rtheis commented Jan 25, 2021

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 25, 2021
@prameshj

> Yes, the node became Ready again and all the pods that needed DNS were in CrashLoopBackOff. As shown in the logs I shared, the container was not running (there was no process for it), but according to docker ps the container had been up for 2 weeks.

I realize this is from several weeks back, but any idea why they were in CrashLoopBackOff? Was it because port 53 was in use? Is there a reliable way to repro this?
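
A generic check for whether something else is bound to port 53 on the node might be:

# Show TCP/UDP listeners on port 53 with owning processes (may need root for -p).
ss -lntup | grep ':53 '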

raul-te commented Feb 12, 2021

Hey @prameshj! I have not seen this problem again. Port 53 was not in use.

Unfortunately, I'm not able to reproduce this condition.

prameshj commented Apr 1, 2021

Thanks, closing for now.

@prameshj closed this as completed Apr 1, 2021

rtheis commented Apr 1, 2021

We have not been able to recreate this problem on versions 1.16 and later. Thank you.

rtheis commented Apr 20, 2021

@prameshj Can you please reopen this issue? I have recreated the problem on version 1.17.3. Thank you.

@prameshj

Should we reopen this, or do you want to create a new issue? I see there are two different scenarios described in this issue. It might be cleaner to open a new one with the symptoms you are seeing.

rtheis commented Apr 30, 2021

@prameshj Thanks, I'll create a new issue the next time I recreate this on version 1.17.3.

rtheis commented Jun 4, 2021

#453
