Multiregion cluster on EKS: service port type should be UDP #69847

Closed
junaid-ali opened this issue Sep 4, 2021 · 2 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. no-issue-activity O-community Originated from the community T-multiregion X-stale

Comments


junaid-ali commented Sep 4, 2021

Describe the problem

The doc and manifest suggest using the TCP protocol for port 53 to handle DNS queries. With that setup I got i/o timeout errors in the CoreDNS logs, and cross-region DNS resolution kept failing.

I tried adding a UDP port alongside the TCP port, but mixed-protocol ports are only available behind a feature gate as of Kubernetes 1.20 (kubernetes/kubernetes#23880), and EKS does not seem to have enabled it yet (kubernetes-sigs/aws-load-balancer-controller#1608).

Ideally, TCP_UDP would be the appropriate protocol, but the aws-load-balancer-controller first needs to support it: kubernetes-sigs/aws-load-balancer-controller#1608 (comment)

For me, with EKS version 1.20, I was able to resolve the cross-region DB pods only after I recreated the DNS service with the protocol set to UDP here:
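
A minimal sketch of such a Service, assuming it is named `kube-dns-lb`, lives in `kube-system`, and selects the CoreDNS pods via the `k8s-app: kube-dns` label. The name, labels, NLB annotation, and the `render_dns_lb` helper are assumptions for illustration, not copied from the CockroachDB manifest, so adjust them to match the manifest you deployed:

```shell
#!/usr/bin/env bash
# Sketch only: the Service name, namespace, selector, and annotation below are
# assumptions, not taken from the CockroachDB manifest.
render_dns_lb() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-lb            # assumed name
  namespace: kube-system
  labels:
    k8s-app: kube-dns
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    k8s-app: kube-dns          # assumed CoreDNS pod label
  ports:
    - name: dns
      port: 53
      targetPort: 53
      protocol: UDP            # TCP in the documented setup
EOF
}

# Recreate the Service with the UDP port (repeat in each regional cluster):
#   kubectl -n kube-system delete service kube-dns-lb
#   render_dns_lb | kubectl apply -f -
```

Recreating the Service may give the NLB new IP addresses, so the `forward` entries in the other regions' CoreDNS config may need updating afterwards.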

Then the force_tcp option has to be removed from the forward block in each of the server blocks:

<cluster-namespace-2>.svc.cluster.local:53 { # <---- Modify
    log
    errors
    ready
    cache 10
    forward . <ip1> <ip2> <ip3> { # <---- Modify
        force_tcp # <---- Modify
    }
}

Also, the docs should include an example for testing the cross-region DNS resolution (via the NLB), for example:

$ dig @<one-of-the-NLB-IP> cockroachdb.com

$ # Also for pod dns resolution from eu-west-1 to us-west-1, dig via a pod in the eu-west-1 cluster
$ dig @<one of the us-west-1 NLB IP> cockroachdb-0.cockroachdb.<cluster-namespace-name in us-west-1>.svc.cluster.local
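
Since the failure here depends on the transport, it may help to run the same query over UDP and over TCP and compare the results. A minimal sketch, assuming `dig` is available where you test from; `check_dns` and the `DIG` override are hypothetical helpers introduced only for this example:

```shell
#!/usr/bin/env bash
# DIG can be overridden (e.g. for a dry run); defaults to the real dig binary.
DIG="${DIG:-dig}"

# check_dns <server-ip> <name>: query <name> at <server-ip> over UDP, then TCP.
# Returns non-zero if either transport fails or yields an empty answer.
check_dns() {
  local server="$1" name="$2" proto answer rc=0
  for proto in notcp tcp; do
    # +notcp sends the query over UDP (dig's default), +tcp sends it over TCP.
    # A short timeout and a single try make a blocked transport fail fast.
    if answer="$("$DIG" "@${server}" "$name" "+${proto}" +short +time=3 +tries=1)" \
        && [ -n "$answer" ]; then
      echo "${proto}: ok (${answer})"
    else
      echo "${proto}: FAILED"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, from a pod in the eu-west-1 cluster:
#   check_dns <one of the us-west-1 NLB IP> \
#     cockroachdb-0.cockroachdb.<cluster-namespace-name in us-west-1>.svc.cluster.local
```

With a TCP-only listener on the NLB one would expect the UDP query to fail, and with a UDP-only listener the TCP query, so the output shows which transport the load balancer actually passes through.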

To Reproduce
Follow the documentation and create the NLB with the TCP protocol for port 53. CoreDNS then logs errors when resolving k8s services in the other regions:

$ kubectl -n kube-system logs <coredns pod name> -f
...
[ERROR] plugin/errors: 2 cockroachdb-0.cockroachdb.testdb-us-west-1.svc.cluster.local. AAAA: read tcp 10.13.2.140:40894->44.xx.xx.xx:53: i/o timeout
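
To see quickly whether the timeouts are tied to one transport, the log lines can be counted by protocol. A small sketch; `count_dns_timeouts` is a hypothetical helper, and the `k8s-app=kube-dns` label selector in the usage comment is an assumption about how the CoreDNS pods are labelled:

```shell
#!/usr/bin/env bash
# Reads CoreDNS log lines on stdin and prints how many upstream reads timed
# out, split by transport protocol.
count_dns_timeouts() {
  awk '
    /i\/o timeout/ && /read tcp / { tcp++ }
    /i\/o timeout/ && /read udp / { udp++ }
    END { printf "tcp=%d udp=%d\n", tcp, udp }
  '
}

# Usage (the label selector is an assumption):
#   kubectl -n kube-system logs -l k8s-app=kube-dns --tail=500 | count_dns_timeouts
```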

Expected behavior
Cross-region k8s DNS resolution works seamlessly.

Environment:

  • CockroachDB version v21.1.8
  • Server OS: Kubernetes EKS v1.20.7-eks-d88609

Jira issue: CRDB-9828

@junaid-ali junaid-ali added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 4, 2021

blathers-crl bot commented Sep 4, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Sep 4, 2021
@RichardJCai RichardJCai added T-multiregion and removed X-blathers-untriaged blathers was unable to find an owner labels Oct 11, 2021
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
