-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
supervisor generates invalid DNS names that break reverse DNS lookups #2077
Comments
[ramirogm] This has attached https://jel.ly.fish/f9dc24eb-389d-4e7a-905f-b33d6e4c2b5e |
Fixes #2077 Change-type: patch Signed-off-by: Ramiro Gonzalez <[email protected]>
So if I understand correctly, Instead of shortening the commit in the container name, which we cannot do right now, maybe we can tell the engine not to use the container name for DNS resolution? Not sure if that's an option |
I replied to the user in that ticket above, which corresponds to https://forums.balena.io/t/352814, but I wasn't able to reproduce the DNS resolution issue when switching the base image used by |
it's possible that is a behavior of a specific build or version of ping |
ubi-minimal doesn't even ship with ping; I had to install it. I had to switch to the regular |
Clarification -- I switched the image from |
Closing as the issue isn't caused by the Supervisor, but by a specific base image. |
Reopening as I can't determine quite yet whether the issue is due to the base image. |
Hi @cywang117 @pipex I'll try to explain in more detail what we found while working on the support thread: The issue is related to the container name being longer than 63 chars, and the fact that the docker embedded DNS resolver is used. The embedded DNS resolver logs an error with invalid DNS names that can be seen in jounrnalctl output. I can reproduce the example using version: "2.1"
services:
test1:
image: balenalib/intel-nuc-ubuntu:bionic
entrypoint: ["tail", "-f", "/dev/null"]
test2:
image: balenalib/intel-nuc-ubuntu:bionic
entrypoint: ["tail", "-f", "/dev/null"]
zookeeperXXXX64:
image: balenalib/intel-nuc-ubuntu:bionic
entrypoint: ["tail", "-f", "/dev/null"]
zookeeperXXX63:
image: balenalib/intel-nuc-ubuntu:bionic
entrypoint: ["tail", "-f", "/dev/null"]
zookeeperXX62:
image: balenalib/intel-nuc-ubuntu:bionic
entrypoint: ["tail", "-f", "/dev/null"]
Once instantiated, I get the following - note the container names:
where:
If I ssh into the
and if you check the last entries in
A reference to this error was found by balena user hesch on https://forums.docker.com/t/reverse-dns-fails-dns-bad-rdata-if-container-name-is-larger-than-62-characters/107330 Based on that, I found on this docker guide Container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to). From there:
The difference is 8 chars @cywang117 In your docker-compose, I think that what's happening is that if you |
Hi I'm hesch from the support thread in Balena forums. I completely support what @ramirogm wrote, it's not related to the ubi8 image, that was just a coincidence which led me to the wrong assumption at the beginning (the ubi8 containers had longer container names by coincidence). @pipex I understand that limiting the container name to 63 chars will cut of randomly parts that you might need in other places of the code. From a short glance at the code I assume you use the commit id to distinguish different container commit versions. For that I'd suppose that also a shortened commit ID should provide safe collision avoidance. Something like that: |
@hs-neax unfortunately is not just collision avoidance on the supervisor code. The release uuid / commit is an identifier used when reporting information to the API, which means that we would also need to modify the API to accept the shortened commit. Removing the
Yeah, I took a quick glance and I haven't found a way to do it via docker options, so that might be a dead end |
@ramirogm From that guide, in the table under the
As services are created with aliases too, it sounds like this would work as well. I modified the docker-compose above to run with Supervisor in local mode, where service names are tailed with
When pinging the longest name from
Try waiting for a bit after pinging the service with the long name, and see if the behavior is the same for you. I closed this issue before because I didn't think it was related to container names; I still think it's not, because aliases according to the docker guide are also known to the DNS resolver. Perhaps it's the order of resolution that causes the error logs in the journal -- when pinging by service name, the resolver will always first resolve by full container name, and that's where the error appears. Thoughts? |
Hi @cywang117 From what we found, what fails is the reverse name lookup. In the following example, I ran All three fail to perform the reverse DNS lookup, with the error showing up in You can compare a successful ping, where the response prints the hostname, with a failed one that just prints the IP address.
|
I see, thanks for clarifying @ramirogm. @pipex @kb2ma , Thinking of solutions here, could we perhaps name a service by its docker-compose service name first, and alias it with the full |
We use the container name to figure out the current state of the app install, getting that info from the aliases might not be as easy. Also the service name avoids collisions, when moving between releases where you can have an old and a new copy of the service. For multi-app, we will definitely want to avoid collisions between similarly named services in different apps. I took a quick look at removing The These are not small changes but it would be great to get rid of these database dependent ids and it would help with this issue here |
That being said, I also like the idea of compressing release uuids. We would need to modify the target state patch to work with the shortened UUIDs, but it would save a few bytes of bandwidth per PATCH as well, which is nice |
One more thing. For any changes to the container name, we also need to modify the way services are accessed via the web terminal |
@pipex Thank you very much for the detailed feedback. Since there's apparently no quick fix available I tested some workarounds. Since our software stack uses kafka which heavily depends on network communication between containers this bug actually bricks our stack completely. Use Balena push Use short container names and network aliases
However in this thread a merged PR is mentioned that implements network aliases. I tested it and seems like the following docker-compose.yml does work:
Seems like the documentation needs to be updated here. Can someone confirm that? |
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
The release id is not really necessary for reporting the current state since v3 (it has been replaced by `releaseUuid`) and is only kept for backwards compatibility for some supervisor API endpoints. Until now, the `releaseId` was part o the container name, leading to longer names than desired. This change removes the value from the container name and modifies queries to look for the value in the image database. Change-type: minor Relates-to: #2077
[ramirogm] [ramirogm] Summary of issue discussed on a forums thread:
A container name longer than 63 chars is causing the issue. From this docker guide Container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to).
A container was created with name
zookeeper_5826140_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9
, which is 66 characters long.When doing a ping to that container, an error shows up on journalctl:
The text was updated successfully, but these errors were encountered: