
Getting fatal error on Get list of labels. task #78

Open
cpxPratik opened this issue May 14, 2020 · 19 comments

@cpxPratik

The Get list of labels. task is failing after it was updated to use ansible_fqdn in 3bb8a49

The node hostname (staging-manager-03) shown by docker node ls is different from the FQDN string in the following error:

TASK [atosatto.docker-swarm : Get list of labels.] ********************************************************************************************************************************************
fatal: [165.22.48.107 -> 165.22.48.105]: FAILED! => {"changed": false, "cmd": ["docker", "inspect", "--format", "{{ range $key, $value := .Spec.Labels }}{{ printf \"%s\\n\" $key }}{{ end }}", "staging-manager-03.sgp1"], "delta": "0:00:00.412684", "end": "2020-05-14 13:10:42.573599", "msg": "non-zero return code", "rc": 1, "start": "2020-05-14 13:10:42.160915", "stderr": "Error: No such object: staging-manager-03.sgp1", "stderr_lines": ["Error: No such object: staging-manager-03.sgp1"], "stdout": "", "stdout_lines": []}

For now I am using v2.2.0, which gives no error.
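
For anyone reproducing this by hand, here is effectively what the task runs on the manager versus what the swarm actually knows the node as; a sketch using the names from the error above:

  # What the role runs (the node name now comes from ansible_fqdn):
  docker inspect --format '{{ range $key, $value := .Spec.Labels }}{{ printf "%s\n" $key }}{{ end }}' staging-manager-03.sgp1
  # => Error: No such object: staging-manager-03.sgp1

  # What the swarm registered the node under (the short hostname):
  docker node ls --format '{{ .Hostname }}'
  # => staging-manager-03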

@wombathuffer

wombathuffer commented May 15, 2020

I have the same issue, except with 'ambiguous' instead of not found.
TASK [atosatto.docker-swarm : Get list of labels.] *******************************************************************************************************************************************
fatal: [asus.yi -> None]: FAILED! => {"changed": false, "cmd": ["docker", "inspect", "--format", "{{ range $key, $value := .Spec.Labels }}{{ printf \"%s\\n\" $key }}{{ end }}", "host.domain"], "delta": "0:00:00.335281", "end": "2020-05-15 22:58:12.700418", "msg": "non-zero return code", "rc": 1, "start": "2020-05-15 22:58:12.365137", "stderr": "Error response from daemon: node host.domain is ambiguous (2 matches found)", "stderr_lines": ["Error response from daemon: node host.domain is ambiguous (2 matches found)"], "stdout": "", "stdout_lines": []}

Edit: The workaround for me was simply making the node leave: docker swarm leave --force
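
If force-leaving isn't an option, the stale duplicate behind the "ambiguous" error can also be cleaned up from a manager; a sketch (host.domain is the name from the log above, the node ID is a placeholder):

  # List the matching entries; an old Down duplicate usually shows up:
  docker node ls --format '{{ .ID }}  {{ .Hostname }}  {{ .Status }}' | grep host.domain

  # Remove the stale entry by ID so the name becomes unambiguous again:
  docker node rm <stale-node-id>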

@atosatto
Owner

Thanks @cpxPratik for reporting this issue.
I'll try to reproduce it in a test cluster and figure out a better way of managing nodes.

Can you please confirm the Docker version you are using?

@cpxPratik
Author

@atosatto The docker version is Docker version 19.03.8, build afacb8b7f0

@yukiisbored

yukiisbored commented May 22, 2020

Hello, I'm having the same issue on a cluster. It seems the node object is using the hostname instead of the full FQDN.

It seems this is the root cause: 3bb8a49

Though I don't see any reference in the playbook to joining by FQDN; is this a new change in upstream Docker?
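
As far as I can tell, Docker registers a swarm node under the machine's kernel hostname at join time, not its FQDN, so the join side didn't change; it's the role's lookup that switched to ansible_fqdn in 3bb8a49. A quick way to compare the names on a node (a sketch):

  hostname                             # short name the swarm registers, e.g. staging-manager-03
  hostname -f                          # FQDN that ansible_fqdn picks up, e.g. staging-manager-03.sgp1
  docker info --format '{{ .Name }}'   # the name the daemon reports for itself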

@yukiisbored

btw, I'm currently running version 19.03.8

@FleischKarussel

Same issue here, using 19.03.6 (the latest docker.io package shipped with Ubuntu 18.04).

@Bogdan1001

Bogdan1001 commented May 23, 2020

I have the same issue too. Ubuntu 18.04.

@till

till commented May 24, 2020

@atosatto We fixed this a while back but it was reverted or we mixed it up. It's inventory_hostname vs fqdn.

@Bogdan1001

The workaround for me was to replace {{ ansible_fqdn|lower }} with {{ ansible_hostname }} and remove all dots from the hostname, so node1.connect became node1connect.
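
In case someone wants to script that edit against a Galaxy-installed copy of the role, a rough sketch; note that the roles path and file name here are my guesses, not the role's documented layout:

  # Swap the FQDN lookup for the short hostname in the role's label tasks.
  # NOTE: the path below is an assumption; point sed at wherever the
  # ansible_fqdn lookup actually lives in your installed copy of the role.
  sed -i 's/ansible_fqdn|lower/ansible_hostname/g' \
    ~/.ansible/roles/atosatto.docker-swarm/tasks/setup-swarm-labels.yml

As noted above, this only fully helps if the machine's hostname itself contains no dots; a hostname like node1.connect still has to be renamed.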

@gumbo2k

gumbo2k commented Jun 16, 2020

> @atosatto We fixed this a while back but it was reverted or we mixed it up. It's inventory_hostname vs fqdn.

@till I thought the same and tried to work around it by listing the hosts as FQDNs in my inventory. No luck.

@nununo

nununo commented Jul 18, 2020

Hello. I'm also having this issue. Any plans to reapply the fix? Thanks!

@juanluisbaptiste

I can confirm that commit 3bb8a49, mentioned in issue #82, is the one that breaks the labels setup; if it is reverted, the playbook finishes without issues.

juanluisbaptiste added a commit to juanluisbaptiste/ansible-dockerswarm that referenced this issue Aug 5, 2020
@joshes

joshes commented Aug 31, 2020

Seeing the same behaviour on v2.3.0; rolling back to v2.2.0 resolves it.
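
For anyone staying on the older release in the meantime, pinning the role version keeps ansible-galaxy from pulling the regressed one; a sketch:

  # Install the last known-good release directly:
  ansible-galaxy install atosatto.docker-swarm,v2.2.0

  # Or pin it in requirements.yml:
  #   - src: atosatto.docker-swarm
  #     version: v2.2.0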

@till

till commented Sep 30, 2020

Another case where this happens is the following:

I had botched my swarm setup, so it was not about node names (e.g. inventory name or fully qualified domain name (FQDN)); rather, the nodes were no longer seen by the manager.

The role currently doesn't handle this (no judgement meant). I think it's a split-brain/no-brain kind of thing, because I had restarted my manager (and I run only one) and then this happened.

The fix was the following:

  1. get the join-token myself
  2. then force leave the workers
  3. (re-)join the manager/cluster

And then the role completes.
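
In commands, the recovery looked roughly like this (a sketch; the token and manager address are placeholders):

  # 1. On the restarted manager, get a fresh worker join token:
  docker swarm join-token -q worker

  # 2. On each stuck worker, force it out of the stale swarm:
  docker swarm leave --force

  # 3. Re-join each worker to the manager:
  docker swarm join --token <token-from-step-1> <manager-ip>:2377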

The other fix is to run two managers. ;-)

I am not entirely sure how this could be added to the role, since the manager doesn't see the workers anymore but the workers think they are still connected. If you can afford it, trash the nodes and set up again. Maybe it's a documentation thing after all?

@quadeare

quadeare commented Oct 6, 2020

Same issue on CentOS 7.

For now I am using v2.2.0, which works like a charm!

@juanluisbaptiste

juanluisbaptiste commented Oct 16, 2020

> I can confirm that commit 3bb8a49, mentioned in issue #82, is the one that breaks the labels setup; if it is reverted, the playbook finishes without issues.

Now I'm not sure this has anything to do with it, as I have been getting this error several times with that commit reverted. It always happens when I add a new instance to the cluster. The first run of this role is fine; then I create a new AWS instance and run the role again to add it to the cluster, and the role fails with this error. This is the error message I'm seeing thrown by Ansible on nodes that are already part of the cluster:

<10.0.10.36> (0, b'', b'')
fatal: [10.0.10.36 -> 10.0.10.36]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "inspect",
        "--format",
        "{{ range $key, $value := .Spec.Labels }}{{ printf \"%s\\n\" $key }}{{ end }}",
        "10"
    ],
    "delta": "0:00:00.081487",
    "end": "2020-10-15 23:41:24.604902",
    "invocation": {
        "module_args": {
            "_raw_params": "docker inspect --format '{{ range $key, $value := .Spec.Labels }}{{ printf \"%s\\n\" $key }}{{ end }}' 10",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2020-10-15 23:41:24.523415",
    "stderr": "Error: No such object: 10",
    "stderr_lines": [
        "Error: No such object: 10"
    ],
    "stdout": "",
    "stdout_lines": []
}

That is the error for the manager, but the workers throw it too.
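
When this happens, comparing what the swarm has registered with what Ansible gathers for the host narrows it down quickly; a sketch using the inventory name from the log above (here the role ended up inspecting the literal string "10", which no node matches):

  # On the manager: the names the swarm actually knows.
  docker node ls --format '{{ .ID }}  {{ .Hostname }}  {{ .Status }}'

  # From the control machine: the naming facts Ansible collects for the host.
  ansible 10.0.10.36 -m setup -a 'filter=ansible_fqdn'
  ansible 10.0.10.36 -m setup -a 'filter=ansible_hostname'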

@juanluisbaptiste

> Same issue on CentOS 7.
>
> For now I am using v2.2.0, which works like a charm!

For me it also happens with v2.2.0, as described in my previous comment.

@juanluisbaptiste

I had to use this role again and got an error when running it for the second time, and this time I noticed that the error was different from the one in this issue (the error reported in my previous comment was probably about this new problem rather than this one). This time the error is in the "Remove labels from swarm node" task, and it occurs when labels are configured outside this role (i.e. by manually adding a label to a node). I will create a separate issue for that with an accompanying PR fixing it.

@juanluisbaptiste

> This time the error is in the "Remove labels from swarm node" task [...] I will create a separate issue for that with an accompanying PR fixing it.

Added issue #96 for this and fixed it in PR #97; I hope it gets merged (although I do not have my hopes up that it will happen, heh).

juanluisbaptiste added a commit to juanluisbaptiste/ansible-dockerswarm that referenced this issue May 20, 2021