
Proxy issue during 4.12 upgrade #1481

Closed
llomgui opened this issue Jan 25, 2023 · 12 comments

Comments

@llomgui

llomgui commented Jan 25, 2023

Hello,

During a 4.12 upgrade I had an issue with the worker upgrade (Fedora 36 to Fedora 37).

The first worker was stuck, so I checked journalctl -f.
I saw the following log: Txn Rebase on /org/projectatomic/rpmostree1/fedora_coreos failed: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp 54.163.152.191:443: i/o timeout

The cluster is behind a company proxy, so it should not try to reach the registry directly.

On some workers, the solution was to create this file:

sudo vi /etc/systemd/system/rpm-ostreed.service.d/http-proxy.conf

[Service]
Environment="http_proxy=PROXY_URL"

sudo systemctl daemon-reload
sudo systemctl restart rpm-ostreed.service
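
After the restart, one way to confirm the drop-in was actually picked up is to inspect the service's environment (standard systemctl usage; the exact value shown depends on the proxy URL set above):

# Should list the http_proxy value from the drop-in, e.g. Environment=http_proxy=PROXY_URL
sudo systemctl show rpm-ostreed.service -p Environment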

But on some others, I didn't have to do anything. It worked without any issue.

Before updating this cluster, I created a PoC cluster with the same version, 4.11.0-0.okd-2023-01-14-152430, to make sure the upgrade to 4.12 works on GCP. I didn't get any issue there.

@vrutkovs
Member

But on some others, I didn't have to do anything. It worked without any issue.

Did other nodes get these configuration settings after a successful reboot? I'd expect MCO to apply proxy settings, but it looks like there's a race applying them.
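
For context, the cluster-wide proxy that MCO is expected to propagate to the nodes can be checked with standard oc commands (a quick sketch; proxy/cluster is the usual singleton resource):

# Cluster-wide proxy settings that MCO should render onto the nodes
oc get proxy/cluster -o yaml
# Rollout state of the worker pool that applies the rendered machine config
oc get mcp worker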

@llomgui
Author

llomgui commented Jan 25, 2023

Did other nodes get these configuration settings after a successful reboot? I'd expect MCO to apply proxy settings, but it looks like there's a race applying them.

No, on "without issue" workers, I don't have this file after sucessful reboot.

@tyronewilsonfh

Experienced this issue on 5 nodes in a 10-node test cluster (same MCP). Running the rebase command manually while setting upper- and lowercase http/https proxy env vars would sometimes show a message about pulling the manifest and then time out with the above error; most attempts would just time out with the same message.

Upgrading another cluster in the same environment didn't have these issues.

@vrutkovs
Member

Sounds indeed like an MCO race. Please report this to https://issues.redhat.com/browse/OCPBUGS, component "Machine Config Operator", with a must-gather.
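
For reference, the must-gather can be collected with the standard command (the destination directory here is just an example):

# Collect cluster state to attach to the OCPBUGS report
oc adm must-gather --dest-dir=./must-gather-proxy-issue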

@vinisman

vinisman commented Mar 29, 2023

We also have OKD behind a proxy, and this approach helped us update to 4.12. @llomgui thank you.
We had this kind of error:
E0329 05:46:02.424632 1461945 writer.go:200] Marking Degraded due to: failed to update OS to quay.io/openshift/okd-content@sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift/okd-content@sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b: error: remote error: (Mirrors also failed: [nexus.dev.mycompany.com:60002/okd-mirror/okd@sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b: reading manifest sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b in nexus.dev.mycompany.com:60002/okd-mirror/okd: manifest unknown: manifest unknown] 715 [nexus.dev.mycompany.com:60022/okd-mirror/okd@sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b: reading manifest sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b in nexus.dev.mycompany.com:60022/okd-mirror/okd: manifest unknown: manifest unknown]): quay.io/openshift/okd-content@sha256:125e94f63520330aa50e85bcfd55e429d3051498c9ff3936628edd0d4ea5696b: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io: no such host 716 : exit status 1

@danielchristianschroeter

danielchristianschroeter commented May 24, 2023

I ran into the same error situation when upgrading from 4.11.0-0.okd-2023-01-14-152430 to 4.12.0-0.okd-2023-04-16-041331.
In journalctl -b -u rpm-ostreed you see these errors:
Txn Rebase on /org/projectatomic/rpmostree1/fedora_coreos failed: Failed to invoke skopeo proxy method OpenImage: remote error: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp 3.87.166.194:443: i/o timeout

You also see crashing coredns- and keepalived- pods on the affected node. I executed this one-liner via SSH on all nodes:
echo -e "[Service]\nEnvironment=\"https_proxy=http://<proxy>:3128\"" | sudo tee /etc/systemd/system/rpm-ostreed.service.d/http-proxy.conf >/dev/null && sudo systemctl daemon-reload && sudo systemctl restart rpm-ostreed.service

Please note, you can only create http-proxy.conf during an upgrade process; otherwise the directory /etc/systemd/system/rpm-ostreed.service.d/ does not exist on the node. I'm not sure whether you can also place this file before starting the upgrade.
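
If you need to repeat that one-liner on many nodes, a loop like the following is one possible approach (a sketch only: it assumes the standard core SSH user and a logged-in oc session, and adds mkdir -p so the drop-in directory exists; as noted above, it is untested whether rpm-ostreed honours the drop-in when it is created outside an upgrade):

# Hypothetical helper: apply the same drop-in on every node over SSH
for node in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh core@"$node" 'sudo mkdir -p /etc/systemd/system/rpm-ostreed.service.d &&
    printf "[Service]\nEnvironment=\"https_proxy=http://<proxy>:3128\"\n" | sudo tee /etc/systemd/system/rpm-ostreed.service.d/http-proxy.conf >/dev/null &&
    sudo systemctl daemon-reload && sudo systemctl restart rpm-ostreed.service'
done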

@vrutkovs
Member

Caused by ostreedev/ostree-rs-ext#582, fixed in rpm-ostree 2024.2. openshift/okd-machine-os#751 should include it, but in order to update to it in a disconnected env you'd need a workaround (see the previous comment).
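
To check which rpm-ostree version a node is actually running (i.e. whether the 2024.2 fix has landed), the usual debug-node pattern can be used (the node name is a placeholder):

# Query the rpm-ostree version on a node
oc debug node/<node-name> -- chroot /host rpm-ostree --version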

@llomgui
Author

llomgui commented Feb 25, 2024

The workaround from the previous comment won't work with the latest 4.15 version.
The only way is to create /usr/local/bin/skopeo with the following:

#!/bin/bash
# Inject proxy settings only when skopeo is being invoked by rpm-ostreed
if [ "$(systemctl whoami)" = "rpm-ostreed.service" ]; then
	export http_proxy=http://MY_PROXY:3128
	export https_proxy=http://MY_PROXY:3128
fi

exec /usr/bin/skopeo "$@"

Credit to ostreedev/ostree-rs-ext#582 (comment)
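
One follow-up worth noting: the wrapper has to be executable, and this approach relies on /usr/local/bin coming before /usr/bin in the PATH rpm-ostreed uses when it spawns skopeo (my understanding of why the linked workaround places the wrapper there):

# Make the wrapper executable so it is picked up instead of /usr/bin/skopeo
sudo chmod +x /usr/local/bin/skopeo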

@vrutkovs
Member

@llomgui @danielchristianschroeter could you check if https://github.com/okd-project/okd/releases/tag/4.15.0-0.okd-2024-02-23-163410 sets correct proxy vars for rpm-ostreed?
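
One way to verify on an upgraded node would be to list the unit together with any drop-ins the release ships (systemctl cat prints the unit file plus drop-in fragments):

# Proxy settings, if rendered, would show up as Environment= lines in a drop-in
systemctl cat rpm-ostreed.service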

@llomgui
Author

llomgui commented Mar 4, 2024

@vrutkovs It doesn't work; I still have to use the workaround above.
I will try with another cluster, jumping directly from 2024-01-27-070424 to 2024-02-23-163410.

@vrutkovs
Member

vrutkovs commented Mar 4, 2024

Doesn't work on clean install or upgrade?

@Vins88

Vins88 commented Mar 12, 2024

Hello,
I experienced the same problem when I upgraded from 4.15.0-0.okd-2024-02-10-035534 to 4.15.0-0.okd-2024-03-10-010116.
I can confirm that this procedure correctly applied the proxy env and allowed the upgrade to be completed:
ostreedev/ostree-rs-ext#582 (comment)

Thanks for sharing.
