trying to load metadata from network before configuring the network #5771

Open
jorhett opened this issue Oct 2, 2024 · 7 comments
Labels
bug (Something isn't working correctly), incomplete (Action required by submitter)

Comments

@jorhett

jorhett commented Oct 2, 2024

Bug report

During boot, cloud-init attempts to contact the metadata and reporting services before the network is configured.

While booting a successfully configured node, cloud-init attempts to retrieve metadata from MAAS BEFORE initializing the network:

2024-10-02 02:18:39,147 - __init__.py[DEBUG]: Detected platform: DataSourceMAAS [None]. Checking for active instance data
2024-10-02 02:18:39,151 - url_helper.py[DEBUG]: [0/1] open 'http://10.10.10.10:5248/MAAS/metadata/2012-03-01/meta-data/instance-id' with {'url': 'http://10.10.10.10:5248/MAAS/metadata/2012-03-01/meta-data/instance-id', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 50.0, 'headers': {'User-Agent': 'Cloud-Init/23.4-7.el9_4.6.alma.1', 'Authorization': 'OAuth oauth_nonce="****", oauth_timestamp="1727835519", oauth_version="1.0", oauth_signature_method="PLAINTEXT", oauth_consumer_key="****", oauth_token="****", oauth_signature="****"'}} configuration
2024-10-02 02:18:39,154 - url_helper.py[DEBUG]: Calling 'None' failed [0/120s]: request error [HTTPConnectionPool(host='10.10.10.10', port=5248): Max retries exceeded with url: /MAAS/metadata/2012-03-01/meta-data/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6e41abcd30>: Failed to establish a new connection: [Errno 101] Network is unreachable'))]

It also fails to post status messages. The cloud-init log contains dozens of network-failure reports. After about 10 pages and 2 minutes of these failures, the log finally reaches:

2024-10-02 02:20:45,347 - util.py[DEBUG]: Reading from /sys/class/net/enp129s0f0/address (quiet=False)
2024-10-02 02:20:45,347 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/enp129s0f0/address
...snip repeat for each interface...

2024-10-02 02:20:45,347 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2024-10-02 02:20:45,347 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2024-10-02 02:20:45,348 - util.py[DEBUG]: Reading from /sys/class/net/enp129s0f0/address (quiet=False)
2024-10-02 02:20:45,348 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/enp129s0f0/address
...snip repeat for each interface...

2024-10-02 02:20:45,348 - util.py[DEBUG]: Reading from /usr/lib/python3.9/site-packages/cloudinit/config/schemas/schema-network-config-v1.json (quiet=False)

Only then is the network configured. This is clearly out of order, no?
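To put a number on that ordering, here is a minimal Python sketch that scans cloud-init log lines and measures the time between the first "Network is unreachable" error and the first read of the network-config schema. It assumes the timestamp format shown in the excerpts above, and it treats the schema read as a rough proxy for "network configuration begins", which is an assumption rather than something cloud-init documents. The names `parse_ts` and `unreachable_gap` are hypothetical.

```python
import re
from datetime import datetime

# Matches the timestamp prefix seen in the log excerpts above, e.g.
# "2024-10-02 02:18:39,147 - url_helper.py[DEBUG]: ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ")


def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = TS_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def unreachable_gap(lines):
    """Seconds from the first 'Network is unreachable' line to the first
    later line that mentions the network-config schema, or None if either
    marker is missing.

    Assumption: the schema read is only a proxy for the start of network
    configuration; it is not an official cloud-init stage marker.
    """
    first_failure = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines and snipped markers
        if first_failure is None:
            if "Network is unreachable" in line:
                first_failure = ts
        elif "schema-network-config" in line:
            return (ts - first_failure).total_seconds()
    return None
```

Calling `unreachable_gap(open("/var/log/cloud-init.log"))` on the affected node should report how long cloud-init kept retrying before the network was configured. For the two excerpt lines quoted above (02:18:39,154 and 02:20:45,348) the gap is 126.194 seconds.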

Naive ideas about improvements:

  • Configure the network before querying the network for metadata
  • Queue all status reports until the network is available

I reported this to the MAAS project, and they say it's a cloud-init bug. If there's anything I or MAAS (via curtin) can do in the configuration to address it, please supply a clue-by-four.

Steps to reproduce the problem

I'll happily supply the complete cloud.cfg.d files to someone offline, but I can't post them here. The key points are:

$ cd /etc/cloud/cloud.cfg.d
$ cat 50-cloudconfig-maas-datasource.cfg
datasource_list: [ MAAS ]

$ cat 50-cloudconfig-maas-cloud-config.cfg
#cloud-config
datasource:
  MAAS:
    metadata_url: http://10.10.10.10:5248/MAAS/metadata/
   ...keys snipped...

$ cat 50-cloudconfig-maas-reporting.cfg
#cloud-config
reporting:
  maas:
    metadata_url: http://10.10.10.10:5248/MAAS/metadata/status/*****
    type: webhook
   ...keys snipped...

# cat 50-curtin-networking.cfg
network:
  config:
  - id: enp129s0f0
...snip...
  version: 1

Environment details

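  • Cloud-init version: 23.4-7.el9_4.6.alma.1 (per the User-Agent header in the log above)
  • Operating System Distribution: AlmaLinux 9 (inferred from the el9_4 / alma package release)
  • Cloud provider, platform or installer type: MAAS (deployed via curtin)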

cloud-init logs

cloud-init.tar.gz

@jorhett jorhett added bug Something isn't working correctly new An issue that still needs triage labels Oct 2, 2024
@jorhett
Author

jorhett commented Oct 2, 2024

@holmanb
Member

holmanb commented Oct 2, 2024

@jorhett This looks like a possible duplicate of #5064. It looks like you are using 23.4; that bug was fixed in 24.2. Could you please upgrade to the latest version and let us know if your issue persists?

@holmanb holmanb added incomplete Action required by submitter and removed new An issue that still needs triage labels Oct 2, 2024
@jorhett
Author

jorhett commented Oct 2, 2024

According to the bug you pointed at:

  1. All he changed was the placement of the maas data source
  2. The tester replied that the networking was entirely broken

It does not appear solved to me?

Could you please upgrade to the latest version

I looked but didn't succeed in finding a source for more recent RPMs. The only source I'm aware of is Red Hat's provided sources, and they're stuck at 23.4 at the moment...?

$ dnf search cloud-init --showduplicates  --enablerepo=epel-testing
Extra Packages for Enterprise Linux 9 - x86_64             2.0 MB/s |  23 MB     00:11
Extra Packages for Enterprise Linux 9 - Testing - x86_64   1.2 MB/s | 2.6 MB     00:02
Last metadata expiration check: 0:00:01 ago on Wed 02 Oct 2024 02:26:25 PM PDT.
======================== Name Exactly Matched: cloud-init ========================
cloud-init-23.4-7.el9_4.6.alma.1.noarch : Cloud instance init scripts
cloud-init-23.4-7.el9_4.3.alma.1.noarch : Cloud instance init scripts
cloud-init-23.4-7.el9_4.5.alma.1.noarch : Cloud instance init scripts
cloud-init-23.4-7.el9_4.6.alma.1.noarch : Cloud instance init scripts
cloud-init-23.4-7.el9_4.alma.1.noarch : Cloud instance init scripts

Even the Fedora upstream doesn't list any EPEL versions: https://src.fedoraproject.org/rpms/cloud-init

I'd love to help, but if you have sources to help bootstrap a 24.x package build, that will make it more likely to happen soon. I know all about building RPMs, so I don't need an RPM howto... but having done this in the past, I know that many/most packages require a lot of specific settings to work well, and it can be time-consuming to work those out.

@holmanb
Member

holmanb commented Oct 3, 2024

According to the bug you pointed at:

  1. All he changed was the placement of the maas data source
  2. The tester replied that the networking was entirely broken

It does not appear solved to me?

I missed that, thanks @jorhett. The reporter commented on the PR after it was already closed rather than filing a new bug, which is probably why it wasn't noticed.

I think that you are right. There is a fundamental problem in the distro-agnostic solution here. The klibc support is Debian/Ubuntu-specific, so expecting it to deliver network config on non-Debian distros is a non-starter. Note the log from the local datasource:

2024-03-21 08:38:29,280 - DataSourceMAAS.py[DEBUG]: No initramfs applicable config

So the network configuration that is supposed to be received from the initramfs in the Local stage isn't received on a RHEL derivative. Therefore (I think) the networking daemon isn't correctly configured, so the IMDS cannot be reached later in the Network stage. @blackboxsw does this sound right to you?

Even the Fedora upstream doesn't list any EPEL versions: https://src.fedoraproject.org/rpms/cloud-init

I'd love to help, but if you have sources to help bootstrap a 24.x package build, that will make it more likely to happen soon. I know all about building RPMs, so I don't need an RPM howto... but having done this in the past, I know that many/most packages require a lot of specific settings to work well, and it can be time-consuming to work those out.

We publish releases to COPR for testing purposes, and there is also an RPM build script in the source tree (./packages/brpm) which builds the RPM in an LXD container. However, given the points above, I think that there is still a network configuration gap on non-Debian derivatives, so I don't think that the fix is complete or worth testing until further work is done.

Without doing further investigation, two questions come to mind:

  1. Is some network configuration available via initramfs on RHEL derivatives similar to Klibc?
  2. Would an ephemeral dhcp (ipv6) or slaac (ipv4) network suffice in Local stage to get the necessary network configuration?

I should take a closer look at what Klibc configuration is being received and where it comes from before speculating further.

@jorhett
Author

jorhett commented Oct 3, 2024

Sorry, I totally overlooked that spec file. I'm happy to build an RPM and put it on our images regardless of whether or not there's a fix for that one issue.

Is some network configuration available via initramfs on RHEL derivatives similar to Klibc?

I don't know offhand; this is something @ani-sinha might be best positioned to answer.

Would an ephemeral dhcp (ipv6) or slaac (ipv4) network suffice in Local stage to get the necessary network configuration?

Assuming we swap the 4 and 6 in this statement above 😉 yes this would 💯 work in our configuration.

For my own curiosity... is it truly necessary to have this connection prior to applying the supplied network configuration? I realize that this might be specific to curtin/maas but the network configuration is written to disk prior to rebooting the node. So it would/could be entirely practical to simply apply the network configuration on disk prior to contacting the metadata. Is this something we could define/establish within the cloud-init config to skip this "pre-networking" step?
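As a rough illustration of that ordering question, the Python sketch below decides which step should come first based on whether a network section already exists on disk. It is not how cloud-init is implemented: `has_network_section`, `plan_boot_steps`, and the step names are hypothetical, and the check is a naive line scan for a top-level `network:` key rather than a real YAML parse.

```python
from pathlib import Path


def has_network_section(path):
    """Return True if the file has a top-level 'network:' key.

    Assumption: a naive line scan is good enough for illustration;
    a real implementation would parse the YAML.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return False
    return any(line.rstrip() == "network:" for line in text.splitlines())


def plan_boot_steps(cfg_dir):
    """Return the (hypothetical) step order for a given config directory."""
    cfg_files = sorted(Path(cfg_dir).glob("*.cfg"))
    if any(has_network_section(p) for p in cfg_files):
        # Network config is already on disk (for example, written by curtin),
        # so it can be applied before any metadata request is made.
        return ["apply_on_disk_network_config", "fetch_metadata"]
    # No local config: metadata has to be fetched some other way first.
    return ["fetch_metadata", "apply_network_config"]
```

With a directory laid out like the `/etc/cloud/cloud.cfg.d` listing in the original report, `plan_boot_steps("/etc/cloud/cloud.cfg.d")` would pick the on-disk-first order because `50-curtin-networking.cfg` starts with `network:`.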

@ani-sinha
Contributor

Is some network configuration available via initramfs on RHEL derivatives similar to Klibc?

AFAIK no.

@holmanb
Member

holmanb commented Oct 24, 2024

For my own curiosity... is it truly necessary to have this connection prior to applying the supplied network configuration? I realize that this might be specific to curtin/maas but the network configuration is written to disk prior to rebooting the node.

If that's the case, then no. What writes it to disk and where is it written? I'm afraid that I don't have a test system to verify/experiment on.

So it would/could be entirely practical to simply apply the network configuration on disk prior to contacting the metadata. Is this something we could define/establish within the cloud-init config to skip this "pre-networking" step?

That sounds like it should work.
