cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting #5945

crankycyclops · 2025-01-03T16:36:57Z

Bug report

On Digital Ocean, after updating cloud-init from 24.3.1 to 24.4 on two Ubuntu 20.04 droplets and rebooting, the boot process got stuck in init, and I couldn't login in even through the recovery console. I had to boot into recovery mode and re-install 24.3.1 over 24.4 to get it working again.

Steps to reproduce the problem

I don't know if this affects all Ubuntu 20.04 droplets or not, but it killed two of mine.

Use apt-get to update cloud-init to 24.4
Reboot droplet
Observe that the droplet gets stuck in init

Temporary workaround: launch the recovery CD, chroot into the mounted filesystem, re-install 24.3.1 over 24.4, then boot the droplet as normal. With the old version, the droplet boots up normally again.

Environment details

Cloud-init version: 24.4
Operating System Distribution: Ubuntu 20.04
Cloud provider, platform or installer type: Digital Ocean

cloud-init logs

2025-01-03 01:33:53,577 - log_util.py[DEBUG]: Cloud-init v. 24.4-0ubuntu1~20.04.1 running 'init-local' at Fri, 03 Jan 2025 01:33:53 +0000. Up 7.18 seconds.
2025-01-03 01:33:53,578 - main.py[INFO]: PID [1] started cloud-init 'init-local'.
2025-01-03 01:33:53,578 - main.py[DEBUG]: No kernel command line url found.
2025-01-03 01:33:53,578 - main.py[DEBUG]: Closing stdin
2025-01-03 01:33:53,580 - util.py[DEBUG]: Writing to /var/log/cloud-init.log - ab: [640] 0 bytes
2025-01-03 01:33:53,580 - util.py[DEBUG]: Changing the ownership of /var/log/cloud-init.log to 102:4
2025-01-03 01:33:53,580 - util.py[DEBUG]: Reading from /var/lib/cloud/data/python-version (quiet=False)
2025-01-03 01:33:53,581 - util.py[DEBUG]: Reading 3 bytes from /var/lib/cloud/data/python-version
2025-01-03 01:33:53,581 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance/boot-finished
2025-01-03 01:33:53,582 - handlers.py[DEBUG]: start: init-local/check-cache: attempting to read from cache [check]
2025-01-03 01:33:53,582 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2025-01-03 01:33:53,582 - util.py[DEBUG]: Reading 33986 bytes from /var/lib/cloud/instance/obj.pkl
2025-01-03 01:33:53,588 - lifecycle.py[INFO]: DataSourceDigitalOcean is deprecated in 23.2 and scheduled to be removed in 28.2. Deprecated in favour of DataSourceConfigDrive.
2025-01-03 01:33:53,588 - dmi.py[DEBUG]: querying dmi data /sys/class/dmi/id/product_serial
2025-01-03 01:33:53,588 - stages.py[DEBUG]: restored from checked cache: DataSourceDigitalOcean
2025-01-03 01:33:53,589 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: restored from checked cache: DataSourceDigitalOcean
2025-01-03 01:33:53,589 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg (quiet=False)
2025-01-03 01:33:53,589 - util.py[DEBUG]: Reading 3766 bytes from /etc/cloud/cloud.cfg
2025-01-03 01:33:53,589 - util.py[DEBUG]: Attempting to load yaml from string of length 3766 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,600 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/99-digitalocean.cfg (quiet=False)
2025-01-03 01:33:53,600 - util.py[DEBUG]: Reading 168 bytes from /etc/cloud/cloud.cfg.d/99-digitalocean.cfg
2025-01-03 01:33:53,600 - util.py[DEBUG]: Attempting to load yaml from string of length 168 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,601 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/90_dpkg.cfg (quiet=False)
2025-01-03 01:33:53,601 - util.py[DEBUG]: Reading 257 bytes from /etc/cloud/cloud.cfg.d/90_dpkg.cfg
2025-01-03 01:33:53,601 - util.py[DEBUG]: Attempting to load yaml from string of length 257 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,602 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/90-digitalocean.cfg (quiet=False)
2025-01-03 01:33:53,602 - util.py[DEBUG]: Reading 4276 bytes from /etc/cloud/cloud.cfg.d/90-digitalocean.cfg
2025-01-03 01:33:53,602 - util.py[DEBUG]: Attempting to load yaml from string of length 4276 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,613 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/05_logging.cfg (quiet=False)
2025-01-03 01:33:53,613 - util.py[DEBUG]: Reading 2071 bytes from /etc/cloud/cloud.cfg.d/05_logging.cfg
2025-01-03 01:33:53,613 - util.py[DEBUG]: Attempting to load yaml from string of length 2071 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,617 - util.py[DEBUG]: Reading from /run/cloud-init/cloud.cfg (quiet=False)
2025-01-03 01:33:53,617 - util.py[DEBUG]: Reading 40 bytes from /run/cloud-init/cloud.cfg
2025-01-03 01:33:53,617 - util.py[DEBUG]: Attempting to load yaml from string of length 40 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,617 - util.py[DEBUG]: Attempting to load yaml from string of length 0 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,617 - util.py[DEBUG]: loaded blob returned None, returning default.
2025-01-03 01:33:53,618 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/cloud-config.txt (quiet=False)
2025-01-03 01:33:53,618 - util.py[DEBUG]: Reading 0 bytes from /var/lib/cloud/instance/cloud-config.txt
2025-01-03 01:33:53,618 - util.py[DEBUG]: Attempting to load yaml from string of length 0 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,618 - util.py[DEBUG]: loaded blob returned None, returning default.
2025-01-03 01:33:53,618 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/vendor-cloud-config.txt (quiet=False)
2025-01-03 01:33:53,618 - util.py[DEBUG]: Reading 548 bytes from /var/lib/cloud/instance/vendor-cloud-config.txt
2025-01-03 01:33:53,618 - util.py[DEBUG]: Attempting to load yaml from string of length 548 with allowed root types (<class 'dict'>,)
2025-01-03 01:33:53,622 - util.py[DEBUG]: Writing to /run/cloud-init/cloud-id-digitalocean - wb: [644] 13 bytes
2025-01-03 01:33:53,622 - util.py[DEBUG]: Creating symbolic link from '/run/cloud-init/cloud-id' => '/run/cloud-init/cloud-id-digitalocean'
2025-01-03 01:33:53,623 - atomic_helper.py[DEBUG]: Atomically writing to file /run/cloud-init/instance-data-sensitive.json (via temporary file /run/cloud-init/tmp_kv3vwwl) - w: [600] 10566 bytes/chars
2025-01-03 01:33:53,623 - atomic_helper.py[DEBUG]: Atomically writing to file /run/cloud-init/instance-data.json (via temporary file /run/cloud-init/tmp81ivct2j) - w: [644] 3801 bytes/chars
2025-01-03 01:33:53,624 - stages.py[INFO]: Instance link already exists, not recreating it.
2025-01-03 01:33:53,624 - util.py[DEBUG]: Reading from /var/lib/cloud/instances/113496214/datasource (quiet=False)
2025-01-03 01:33:53,624 - util.py[DEBUG]: Reading 47 bytes from /var/lib/cloud/instances/113496214/datasource
2025-01-03 01:33:53,625 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/113496214/datasource - wb: [644] 47 bytes
2025-01-03 01:33:53,628 - util.py[DEBUG]: Writing to /var/lib/cloud/data/previous-datasource - wb: [644] 47 bytes
2025-01-03 01:33:53,628 - util.py[DEBUG]: Reading from /var/lib/cloud/data/instance-id (quiet=False)
2025-01-03 01:33:53,628 - util.py[DEBUG]: Reading 10 bytes from /var/lib/cloud/data/instance-id
2025-01-03 01:33:53,628 - stages.py[DEBUG]: previous iid found to be 113496214
2025-01-03 01:33:53,629 - util.py[DEBUG]: Writing to /var/lib/cloud/data/instance-id - wb: [644] 10 bytes
2025-01-03 01:33:53,629 - util.py[DEBUG]: Writing to /run/cloud-init/.instance-id - wb: [644] 10 bytes
2025-01-03 01:33:53,629 - util.py[DEBUG]: Writing to /var/lib/cloud/data/previous-instance-id - wb: [644] 10 bytes
2025-01-03 01:33:53,630 - util.py[DEBUG]: Writing to /var/lib/cloud/instance/obj.pkl - wb: [400] 33986 bytes
2025-01-03 01:33:53,630 - stages.py[INFO]: Not re-loading configuration, instance id and datasource have not changed.
2025-01-03 01:33:53,630 - main.py[DEBUG]: [local] init will now be targeting instance id: 113496214. new=False
2025-01-03 01:33:53,631 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
2025-01-03 01:33:53,632 - util.py[DEBUG]: Reading from /var/lib/cloud/data/set-hostname (quiet=False)
2025-01-03 01:33:53,632 - util.py[DEBUG]: Reading 65 bytes from /var/lib/cloud/data/set-hostname
2025-01-03 01:33:53,632 - cc_set_hostname.py[DEBUG]: No hostname changes. Skipping set_hostname
2025-01-03 01:33:53,633 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2025-01-03 01:33:53,633 - util.py[DEBUG]: Reading 18 bytes from /sys/class/net/lo/address
2025-01-03 01:33:53,633 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False)
2025-01-03 01:33:53,633 - util.py[DEBUG]: Reading 18 bytes from /sys/class/net/eth0/address
2025-01-03 01:33:53,634 - util.py[DEBUG]: Reading from /var/lib/cloud/hotplug.enabled (quiet=False)
2025-01-03 01:33:53,634 - util.py[DEBUG]: File not found: /var/lib/cloud/hotplug.enabled
2025-01-03 01:33:53,634 - stages.py[DEBUG]: Allowed events: {<EventScope.NETWORK: 'network'>: {<EventType.BOOT_NEW_INSTANCE: 'boot-new-instance'>}}
2025-01-03 01:33:53,634 - stages.py[DEBUG]: Event Denied: scopes=['network'] EventType=boot
2025-01-03 01:33:53,634 - util.py[DEBUG]: Reading from /var/lib/cloud/hotplug.enabled (quiet=False)
2025-01-03 01:33:53,634 - util.py[DEBUG]: File not found: /var/lib/cloud/hotplug.enabled
2025-01-03 01:33:53,634 - stages.py[DEBUG]: Allowed events: {<EventScope.NETWORK: 'network'>: {<EventType.BOOT_NEW_INSTANCE: 'boot-new-instance'>}}
2025-01-03 01:33:53,634 - stages.py[DEBUG]: Event Denied: scopes=['network'] EventType=boot-legacy
2025-01-03 01:33:53,634 - stages.py[DEBUG]: No network config applied. Neither a new instance nor datasource network update allowed
2025-01-03 01:33:53,634 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'mac_address': '1e:f4:12:49:d9:ed', 'name': 'eth0', 'subnets': [{'type': 'static', 'control': 'auto', 'address': '159.89.146.229', 'gateway': '159.89.144.1', 'netmask': '255.255.240.0'}, {'type': 'static', 'control': 'auto', 'address': '2604:A880:0002:00D0:0000:0000:21AF:8001/64', 'gateway': '2604:a880:0002:00d0:0000:0000:0000:0001'}, {'type': 'static', 'control': 'auto', 'address': '10.46.0.6', 'netmask': '255.255.0.0'}]}, {'type': 'nameserver', 'address': ['67.207.67.3', '67.207.67.2']}]}
2025-01-03 01:33:53,634 - util.py[DEBUG]: Reading from /sys/class/net/eth0/device/device (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 7 bytes from /sys/class/net/eth0/device/device
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/lo/addr_assign_type (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 2 bytes from /sys/class/net/lo/addr_assign_type
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/lo/uevent (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 23 bytes from /sys/class/net/lo/uevent
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 18 bytes from /sys/class/net/lo/address
2025-01-03 01:33:53,635 - net[DEBUG]: ovs-vsctl not in PATH; not detecting Open vSwitch interfaces
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/lo/device/device (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/eth0/addr_assign_type (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 2 bytes from /sys/class/net/eth0/addr_assign_type
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/eth0/uevent (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 25 bytes from /sys/class/net/eth0/uevent
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False)
2025-01-03 01:33:53,635 - util.py[DEBUG]: Reading 18 bytes from /sys/class/net/eth0/address
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading from /sys/class/net/eth0/device/device (quiet=False)
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading 7 bytes from /sys/class/net/eth0/device/device
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading from /sys/class/net/lo/operstate (quiet=False)
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading 8 bytes from /sys/class/net/lo/operstate
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading from /sys/class/net/eth0/operstate (quiet=False)
2025-01-03 01:33:53,636 - util.py[DEBUG]: Reading 5 bytes from /sys/class/net/eth0/operstate
2025-01-03 01:33:53,636 - subp.py[DEBUG]: Running command ['ip', '-6', 'addr', 'show', 'permanent', 'scope', 'global'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 01:33:53,644 - subp.py[DEBUG]: Running command ['ip', '-4', 'addr', 'show'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 01:33:53,648 - net[DEBUG]: Detected interfaces {'lo': {'downable': False, 'device_id': None, 'driver': None, 'mac': '00:00:00:00:00:00', 'name': 'lo', 'up': True}, 'eth0': {'downable': True, 'device_id': '0x0001', 'driver': 'virtio_net', 'mac': '1e:f4:12:49:d9:ed', 'name': 'eth0', 'up': False}}
2025-01-03 01:33:53,648 - net[DEBUG]: no work necessary for renaming of [['1e:f4:12:49:d9:ed', 'eth0', 'virtio_net', '0x0001']]
2025-01-03 01:33:53,648 - main.py[DEBUG]: Network connectivity determined necessary for cloud-init's network stage. Reason: non-cloud-config user data found in vendor data
2025-01-03 01:33:53,648 - main.py[DEBUG]: [local] Exiting. datasource DataSourceDigitalOcean not in local mode.
2025-01-03 01:33:53,649 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2025-01-03 01:33:53,649 - util.py[DEBUG]: Reading 10 bytes from /proc/uptime
2025-01-03 01:33:53,649 - atomic_helper.py[DEBUG]: Atomically writing to file /var/lib/cloud/data/status.json (via temporary file /var/lib/cloud/data/tmpy6d7i2ws) - w: [644] 503 bytes/chars
2025-01-03 01:33:53,649 - performance.py[DEBUG]: cloud-init stage: 'init-local' took 0.194 seconds
2025-01-03 01:33:53,650 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources
2025-01-03 01:33:54,233 - log_util.py[DEBUG]: Cloud-init v. 24.4-0ubuntu1~20.04.1 running 'init' at Fri, 03 Jan 2025 01:33:54 +0000. Up 7.84 seconds.
2025-01-03 01:33:54,233 - main.py[INFO]: PID [1] started cloud-init 'init'.
2025-01-03 01:33:54,233 - main.py[DEBUG]: No kernel command line url found.
2025-01-03 01:33:54,233 - main.py[DEBUG]: Closing stdin
2025-01-03 01:33:54,236 - util.py[DEBUG]: Writing to /var/log/cloud-init.log - ab: [640] 0 bytes
2025-01-03 01:33:54,236 - util.py[DEBUG]: Changing the ownership of /var/log/cloud-init.log to 102:4
2025-01-03 01:33:54,236 - util.py[DEBUG]: Reading from /var/lib/cloud/data/python-version (quiet=False)
2025-01-03 01:33:54,236 - util.py[DEBUG]: Reading 3 bytes from /var/lib/cloud/data/python-version
2025-01-03 01:33:54,237 - main.py[DEBUG]: Will wait for network connectivity before continuing
2025-01-03 01:33:54,238 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
2025-01-03 01:33:54,239 - activators.py[DEBUG]: Using selected activator: <class 'cloudinit.net.activators.NetplanActivator'> from priority: ['netplan', 'eni', 'network-manager', 'networkd']
2025-01-03 01:33:54,239 - subp.py[DEBUG]: Running command ['systemctl', 'is-enabled', 'NetworkManager.service'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 01:33:54,251 - performance.py[DEBUG]: Running ['systemctl', 'is-enabled', 'NetworkManager.service'] took 0.012 seconds
2025-01-03 01:33:54,252 - subp.py[DEBUG]: Running command ['systemctl', 'start', 'systemd-networkd-wait-online.service'] with allowed return codes [0] (shell=False, capture=True)

The text was updated successfully, but these errors were encountered:

holmanb · 2025-01-06T16:32:13Z

@crankycyclops Please include the required logs in the bug report. Also you sure that3 the version 10.4 is correct? That doesn't seem right to me.

crankycyclops · 2025-01-06T17:11:46Z

@crankycyclops Please include the required logs in the bug report. Also you sure that3 the version 10.4 is correct? That doesn't seem right to me.

Sorry, I meant 24.4. I updated the title of the issue. I also added the logs from that day.

ltvkit · 2025-01-08T14:23:30Z

yup, same here. All of my Ubuntu servers cant boot normally, and get frozen on cloud-init stage.
If I boot into recovery, the server boots up fine.

And I've seen on many forums the same. Example: https://forum.virtualmin.com/t/cloud-init-network-stage-hangs/131292/15

Downgrading to cloud-init=23.1.2-0ubuntu0~20.04.2 solves the issue. And I've frozen that package on all of my servers for now.
This has been an issue for ~3 weeks now, but I was too lazy to report it.

AndreasGocht · 2025-01-08T15:22:47Z

We had a not working DNS, so I started digging around, and realized, that we did hat networkd activated. It seems that cloud-init did this, but actually failed to bring up the network in the first attempt, leading to resolved not working properly.

about the logs: I won't publish them online, as there are some public IPs and stuff in there (dating back one year!), which i won't like to share. Maybe I can recreate the issue on a non public "younger" machine. Not sure, when I'll find the time. However, I think I can share an excerpt from the cloud-init logs:

2025-01-03 04:00:32,101 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources
2025-01-03 04:00:32,562 - log_util.py[DEBUG]: Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init' at Fri, 03 Jan 2025 04:00:32 +0000. Up 3.71 seconds.
2025-01-03 04:00:32,562 - main.py[INFO]: PID [1] started cloud-init 'init'.
2025-01-03 04:00:32,562 - main.py[DEBUG]: No kernel command line url found.
2025-01-03 04:00:32,562 - main.py[DEBUG]: Closing stdin
2025-01-03 04:00:32,564 - util.py[DEBUG]: Writing to /var/log/cloud-init.log - ab: [640] 0 bytes
2025-01-03 04:00:32,564 - util.py[DEBUG]: Changing the ownership of /var/log/cloud-init.log to 106:4
2025-01-03 04:00:32,565 - util.py[DEBUG]: Reading from /var/lib/cloud/data/python-version (quiet=False)
2025-01-03 04:00:32,565 - util.py[DEBUG]: Reading 4 bytes from /var/lib/cloud/data/python-version
2025-01-03 04:00:32,565 - main.py[DEBUG]: Will wait for network connectivity before continuing
2025-01-03 04:00:32,566 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
2025-01-03 04:00:32,567 - activators.py[DEBUG]: Using selected activator: <class 'cloudinit.net.activators.NetplanActivator'> from priority: ['netplan', 'eni', 'network-manager', 'networkd']
2025-01-03 04:00:32,568 - subp.py[DEBUG]: Running command ['systemctl', 'is-enabled', 'NetworkManager.service'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 04:00:32,572 - subp.py[DEBUG]: Running command ['systemctl', 'start', 'systemd-networkd-wait-online.service'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 04:02:32,698 - performance.py[DEBUG]: Running ['systemctl', 'start', 'systemd-networkd-wait-online.service'] took 120.125 seconds
2025-01-03 04:02:32,698 - ubuntu.py[ERROR]: Failed to wait for network: Unexpected error while running command.
Command: ['systemctl', 'start', 'systemd-networkd-wait-online.service']
Exit code: 1
Reason: -
Stdout: 
Stderr: Job for systemd-networkd-wait-online.service failed because the control process exited with error code.
        See "systemctl status systemd-networkd-wait-online.service" and "journalctl -xeu systemd-networkd-wait-online.service" for details.
2025-01-03 04:02:32,699 - subp.py[DEBUG]: Running command ['ip', '--json', 'addr'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 04:02:32,704 - subp.py[DEBUG]: Running command ['ip', '-o', 'route', 'list'] with allowed return codes [0] (shell=False, capture=True)
2025-01-03 04:02:32,706 - subp.py[DEBUG]: Running command ['ip', '--oneline', '-6', 'route', 'list', 'table', 'all'] with allowed return codes [0, 1] (shell=False, capture=True)
2025-01-03 04:02:32,710 - handlers.py[DEBUG]: start: init-network/check-cache: attempting to read from cache [trust]
2025-01-03 04:02:32,710 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2025-01-03 04:02:32,710 - stages.py[DEBUG]: no cache found
2025-01-03 04:02:32,710 - handlers.py[DEBUG]: finish: init-network/check-cache: SUCCESS: no cache found
2025-01-03 04:02:32,711 - sources[DEBUG]: Looking for data source in: ['Ec2', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
2025-01-03 04:02:32,716 - sources[DEBUG]: Searching for network data source in: ['DataSourceEc2', 'DataSourceNone']
2025-01-03 04:02:32,716 - handlers.py[DEBUG]: start: init-network/search-Ec2: searching for network data from DataSourceEc2
2025-01-03 04:02:32,716 - sources[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceEc2.DataSourceEc2'>
2025-01-03 04:02:32,716 - sources[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance
2025-01-03 04:02:32,717 - sources[DEBUG]: Detected DataSourceEc2

and systemd-networkd-wait-online.service:

Jan 03 04:02:32 codevault systemd-networkd-wait-online[521]: Timeout occurred while waiting for network connectivity.
Jan 03 04:02:32 codevault systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Jan 03 04:02:32 codevault systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Jan 03 04:02:32 codevault systemd[1]: Failed to start Wait for Network to be Configured.

and

Jan 03 04:00:32 codevault systemd[1]: Starting Cloud-init: Network Stage...
Jan 03 04:00:32 codevault cloud-init[517]: Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init' at Fri, 03 Jan 2025 04:00:32 +0000. Up 3.71 seconds.
Jan 03 04:02:32 codevault cloud-init[517]: 2025-01-03 04:02:32,698 - ubuntu.py[ERROR]: Failed to wait for network: Unexpected error while running command.
Jan 03 04:02:32 codevault cloud-init[517]: Command: ['systemctl', 'start', 'systemd-networkd-wait-online.service']
Jan 03 04:02:32 codevault cloud-init[517]: Exit code: 1
Jan 03 04:02:32 codevault cloud-init[517]: Reason: -
Jan 03 04:02:32 codevault cloud-init[517]: Stdout:
Jan 03 04:02:32 codevault cloud-init[517]: Stderr: Job for systemd-networkd-wait-online.service failed because the control process exited with error code.
Jan 03 04:02:32 codevault cloud-init[517]:         See "systemctl status systemd-networkd-wait-online.service" and "journalctl -xeu systemd-networkd-wait-online.service" for details.
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: | Device |  Up  |           Address            |      Mask     | Scope  |     Hw-Address    |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |  eth0  | True |        192.168.3.140         | 255.255.255.0 | global | fa:16:3e:b7:dd:58 |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |  eth0  | True | fe80::f816:3eff:feb7:dd58/64 |       .       |  link  | fa:16:3e:b7:dd:58 |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   lo   | True |          127.0.0.1           |   255.0.0.0   |  host  |         .         |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   lo   | True |           ::1/128            |       .       |  host  |         .         |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +--------+------+------------------------------+---------------+--------+-------------------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: ++++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: | Route |   Destination   |   Gateway   |     Genmask     | Interface | Flags |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   0   |     0.0.0.0     | 192.168.3.1 |     0.0.0.0     |    eth0   |   UG  |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   1   | 169.254.169.254 | 192.168.3.3 | 255.255.255.255 |    eth0   |  UGH  |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   2   |   192.168.3.0   |   0.0.0.0   |  255.255.255.0  |    eth0   |   U   |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-----------------+-------------+-----------------+-----------+-------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-------------+---------+-----------+-------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: | Route | Destination | Gateway | Interface | Flags |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-------------+---------+-----------+-------+
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   1   |  fe80::/64  |    ::   |    eth0   |   U   |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   3   |    local    |    ::   |    eth0   |   U   |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: |   4   |  multicast  |    ::   |    eth0   |   U   |
Jan 03 04:02:32 codevault cloud-init[517]: ci-info: +-------+-------------+---------+-----------+-------+
Jan 03 04:02:36 codevault cloud-init[517]: 2025-01-03 04:02:36,892 - DataSourceEc2.py[WARNING]: Metadata 'network' key not valid: None.
Jan 03 04:02:37 codevault systemd[1]: Finished Cloud-init: Network Stage.

feel free to ask about additional specific log entries.

Best,

Andreas

TheRealFalcon · 2025-01-08T18:37:02Z

Thanks for the log snippets. Would it be possible to get the results of

cloud-init query vendordata

along with the contents of /etc/cloud/. E.g., from / run

tar czf logs.tar.gz etc/cloud/

and then attach to this issue?

This shouldn't contain any sensitive data unless it was manually written there.

Monochromics · 2025-01-08T22:33:51Z

This is fairly difficult to test, but I /think/ that this all comes down to e30549e

Since prior it was tagged as an After=s-n-w-o, it didn't really care if it was actually successful or not, just that it happened. Since it was swapped to an explicit start call, I think the boot is just failing whenever there is something wrong that prevents s-n-w-o from returning cleanly. Looks like some people are seeing this outside of just digital ocean as well. Presumably something in digital ocean is more reliably triggering this based on the thread? I don't have an acct to see what the c-i layout looks like in those images unfortunately.

The thing is its fairly difficult to trigger, since meeting the requirements to trigger the check itself are fairly stringent. Azure for ex only uses mounts for first-boot, and doesn't use bootcmd/random_seed/w.e. else.

Now what is somewhat interesting is that it looks like the installer .isos (server and desktop) leave di-intentify.cfg in images installed via that method that will force c-i to run even if no ds is found, which would trigger this issue if there was /also/ a network problem that cause w-o to fail. That isn't terribly hard to do though, as just having a disabled nic or something w/o letting netplan know would cause dhcp waiting period to trigger a timeout/exit, then killing c-i and hanging the instance.

All this being said... I don't have a great idea for a solution. As a hacky fix, just changing the systemctl start to ignore the exit code so c-i doesn't fail would work I think? I'd need to do some more testing.

holmanb · 2025-01-08T23:13:43Z

A reproducer or full journal logs + cloud-init logs would make this much easier to debug.

di-intentify.cfg in images installed via that method that will force c-i to run even if no ds is found

Did you observe that? The subiquity autoinstaller disables cloud-init on first boot with /etc/cloud/cloud-init.disabled.

changing the systemctl start to ignore the exit code so c-i doesn't fail would work I think

That wouldn't be reliable. The whole point of calling that binary was to enable cloud-init to run earlier in boot, but only when safe to do so. If this command fails, all bets are off on the "safe to do so" part. Cloud-init could always loop checking the systemd service status when the command fails, but that would obviously not be a great solution.

TheRealFalcon · 2025-01-09T00:53:43Z

changing the systemctl start to ignore the exit code so c-i doesn't fail would work I think?

That's essentially what cloud-init already does. We log an error and the process will return 1,but a failure of s-n-w-o doesn't cause a cloud-init crash or early exit.

In the DO case, I have a hunch (but no proof yet) that instances that don't come up are experiencing a dependency loop in systemd. DO's vendor-data writes out a network service file and performs a systemd daemon-reload. That could theoretically be problematic, but I can't reproduce the behavior on a fresh instance. It's likely that only older instances are affected because of the image configuration or vendor data that was used at the time (the c-i logs pasted here are using a deprecated datasource).

ltvkit · 2025-01-09T07:45:34Z

I might try to get the logs tomorrow, kinda busy atm.
But the servers that are affected for me, are the ones that were originally Ubuntu 16.04/18.04 based, and upgraded to Ubuntu 20.04 and 22.04.

They all have a network interface configured with netplan, and the IP's are set manually, with IPv6/DHCP4/DHCP6 disabled, and using public facing addresses.

AndreasGocht · 2025-01-09T10:27:52Z

Dear all,

please find attached the logs. I added the journal log since boot, and redacted some information (boot.log.tar.gz).

boot.log.tar.gz
logs.tar.gz

sudo cloud-init query vendordata does not seem to output anything (except an empty line).

If you could provide me another way of sharing logs except a public GitHub I am happy to do so.

changing the systemctl start to ignore the exit code so c-i doesn't fail would work I think?

A word on the network manager side: Normally we use neither netplan, nor any of eni, systemd-netword or NetworkManager on our side. I think the network configuration is coming from cloud-init. However, on our side it seems cloud-init tries to start networkd and fails, which leads to resolved failing, and DNS not working. However, the network seemed to be configured correctly.

Best,

Andreas

crankycyclops added bug Something isn't working correctly new An issue that still needs triage labels Jan 3, 2025

holmanb added incomplete Action required by submitter invalid This doesn't seem right and removed new An issue that still needs triage labels Jan 6, 2025

crankycyclops changed the title ~~cloud-init 10.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting~~ cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting #5945

cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting #5945

crankycyclops commented Jan 3, 2025 •

edited

Loading

holmanb commented Jan 6, 2025

crankycyclops commented Jan 6, 2025

ltvkit commented Jan 8, 2025 •

edited

Loading

AndreasGocht commented Jan 8, 2025

TheRealFalcon commented Jan 8, 2025

Monochromics commented Jan 8, 2025

holmanb commented Jan 8, 2025 •

edited

Loading

TheRealFalcon commented Jan 9, 2025

ltvkit commented Jan 9, 2025

AndreasGocht commented Jan 9, 2025 •

edited

Loading

cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting #5945

cloud-init 24.4 prevents Ubuntu 20.04 Digital Ocean droplet from booting #5945

Comments

crankycyclops commented Jan 3, 2025 • edited Loading

Bug report

Steps to reproduce the problem

Environment details

cloud-init logs

holmanb commented Jan 6, 2025

crankycyclops commented Jan 6, 2025

ltvkit commented Jan 8, 2025 • edited Loading

AndreasGocht commented Jan 8, 2025

TheRealFalcon commented Jan 8, 2025

Monochromics commented Jan 8, 2025

holmanb commented Jan 8, 2025 • edited Loading

TheRealFalcon commented Jan 9, 2025

ltvkit commented Jan 9, 2025

AndreasGocht commented Jan 9, 2025 • edited Loading

crankycyclops commented Jan 3, 2025 •

edited

Loading

ltvkit commented Jan 8, 2025 •

edited

Loading

holmanb commented Jan 8, 2025 •

edited

Loading

AndreasGocht commented Jan 9, 2025 •

edited

Loading