feat: Conditionally remove networkd online dependency on Ubuntu #5772
Take 2... I added a commit that, instead of using a drop-in, calls the systemd-networkd-wait-online binary with exactly the same args the service uses, taken from 'systemctl cat'. This removes the need for a costly daemon-reload in the cases where we need to wait on the network. I haven't touched the integration tests since the first commit, so they will have known failures.
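To make the 'systemctl cat' idea concrete, here is a minimal sketch of pulling the effective ExecStart= command out of the unit text. This is not the code from this PR: the function name `exec_start_commands` and the sample unit text (including the drop-in path and its arguments) are made up for illustration, and the handling of systemd's special executable prefixes is deliberately simplistic.

```python
import shlex
from typing import List


def exec_start_commands(unit_text: str) -> List[List[str]]:
    """Return the effective ExecStart= command lines from `systemctl cat` output.

    Hypothetical helper for illustration; it only understands plain
    ExecStart= lines and the "empty assignment resets the list" rule.
    """
    commands: List[List[str]] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        value = line[len("ExecStart="):].strip()
        if not value:
            # An empty assignment clears every ExecStart= seen so far.
            commands = []
            continue
        # Drop special executable prefixes such as "-" or "@" (simplified).
        value = value.lstrip("-@:+!")
        commands.append(shlex.split(value))
    return commands


# Illustrative unit text only; real output depends on the installed system.
SAMPLE = """\
# /usr/lib/systemd/system/systemd-networkd-wait-online.service
[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-networkd-wait-online

# /etc/systemd/system/systemd-networkd-wait-online.service.d/10-example.conf
[Service]
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=10 --any
"""

commands = exec_start_commands(SAMPLE)
```

Each returned list could then be passed to something like `subprocess.run(cmd, check=True)`, which avoids editing unit files and therefore avoids the daemon-reload.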
Thanks @TheRealFalcon for working on this. Quick first-pass review, as you are actively coding and developing tests for this functionality.
f88d8a9 to ed3a50e
@blackboxsw, thanks for the review. I applied your comments and also moved the wait into the activators.
483f2b6 to bbf16f9
Updated based on comments. Integration tests still need to be updated.
Traditionally, cloud-init-network.service (previously cloud-init.service) waited for network connectivity (via systemd service ordering) before running. This has caused cloud-init-network.service to block boot for a significant amount of time, even though the vast majority of boots don't require this network connectivity. This commit removes the ordering After=systemd-networkd-wait-online.service and instead checks the datasource and user data in the init-local timeframe to see whether network connectivity will be necessary in the init network timeframe. If so, when the init network service starts, it calls systemd-networkd-wait-online manually, in the same manner as systemd-networkd-wait-online.service, to wait for network connectivity. This commit affects Ubuntu only because of the variety of service orderings and network renderers possible, along with the downstream synchronization needed. However, a new overridable method in the Distro class should make this optimization trivial to implement for any other distro.
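As a rough sketch of the "overridable method in the Distro class" idea described above: the base class does nothing, and only a distro that opts in actually blocks. The class layout, the `wait_for_network` signature, the injected `run` callable, and the placeholder command are all assumptions made for illustration, not the code merged in this PR.

```python
from typing import Callable, List, Sequence

# Anything that executes a command; injected so the sketch stays testable.
Runner = Callable[[Sequence[str]], None]


class Distro:
    def wait_for_network(self, run: Runner) -> None:
        """Default: do nothing, so other distros keep their current behavior."""


class Ubuntu(Distro):
    # Placeholder command; the PR derives the real arguments from the unit file.
    wait_cmd: List[str] = ["systemd-networkd-wait-online"]

    def wait_for_network(self, run: Runner) -> None:
        run(self.wait_cmd)


def init_network_stage(distro: Distro, network_needed: bool, run: Runner) -> None:
    """Start of the init network stage: block only if init-local asked for it."""
    if network_needed:
        distro.wait_for_network(run)


calls: List[Sequence[str]] = []
init_network_stage(Ubuntu(), network_needed=True, run=calls.append)
init_network_stage(Ubuntu(), network_needed=False, run=calls.append)
init_network_stage(Distro(), network_needed=True, run=calls.append)
```

Only the first call records a command: the second skips the wait because init-local decided the network was unnecessary, and the third hits the base-class no-op.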
…call into the activators.
The biggest change is writing .skip-network rather than .wait-for-network, so that if the file is ever not written where it should be, the default behavior is to wait as we always have.
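The fail-safe reasoning behind .skip-network can be sketched as follows. The helper names and the use of a temporary directory are assumptions for illustration; only the marker file name comes from the comment above.

```python
import tempfile
from pathlib import Path

SKIP_MARKER = ".skip-network"


def record_decision(run_dir: Path, network_needed: bool) -> None:
    """init-local: write the marker only when we are sure no wait is needed."""
    marker = run_dir / SKIP_MARKER
    if network_needed:
        marker.unlink(missing_ok=True)  # requires Python 3.8+
    else:
        marker.touch()


def should_wait(run_dir: Path) -> bool:
    """init network: a missing marker means "wait", the historical default."""
    return not (run_dir / SKIP_MARKER).exists()


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    waits_by_default = should_wait(run_dir)  # nothing was ever written
    record_decision(run_dir, network_needed=False)
    waits_after_skip = should_wait(run_dir)
    record_decision(run_dir, network_needed=True)
    waits_after_need = should_wait(run_dir)
```

With an opt-out marker, a bug that prevents the file from being written costs boot time; with an opt-in marker such as .wait-for-network, the same bug would cost correctness.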
8940070 to 45d48ac
Content looks good, is limited to Ubuntu, and behaves well across upgrade testing.
I ran some performance samples on Azure across clean reboots to assess the impact on boot speed. The impact across the upgrade is negligible and well within the standard deviation of the samples, which can be attributed to differences in platform behavior. In some cases I saw a 1.2 second reduction in time to SSH, but that wasn't represented in all cases. Generally, the degraded timing of init-local/search-Azure appears to be due to latency in dhcpcd responses, which is related to platform performance, not to the cloud-init changes in this branch.
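One crude way to read "well within standard deviation" is to compare each average delta with the larger of the two standard deviations. The sketch below does that for three rows copied from the performance table in this comment; the `within_noise` helper is an illustration, not the tooling used to produce the table, and it is a rule of thumb rather than a significance test.

```python
from typing import Dict, Tuple


def within_noise(control_avg: float, control_std: float,
                 upgraded_avg: float, upgraded_std: float) -> bool:
    """True when the average delta is no larger than the bigger stdev."""
    delta = abs(upgraded_avg - control_avg)
    return delta <= max(control_std, upgraded_std)


# (control avg, control stdev, upgraded avg, upgraded stdev), in seconds.
SAMPLES: Dict[str, Tuple[float, float, float, float]] = {
    "client_time_to_ssh": (21.33, 0.86, 21.59, 0.29),
    "time_cloudinit_total": (10.34, 3.79, 12.24, 3.71),
    "stage/init-local": (1.08, 0.65, 1.44, 0.65),
}

results = {name: within_noise(*values) for name, values in SAMPLES.items()}
```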
Performance samples from 15 clean reboot runs on Azure, daily PPA vs. this branch:
```
--------------------- Performance Deltas Encountered ---------------------------------
| Control Avg/Stdev | Upgr. Avg/Stdev | Avg delta | Delta type and service name
--------------------------------------------------------------------------------------
| 021.33s/0.86s | 021.59s/0.29s | +00.26s | *** client_time_to_ssh
| 014.29s/0.66s | 014.45s/0.59s | +00.16s | *** time_systemd_userspace
| 010.34s/3.79s | 012.24s/3.71s | +01.90s | *** time_cloudinit_total
| 000.98s/0.65s | 001.35s/0.65s | +00.37s | ***DEGRADED stage/init-local/search-Azure
| 000.94s/0.65s | 001.30s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/crawl_metadata
| 000.84s/0.65s | 001.22s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/obtain-dhcp-lease
| 000.85s/0.65s | 001.22s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/_setup_ephemeral_networking
| 001.08s/0.65s | 001.44s/0.65s | +00.37s | ***DEGRADED stage/init-local
| 000.98s/0.65s | 001.35s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/_get_data

------------------- Control image --------------------------
| Avg/Stdev     | Max     | Min     | Metric Name
-----------------------------------------------------------------------
| 021.33s/0.86s | 021.87s | 018.38s | client_time_to_ssh
| 021.91s/0.61s | 022.38s | 019.89s | client_time_to_cloudinit_done
| 001.43s/0.02s | 001.48s | 001.40s | time_systemd_kernel
| 014.29s/0.66s | 015.44s | 013.19s | time_systemd_userspace
| 010.34s/3.79s | 017.19s | 005.59s | time_cloudinit_total
| 004.05s/0.12s | 004.43s | 003.95s | cloud-config.service
| 004.03s/0.13s | 004.45s | 003.92s | snapd.seeded.service
| 004.00s/0.08s | 004.21s | 003.90s | dev-sda1.device
| 003.85s/0.12s | 004.23s | 003.70s | snapd.service
| 004.43s/0.79s | 005.75s | 003.77s | walinuxagent-network-setup.service
| 003.21s/0.12s | 003.49s | 002.99s | apport.service
| 003.38s/0.36s | 003.83s | 002.81s | rsyslog.service
| 002.96s/0.32s | 003.84s | 002.78s | networkd-dispatcher.service
| 002.65s/0.34s | 003.50s | 002.39s | udisks2.service
| 001.85s/0.05s | 001.92s | 001.74s | polkit.service
| 001.57s/0.06s | 001.73s | 001.45s | chrony.service
| 001.36s/0.18s | 001.53s | 001.13s | systemd-fsck@dev-disk-by\x2dlabel-BOOT.service
| 001.37s/0.06s | 001.48s | 001.33s | systemd-fsck@dev-disk-by\x2duuid-CD06\x2d6D44.service
| 001.34s/0.03s | 001.38s | 001.30s | cloud-init-main.service
| 001.25s/0.07s | 001.38s | 001.14s | systemd-logind.service
| 001.09s/0.06s | 001.20s | 001.00s | cloud-init-network.service
| 001.72s/0.48s | 002.35s | 001.02s | cloud-init-local.service
| 001.70s/0.25s | 001.88s | 001.53s | ModemManager.service
| 001.47s/0.00s | 001.47s | 001.47s | systemd-fsck@dev-disk-cloud-azure_resource\x2dpart1.service
| 001.06s/0.00s | 001.06s | 001.06s | secureboot-db.service
| 002.24s/0.17s | 002.75s | 002.00s | modules-config/config-grub_dpkg
| 000.44s/0.01s | 000.46s | 000.42s | init-network/config-mounts
| 000.18s/0.05s | 000.28s | 000.12s | modules-config/config-apt_configure
| 000.84s/0.65s | 002.04s | 000.07s | stage/azure-ds/obtain-dhcp-lease
| 000.85s/0.65s | 002.04s | 000.08s | stage/azure-ds/_setup_ephemeral_networking
| 000.94s/0.65s | 002.13s | 000.17s | stage/azure-ds/crawl_metadata
| 000.98s/0.65s | 002.17s | 000.21s | stage/azure-ds/_get_data
| 000.98s/0.65s | 002.18s | 000.21s | stage/init-local/search-Azure
| 001.08s/0.65s | 002.27s | 000.31s | stage/init-local
| 000.23s/0.09s | 000.37s | 000.10s | stage/init-network/config-ssh
| 000.98s/0.08s | 001.13s | 000.85s | stage/init-network
| 002.89s/0.17s | 003.40s | 002.70s | stage/modules-config
| 000.17s/0.03s | 000.29s | 000.15s | stage/modules-final

------------------- Updated cloud-init image --------------------------
| Avg/Stdev     | Max     | Min     | Metric Name
-----------------------------------------------------------------------
| 021.59s/0.29s | 021.99s | 020.72s | client_time_to_ssh
| 022.10s/0.30s | 022.50s | 021.22s | client_time_to_cloudinit_done
| 001.42s/0.02s | 001.44s | 001.39s | time_systemd_kernel
| 014.45s/0.59s | 015.19s | 013.46s | time_systemd_userspace
| 012.24s/3.71s | 016.89s | 006.58s | time_cloudinit_total
| 004.10s/0.26s | 004.96s | 003.87s | dev-sda1.device
| 003.93s/0.21s | 004.08s | 003.20s | cloud-config.service
| 003.91s/0.19s | 004.03s | 003.25s | snapd.seeded.service
| 004.26s/0.59s | 005.61s | 003.75s | walinuxagent-network-setup.service
| 003.72s/0.20s | 003.86s | 003.02s | snapd.service
| 003.08s/0.21s | 003.18s | 002.32s | apport.service
| 003.24s/0.43s | 003.58s | 002.15s | rsyslog.service
| 002.75s/0.18s | 002.91s | 002.12s | networkd-dispatcher.service
| 002.43s/0.23s | 002.59s | 001.62s | udisks2.service
| 001.79s/0.13s | 001.98s | 001.35s | polkit.service
| 001.92s/0.33s | 002.38s | 001.28s | cloud-init-local.service
| 001.54s/0.10s | 001.63s | 001.21s | chrony.service
| 001.37s/0.05s | 001.47s | 001.31s | cloud-init-main.service
| 001.24s/0.05s | 001.35s | 001.16s | systemd-logind.service
| 001.36s/0.13s | 001.50s | 001.15s | systemd-fsck@dev-disk-by\x2dlabel-BOOT.service
| 001.12s/0.19s | 001.63s | 001.01s | cloud-init-network.service
| 001.46s/0.11s | 001.57s | 001.34s | systemd-fsck@dev-disk-by\x2duuid-CD06\x2d6D44.service
| 001.33s/0.16s | 001.44s | 001.21s | systemd-fsck@dev-disk-cloud-azure_resource\x2dpart1.service
| 002.12s/0.13s | 002.29s | 001.69s | modules-config/config-grub_dpkg
| 000.43s/0.01s | 000.45s | 000.41s | init-network/config-mounts
| 000.20s/0.05s | 000.26s | 000.11s | modules-config/config-apt_configure
| 001.22s/0.65s | 002.07s | 000.25s | stage/azure-ds/obtain-dhcp-lease
| 001.22s/0.65s | 002.07s | 000.26s | stage/azure-ds/_setup_ephemeral_networking
| 001.30s/0.65s | 002.16s | 000.34s | stage/azure-ds/crawl_metadata
| 001.35s/0.65s | 002.20s | 000.39s | stage/azure-ds/_get_data
| 001.35s/0.65s | 002.21s | 000.39s | stage/init-local/search-Azure
| 001.44s/0.65s | 002.29s | 000.49s | stage/init-local
| 000.23s/0.17s | 000.78s | 000.07s | stage/init-network/config-ssh
| 000.98s/0.17s | 001.54s | 000.81s | stage/init-network
| 002.76s/0.18s | 002.92s | 002.15s | stage/modules-config
| 000.17s/0.03s | 000.27s | 000.15s | stage/modules-final
```
(
    mock.Mock(),
    textwrap.dedent(
        """\
        #cloud-config
        write_files:
        - source:
            uri: http://example.com
            headers:
              Authorization: Basic stuff
              User-Agent: me
        """
    ),
    True,
),
Let's add a counter-case with uri: /somepath and assert False
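A minimal sketch of what such a pair of cases exercises, working on already-parsed cloud-config to stay dependency-free. The function name `write_files_need_network` and the rule it applies (a local path or a file: URI does not need the network, anything else does) are assumptions for illustration; the actual check in this PR may differ.

```python
from typing import Any, Dict, List, Tuple


def write_files_need_network(cfg: Dict[str, Any]) -> bool:
    """Return True if any write_files entry fetches its content remotely.

    Assumed rule: a source uri that is a local path or a file: URI needs
    no network; any other non-empty uri does.
    """
    for entry in cfg.get("write_files", []):
        uri = (entry.get("source") or {}).get("uri", "")
        if uri and not uri.startswith(("/", "file:")):
            return True
    return False


# The existing positive case and the requested counter-case.
CASES: List[Tuple[Dict[str, Any], bool]] = [
    ({"write_files": [{"source": {"uri": "http://example.com"}}]}, True),
    ({"write_files": [{"source": {"uri": "/somepath"}}]}, False),
]

outcomes = [write_files_need_network(cfg) == expected for cfg, expected in CASES]
```

The counter-case matters because a check that returned True for every `source` entry would still pass the first case while waiting on the network unnecessarily.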
Test case added and no more comments left. I'm going to merge this.
…nical#5772) Traditionally, cloud-init-network.service (previously cloud-init.service) waited for network connectivity (via systemd service ordering) before running. This has caused cloud-init-network.service to block boot for a significant amount of time, even though the vast majority of boots don't require this network connectivity. This commit removes the ordering After=systemd-networkd-wait-online.service and instead checks the datasource and user data in the init-local timeframe to see whether network connectivity will be necessary in the init network timeframe. If so, when the init network service starts, it will start systemd-networkd-wait-online.service manually. This commit affects only Ubuntu because of the variety of service orderings and network renderers possible, along with the downstream synchronization needed. However, a new overridable method in the Distro class should make this optimization trivial to implement for any other distro.
Proposed Commit Message
Additional Context
Test Steps
Merge type