
feat: single process optimization #5489

Merged: 22 commits, Aug 2, 2024

Conversation

holmanb
Member

@holmanb holmanb commented Jul 4, 2024

Proposed Commit Message

feat: Single process optimization

Python interpreter initialization and module import time
contribute a significant amount of wall clock time to
cloud-init's runtime (and therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system. Since multiple cloud-init processes
sit in the critical chain of starting the system, this reduces
boot time (including time to ssh login and time to cloud-init
completion).

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of the
primary performance savings while preserving correctness and
backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its
systemd units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This
stage is documented as the "network" stage, so this service will
be renamed to "cloud-init-network.service". A new notify service
is added as part of this implementation which contains the
cloud-init process. This unit is named "cloud-init-main.service".
 
Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket
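As a sketch, the handshake above maps naturally onto a Python context manager over Unix datagram sockets. The names below (sync, the socket paths, the error handling) are illustrative assumptions, not the PR's actual cloudinit/socket.py code:

```python
import contextlib
import os
import socket


@contextlib.contextmanager
def sync(stage, run_dir="/run/cloud-init/share"):
    """Block until the init system says "start", run the stage body,
    then answer "done" on the same Unix datagram socket."""
    sock_path = os.path.join(run_dir, f"{stage}.sock")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(sock_path)
    try:
        # Sent by the shim service (e.g. the netcat one-liner).
        msg, sender = sock.recvfrom(4096)
        if msg != b"start":
            raise RuntimeError(f"unexpected message on {stage} socket: {msg!r}")
        yield  # the stage's actual work happens here
        # Unblocks the shim, letting systemd mark the unit finished.
        sock.sendto(b"done", sender)
    finally:
        sock.close()
        os.unlink(sock_path)
```

The shim service plays the other side of this exchange: it sends "start" and blocks until "done" arrives, so systemd's view of per-stage completion is preserved.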

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-----------------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat
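As a rough illustration of such a pure Python equivalent (the shim function name and the *-return.sock convention here are assumptions modeled on the ExecStart lines in this PR, not shipped code):

```python
"""Hypothetical pure-Python stand-in for the netcat shim one-liner."""
import socket
import sys


def shim(stage, run_dir="/run/cloud-init/share"):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Bind a return address so the main cloud-init process can reply.
    sock.bind(f"{run_dir}/{stage}-return.sock")
    # Ask the long-running cloud-init process to run this stage...
    sock.sendto(b"start", f"{run_dir}/{stage}.sock")
    # ...then block until it reports completion.
    msg, _ = sock.recvfrom(4096)
    return msg.decode()


if __name__ == "__main__":
    print(shim(sys.argv[1]))
```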

cloud-init-main.service (new)
-----------------------------

 - set the service Type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate "all stages" mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock startup time, as measured with
`time python3 -c 'import cloudinit.cmd.main'`, on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change eliminates 1x this startup time from time to ssh login
and 3x this startup time from time to cloud-init completion. Total
benefit varies based on the platform that the instance is hosted
on, but all platforms will measurably benefit from this change.

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.

(if you made it this far, don't worry the implementation code is only 2x longer than this commit message!)

Design

Why use Unix sockets?

The implementation requires a fast, event-driven, lossless, bi-directional, cross-platform IPC mechanism that is easily limited to root-only send/receive. Alternatives obviously exist, but Unix sockets seem like a natural fit.

Why not just share one socket?

A socket per stage keeps the implementation simple and makes failure modes easy to reason about.

Why openbsd netcat?

It supports Unix sockets and it's 7x faster than a bare-bones pure Python implementation. A C / Rust / Go implementation would be trivial to write, but netcat already exists and does the job sufficiently fast.

Why a new service?

  1. any external service that is ordered Before=cloud-init-local.service can now run in parallel to the interpreter loading

  2. by decoupling interpreter loading from running cloud-init logic, systemd can potentially start loading cloud-init earlier in boot (note the difference in ordering between cloud-init-single.service vs cloud-init-local.service)

  3. changing all four services to be shims that use an identical synchronization protocol to trigger and wait for the cloud-init stages keeps the design symmetrical - this makes it easier to debug and describe

Security implications

Dependencies

No additional dependencies required - this change implements our own sd_notify.
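For reference, the sd_notify protocol is small: write a state string such as READY=1 to the Unix datagram socket named by $NOTIFY_SOCKET. A minimal sketch of such a helper (not the PR's actual implementation):

```python
import os
import socket


def sd_notify(message: str) -> None:
    """Minimal sd_notify(3)-style helper: send a state string to the
    datagram socket systemd advertises via $NOTIFY_SOCKET."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return  # not running under a Type=notify unit
    if path.startswith("@"):  # abstract-namespace socket
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message.encode())
```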

File permissions

The directory which contains the unix sockets is limited to root-only (see cloudinit/socket.py::67):

os.makedirs(f"{DEFAULT_RUN_DIR}/share", mode=0o700, exist_ok=True)

On Ubuntu DEFAULT_RUN_DIR resolves to /run/cloud-init/ which has the following access:

$ stat /run/cloud-init/
...
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)

Currently (before this PR), writes to this directory could already modify cloud-init's configuration; cloud-init depends on POSIX permissions for security.

Todo (this PR)

  • unittest - this code gets exercised by every integration test, but a unittest
    would benefit development and could better stress test the synchronization code
    for race conditions
  • fix integration test tests/integration_tests/test_multi_part_user_data_handling.py::test_cloud_config_archive_boot_hook_logging
  • update performance results in the above commit message

Future work (followup PRs, required pre-24.3 release)

  • update man page
  • update CLI docs
  • update debugging docs per feat: single process optimization #5489 (comment)
  • fix logging
    • cloud-init pipes all output to tee by default (which is completely unnecessary for stream handling - we can probably ditch this dependency). This leaves processes hanging around after cloud-init signals completion, and extraneous log messages in the output of systemctl status cloud-init-single.service (see Appendix B).
[RFC] Names (closed)

The cloud-init --single-process flag and cloud-init-single.service are the working names I've used, but I think that we can do better.

  • cloud-init - this naming proposal is the spiritual successor to this spec: we could rename cloud-init.service to cloud-init-network.service and repurpose the service name cloud-init.service to be the service that, you know, "runs cloud-init". This would probably be the best option in the long run, but would come at the cost of short-term pain for distributions. Since we won't backport this change to old releases this might be feasible. I'd be willing to write a blog post and release notes regarding the choice, and the changes that are visible to the user/distributor should be pretty straightforward: any package that has an ordering dependency on cloud-init.service could just `sed -i 's/cloud-init.service/cloud-init-network.service/g'` its unit files.
  • runner / run - the names cloud-init --run and cloud-init-runner.service make it clear that a thing is running, but some might still see a cloud-init.service and incorrectly assume that this is the name of the service that runs cloud-init (many have made this incorrect assumption)

Rejected names

  • daemon - this isn't a long-lived process. While this lives longer than the previous four processes, it simply combines their four lifespans into one. This name would be misleading.
  • all - cloud-init --all and cloud-init-all.service convey the right meaning, but require context to understand - the only reason to call this "all" is that it replaces multiple parts.

Appendix

A. systemctl status cloud-init-local.service

 cloud-init-local.service - Cloud-init: Local Stage (pre-network)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-local.service; enabled; preset: enabled)
     Active: active (exited) since Tue 2024-07-16 00:51:39 CEST; 16h ago
   Main PID: 254 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb systemd[1]: Starting cloud-init-local.service - Cloud-init: Local Stage (pre-network)...
Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb nc.openbsd[254]: done
Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb systemd[1]: Finished cloud-init-local.service - Cloud-init: Local Stage (pre-network).

B. systemctl status cloud-init-single.service

○ cloud-init-single.service - Cloud-init: Single Process
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-single.service; enabled; preset: enabled)
     Active: inactive (dead) since Tue 2024-07-16 00:52:11 CEST; 16h ago
   Duration: 31.569s
    Process: 76 ExecStart=/usr/bin/cloud-init --single-process (code=exited, status=0/SUCCESS)
   Main PID: 76 (code=exited, status=0/SUCCESS)
     Status: "Complete"
        CPU: 10.096s



Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb cloud-init[257]: | ... . o =oO.    |
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb cloud-init[257]: +----[SHA256]-----+
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Deactivated successfully.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 256 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 257 (tee) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 273 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 274 (tee) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 369 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 1161 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Consumed 10.095s CPU time, 228.9M memory peak, 0B memory swap peak.

C. Example log from instance booted with latest commit:

single-process.log

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from 5920138 to bdc76a4 Compare July 5, 2024 03:24
@a-dubs
Collaborator

a-dubs commented Jul 5, 2024

(if you made it this far, don't worry the implementation code is only 2x longer than this commit message!)

🤣

@aciba90 aciba90 self-requested a review July 5, 2024 16:02
Member

@TheRealFalcon TheRealFalcon left a comment


Great work here! I have some comments but they're all very minor. We should take some extra time to think about any additional testing that may need to happen, but overall +1 to the approach.

[RFC] Names

--single-process is fine, but I'm -1 to cloud-init-single.service simply because we have the cloud-init single command that could potentially be confusing.

What about cloud-init boot and cloud-init-boot.service?

Long term, I like the idea of making this cloud-init.service, but I'm not sure it's worth the churn of doing it at the same time as this PR. Ideally, this PR won't introduce any backwards incompatibility, but given the scope of the change, I'm sure there will be some effects we need to deal with, so I think I'd rather keep intentional breakage separate.

Why openbsd netcat?

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@holmanb
Member Author

holmanb commented Jul 15, 2024

Great work here! ... overall +1 to the approach.

Thanks!

We should take some extra time to think about any additional testing that may need to happen

Agreed. I have an idea for an integration test that would support verifying the behavior more easily.

[RFC] Names

...

Long term, I like the idea of making this cloud-init.service, but I'm not sure it's worth the churn of doing it at the same time as this PR. Ideally, this PR won't introduce any backwards incompatibility, but given the scope of the change, I'm sure there will be some effects we need to deal with, so I think I'd rather keep intentional breakage separate.

This definitely breaks backwards compatibility and will require a patch, and will require downstream maintainers to update their vendored service files if they want to benefit from the performance improvements here. If we do this change separately from a rename, that means two breaking changes rather than one. Since we already require downstream changes as a result of this PR, I'd prefer to do the rename now - the justification being that a cleanup alongside a performance improvement is less painful than two breaking changes. This will require a community communication email / post to communicate the changes required of downstreams, so this can be included in the "what this means for you" section for maintainers. Thoughts?

is nc.openbsd installed everywhere?

I can't promise that it is used everywhere, but I do know that it is pretty widely used, and is already a dependency of ubuntu-minimal:

$ apt rdepends netcat-openbsd                                        
netcat-openbsd
Reverse Depends:
  Recommends: libvirt-daemon
  Suggests: mariadb-server
  Recommends: libvirt-daemon
  Depends: xletters
  Depends: tomcat10-user
 |Depends: rtpengine-utils
  Suggests: mariadb-server
 |Depends: ifupdown-extra
  Depends: freedombox
  Depends: ubuntu-minimal

If not, is the difference meaningful such that we couldn't just use nc?

Like I mentioned before, openbsd's netcat supports Unix sockets. I don't see any other implementations that do.

If we'd prefer to avoid more dependencies, a C / Rust / Go implementation of the behavior that we need would be trivial to write using minimal dependencies, but that comes with the complexity of cloud-init packages becoming arch-dependent - not a huge burden, but another packaging burden nonetheless (though it's probably worth considering since this would likely improve performance further).

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from 5692c50 to c6e9e4b Compare July 15, 2024 23:11
Contributor

@aciba90 aciba90 left a comment


Nice work here, thanks for the effort!

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@TheRealFalcon: AFAIK the standard netcat does not implement unix sockets.

@@ -38,12 +37,13 @@ ExecStartPre=/bin/mkdir -p /run/cloud-init
ExecStartPre=/sbin/restorecon /run/cloud-init
ExecStartPre=/usr/bin/touch /run/cloud-init/enabled
{% endif %}
ExecStart=/usr/bin/cloud-init init --local
ExecStart=nc.openbsd -Uu -W1 /run/cloud-init/share/local.sock -s /run/cloud-init/share/local-return.sock
Contributor


This requires an explicit downstream Ubuntu runtime dependency on netcat-openbsd.

Member Author

@holmanb holmanb Jul 18, 2024


This requires an explicit downstream Ubuntu runtime dependency on netcat-openbsd.

Yes, it would be best to add this dependency to the packaging. However, since ubuntu-minimal already includes it, this won't functionally change anything.

I plan to add that to the Ubuntu packaging in a followup PR.

Contributor


Sweet! Even if it's a noop, I think explicitly expressing the dependency is the way to go, as cloud-init may be installed in environments where ubuntu-minimal is not, or ubuntu-minimal might drop the dependency on netcat-openbsd in the future.

@holmanb
Member Author

holmanb commented Jul 18, 2024

Nice work here, thanks for the effort!

@aciba90 thanks for the review! I'll ping you when I've addressed your points.

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@TheRealFalcon: AFAIK the standard netcat does not implement unix sockets.

@TheRealFalcon also worth noting that on Ubuntu, nc is nc.openbsd - on Noble, at least.

@blackboxsw blackboxsw added packaging Supplemental package review requested security labels Jul 18, 2024
@TheRealFalcon
Member

This definitely breaks backwards compatibility and will require a patch, and will require downstream maintainers to update their vendored service files if they want to benefit from the performance improvements here. If we do this change separately from a rename, that means two breaking changes rather than one. Since we already require downstream changes as a result of this PR, I'd prefer to do the rename now - the justification being that a cleanup alongside a performance improvement is less painful than two breaking changes. This will require a community communication email / post to communicate the changes required of downstreams, so this can be included in the "what this means for you" section for maintainers. Thoughts?

Makes sense. +1. We're just changing filenames though, right? Not also the CLI as you had initially spec'd out?

You and Alberto also worked through my nc concerns.

@holmanb
Member Author

holmanb commented Jul 19, 2024

Makes sense. +1. We're just changing filenames though, right? Not also the CLI as you had initially spec'd out?

Correct, I'm not going to touch pre-existing cli names.

For now I'm going to rename it to cloud-init --all-stages (named for what it is doing) rather than cloud-init --single-process (named for how it is different from the past). If there are any objections or suggestions over this name please let me know.

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from ac9e590 to 49bebbf Compare July 23, 2024 21:36
@holmanb
Member Author

holmanb commented Jul 23, 2024

Requesting re-review @TheRealFalcon. I added another unit test. I think the test coverage does a pretty good job of exercising the ordering requirements and error conditions of the protocol in just the unit test suite, and it uses only the standard library, so I'm not sure what integration tests would add beyond what is already covered - every integration test in the suite already exercises the netcat / systemd service portion of this.

If dedicated integration tests are desired I'm happy to add them (with expectations laid out for what is desired). Note that the unit tests added in this PR use the same operating system primitives (Unix sockets) as any integration test would - with only sd_notify mocked, plus a couple of calls that were added to allow running cloud-init --all-stages under a debugger.

I'm currently working on a community notice to advertise the impacts to users and packagers.


@setharnold setharnold left a comment


This sounds like an awesome performance improvement.

I'm concerned about reusing a systemd unit name for a completely different thing. I'd rather no names are re-used: someone somewhere either has fingers that know the names or documentation that refers to the names or (maybe not, for cloud-init?) service files and configurations on disk, or terraform plans, or ansible recipes, etc, that refer to the old units with the old meanings.

So: I suggest not using an old name for a new purpose.

One common problem with "multiple-stage" software is when something, or someone, causes the stages to execute out of order. This might not be quite as vulnerable as a TLS implementation in a webserver, but if the different phases have strict ordering requirements, we should have some simple checks for that ordering here.

Thanks

# notify systemd that this stage has completed
socket.sd_notify("READY=1")
# wait for cloud-init-local.service to start
with sync("local"):


Are there ordering requirements inherent to cloud-init's different stages? If local must be run before network, for example, this would be an ideal place to encode these requirements.

Member Author


Are there ordering requirements inherent to cloud-init's different stages?

Yes, these ordering requirements are inherent requirements. This order is already encoded in the init system. Currently cloud-init-local.service is ordered before cloud-init.service which is before cloud-config.service which is before cloud-final.service.

local must be run before network, for example, this would be an ideal place to encode these requirements.

@setharnold Maybe you had something specific in mind that I'm missing?


If these have to happen in lockstep order, maybe a variable to show what stage it's on:

stage = 0
with sync("local"):
    if stage != 0:
        error_and_exit("unexpected stage error")
    stage += 1
    # stuff
with sync("network"):
    if stage != 1:
        error_and_exit("network must follow local")
    stage += 1
    # stuff
with sync("config"):
    if stage != 2:
        error_and_exit("config must follow network")
    stage += 1
    # stuff
# ...and so on for "final"

If the ordering is more vague, like "config" requires "local" and doesn't care about "network", then boolean variables like local_done, network_done, and config_done could mark these completed in turn, and each stage could check the status of the phases it depends upon.

Member Author


If these have to happen in lockstep order, maybe a variable to show what stage it's on:

stage = 0
with sync("local"):
    if stage != 0:
        error_and_exit("unexpected stage error")
    stage += 1
    # stuff
with sync("network"):
    if stage != 1:
        error_and_exit("network must follow local")
    stage += 1
    # stuff
with sync("config"):
    if stage != 2:
        error_and_exit("config must follow network")
    stage += 1
    # stuff
# ...and so on for "final"

If the ordering is more vague, like "config" requires "local" and doesn't care about "network", then boolean variables like local_done, network_done, and config_done could mark these completed in turn, and each stage could check the status of the phases it depends upon.

We could add code like this, but it wouldn't be possible for error_and_exit() to ever be called, regardless of the order in which the messages are received on the Unix sockets. This is single-threaded code and it only has a single entry point, so in order for the network stage to run, the local stage has to have run.


I think I've just got less faith than you in the systemd-enforced ordering, that's all. If you're sure I'm over-thinking it, feel free to leave it out.

Member Author


I think I've just got less faith than you in the systemd-enforced ordering, that's all.

Systemd is not the only thing enforcing ordering - it is already enforced in the Python code as well. If systemd triggers stages out of order, earlier stages will not get skipped - if "network" is triggered before "local", this Python code will wait until systemd triggers "local".

The reason is that the ordering of these stages isn't enforced in the definition of the context manager; it is enforced by the order in which the context manager is invoked. Since with sync("local") is called before with sync("network"), the network stage cannot start until the local stage has completed. No parallelism / async / threading / coroutines / etc. are in use; the code simply runs one stage after the next, but not until systemd has signaled that it is time for the stage to run.

If you're sure I'm over-thinking it, feel free to leave it out.

I'm pretty sure that's what is happening, but I want to make sure you're comfortable with / understand how it works too. Hopefully the following annotated code helps.

with sync("local"):
   # "start" must have been received on the local socket to enter the context manager
   # (triggered by the one-liner in cloud-init-local.service)

# once the code gets to this point, the "local" stage has completed (and a response was
# sent in the context manager to the shim process in cloud-init-local.service)
with sync("network"):
   # "start" must have been received on the network socket to enter the context manager
   # (triggered by the one-liner in cloud-init-network.service) - in addition to
   # all of the code above being completed

# once the code gets to this point, the "network" stage has completed (and a response 
# was sent in the context manager to the shim process in cloud-init-network.service)
with sync("config"):
   # "start" must have been received on the config socket to enter the context manager
   # (triggered by the one-liner in cloud-config.service) - in addition to
   # the code above being completed

# once the code gets to this point, the "config" stage has completed (and a response 
# was sent in the context manager to the shim process in cloud-config.service)
with sync("final"):
   # "start" must have been received on the final socket to enter the context manager
   # (triggered by the one-liner in cloud-final.service) - in addition to
   # the code above being completed

So even if systemd sends its messages out of order, cloud-init won't run the stages out of order; it will just wait until the preceding stages in the order have completed.

Please let me know if you have any questions.


Oh! now I can see why you're confident :) thanks!

@@ -10,7 +10,7 @@ ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=config
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
Collaborator


This is a fairly complex invocation for an ExecStart with no surrounding commentary or documentation to aid in explaining the intent of the socket setup and pipe chain. It also makes typical triage entrypoints harder to read when folks jump into systemctl status cloud-*.service, as it pushes visibility of exit codes off screen because our ExecStart is so long.

It may aid readability/maintainability if cloud-init delivered a simple wrapper script in /usr/lib/cloudinit/ that would take a single command line param for the stage name (local|network|config|final) and create these separate named sockets. We can then document in that script the intent of opening the socket and the act of passing through socket payloads to sh.

Member Author


This is a fairly complex invocation for an ExecStart with no surrounding commentary or documentation to aid in explaining the intent of the socket setup and pipe chain.

Agreed, inline-documentation would help a lot. I'm happy to add that.

It also makes typical triage entrypoints harder to read when folks jump into systemctl status cloud-*.service, as it pushes visibility of exit codes off screen because our ExecStart is so long.

You're referring to systemd's pager truncating the message on the Process: line, right?

$ systemctl status cloud-init-local.service
● cloud-init-local.service - Cloud-init: Local Stage (pre-network)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-local.service; enabled; preset: enabled)
     Active: active (exited) since Wed 2024-07-24 19:53:46 UTC; 2min 25s ago
    Process: 254 ExecStart=sh -c echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/local.sock -s /run/cloud-init/share/local-return.sock | sh (code=exited, status=0/S>
   Main PID: 254 (code=exited, status=0/SUCCESS)
        CPU: 5ms

I don't think that this is really a concern. This off-screen information is duplicated on the following line, and even if it wasn't, one can always use --no-pager or pipe to cat / less / etc.

It may aid in readability/maintainability if cloud-init delivered a simple wrapper script in /usr/lib/cloudinit/ that would take a single command line param for the stage name (local|network|config|final) and create these separate named sockets. We can then document in that script the intent of opening the socket and the act of passing through socket payloads to sh.

I'd really rather not hide the implementation in a script. The original implementation was dead simple - just a single nc invocation - but at the request of getting error codes into the shim services I compromised on that simplicity. I'm very averse to making this any more complex than it already is; I don't want another ds-identify. Frankly, I'd sooner drop the error codes in the shim services or rewrite this one-liner in C/Rust/Go (but in preference of shipping code that we already have packaging for, I've proposed it like this for now, with the intent of potentially rewriting it in a different language eventually).

Also, putting this in a file would add a file load and read to the critical chain per-stage.
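For context, one possible shape of the inline documentation being discussed is sketched below as a commented unit-file fragment (flag descriptions taken from the OpenBSD netcat manual; the socket paths are the ones used in this PR):

```
# Ask cloud-init-main.service to run this stage, wait for its reply,
# and execute the reply (an echo/exit snippet) so that this shim unit
# exits with the stage's real exit code.
#
#   echo "start"   - the "start" message of the sync protocol
#   nc.openbsd -U  - use an AF_UNIX socket
#              -u  - datagram mode
#              -W1 - terminate after receiving one datagram (the reply)
#              -s  - bind the local (return) socket address
#   | sh           - run the reply, propagating the stage's exit code
[Service]
Type=oneshot
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
```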

@blackboxsw (Collaborator) replied, Jul 24, 2024:

Agreed, inline-documentation would help a lot. I'm happy to add that.

This meets the majority of my concern; I'm good with your point that a wrapper layer would add complexity or obscurity where it may not be needed.

If we have comments within this unit that describe the intent of this interaction, then I can back down on the "please make systemctl status cloud-init.service more readable/simple" point.

holmanb (Member Author) replied:

See the latest commit, which I think addresses the remaining concerns in this PR.

blackboxsw (Collaborator) replied:

Yes looking great.

# the second value contains the path to a unix socket on which to
# reply, which is expected to be /path/to/{self.stage}-return.sock
sock = self.sockets[self.stage]
chunk, self.remote = sock.recvfrom(5)
blackboxsw (Collaborator) commented:

Historically, systemd units have been kicked out of systemd boot goals due to misconfigured ordering cycles (unit dependency chain issues) with other systemd units. Do we want to implement some sort of long-timeout exception handling here so as not to block indefinitely, and hard-error that "cloud-init didn't start expected X stage in X mins" to unblock the rest of boot?

holmanb (Member Author) replied:

Historically, systemd units have been kicked out of systemd boot goals due to misconfigured ordering cycles (unit dependency chain issues) with other systemd units.
Do we want to implement some sort of long-timeout exception handling here so as not to block indefinitely, and hard-error that "cloud-init didn't start expected X stage in X mins" to unblock the rest of boot?

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

I really don't think that we want that. What you are referring to is a runtime symptom of a problem that, in almost all cases, is introduced at build time by the combination of packages installed on the system. Services in systemd can take an unknowable amount of time to complete, so such a timeout isn't a correct solution for this problem - it could break real users of cloud-init who have to deal with long boot times, just to introduce a broken workaround for packagers. The correct solution is to fix the ordering cycle.

Yes, cloud-init will behave differently from before if kicked out due to an ordering cycle. However, I don't think that this is something that an end user should ever see unless they themselves are messing with the boot order or their OS vendor made a mistake - and even in those cases it still doesn't seem like a problem for cloud-init to attempt to solve at the expense of real users with slow boot times.

blackboxsw (Collaborator) replied:

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

This does happen sometimes to (image creator) end-users who install supplemental systemd services in their golden images, which has resulted in bogus bugs/questions about cloud-init not running right (because it was ejected from the boot goals due to systemd unit dependency conflicts). I was wondering if there are ways we could make that condition a bit more conspicuous, but the symptoms below are probably "good enough" for those image creators to triage and determine what is causing the block in boot.

Here are the symptoms a user would encounter if we don't instrument a timeout of some sort. In images where cloud-network.service were to be ejected from boot goals due to cyclic dependency ordering, the system would be in the following state:

  • systemctl status will sit indefinitely at "starting" state.
  • systemctl list-jobs will not show the missing/disabled cloud-network.service, but instead show cloud-final.service and cloud-init.target in "waiting" state, and cloud-config.service sitting indefinitely in "running" state with no logs.
  • systemctl status cloud-init.service will show "Waiting on external services to complete (network stage)".
  • cloud-init status --format=yaml will show extended_status: running and no start times for the init stage in the resulting YAML output.

These are probably reasonable enough breadcrumbs to look into on systems which exhibit this configuration problem and may not warrant further arbitrary timeouts which could adversely affect slow system startup.

@holmanb (Member Author) replied, Jul 25, 2024:

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

This does happen sometimes to (image creator) end-users who install supplemental systemd services in their golden images, which has resulted in bogus bugs/questions about cloud-init not running right (because it was ejected from the boot goals due to systemd unit dependency conflicts). I was wondering if there are ways we could make that condition a bit more conspicuous, but the symptoms below are probably "good enough" for those image creators to triage and determine what is causing the block in boot.

This can easily be checked with systemd-analyze verify default.target and checking the output, or by checking the journal for messages like: "Breaking ordering cycle by deleting job cloud-init.service".

I can add that to the docs in the appropriate place for anyone that runs into this. I'll make it a separate PR. I've added it to the "future work" list in the PR description.

holmanb (Member Author) commented Jul 24, 2024:

Thanks for the review @setharnold!

So: I suggest not using an old name for a new purpose.

You make good points. I was hopeful that this would be a good name for it, and I still think it is; however, you're right that this may make the transition unnecessarily painful. I've changed the name of the primary service to cloud-init-main.service.

One common problem with "multiple-stage" software is when something, or someone, causes the stages to execute out of order. This might not be quite as vulnerable as a TLS implementation in a webserver, but if the different phases have strict ordering requirements, we should have some simple checks for that ordering here.

With these changes, cloud-init can only run these stages in the required order. Cloud-init uses systemd ordering to guarantee that a stage is not started before it is ready, and furthermore will only run each stage once the previous one has run and once it has received the "start" message for the next stage. This PR also includes tests that exercise this ordering functionality, so I think that we already have in place what is being suggested, unless I missed something.

@github-actions github-actions bot added the documentation This Pull Request changes documentation label Jul 24, 2024
@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch from 40992ea to ae7a4da Compare July 24, 2024 23:12
holmanb (Member Author) commented Jul 24, 2024:

I just rebased on tip of main to fix a unit test that failed due to service rename.

@holmanb holmanb requested a review from blackboxsw August 1, 2024 23:42
@blackboxsw (Collaborator) left a review:

LGTM!
Minor requests:

  • a more informative message than "done" when we are done talking to the socket for a given boot stage
  • dropping the 2nd handle = declaration
  • possibly removing socket files once we've close()d our communication

Comment on lines 862 to 866
handler = next(
    filter(
        lambda h: isinstance(h, log.LogExporter), root_logger.handlers
    )
).export_logs()
blackboxsw (Collaborator):

Since we are declaring handler before the leading try: block above, we don't need to redeclare this here. I think you can drop these 4 lines now.


def __exit__(self, exc_type, exc_val, exc_tb):
    """Notify the socket that this stage is complete."""
    message = "done"
blackboxsw (Collaborator):

Seeing this terse message doesn't look very informative in each unit's logs, like systemctl show cloud-config.service

... sh[280]: done
... systemd[1]: Finished cloud-init-network.service - Cloud-init Network stage

How about something a bit more specific?

Suggested change
message = "done"
message = f"Completed socket interaction for boot stage {self.stage}"

sock.sendall(
    f"echo '{message}'; exit {self.systemd_exit_code};".encode()
)
sock.close()
blackboxsw (Collaborator):

Once we finish our socket interaction, shouldn't we be removing the associated file paths to avoid leaving *.sock files behind in /run/cloud-init/share/?

Suggested change
sock.close()
sock.close()
for socket_path in (self.remote, self.remote.replace("-return", "")):
    with suppress(FileNotFoundError):
        os.remove(socket_path)

holmanb (Member Author) replied:

Once we finish our socket interaction, shouldn't we be removing the associated file paths to avoid leaving *.sock files behind in /run/cloud-init/share/?

No. Doing this would introduce a race into the socket protocol.

If we were to do that, there would be no guarantee that the receiving process was scheduled in time by the kernel to receive the message before it gets deleted.

We could "wait until the end", and then delete them all, but again we would require some way to ensure that the message was received before deletion.

These files take up very little space and are harmless, so why not just leave them?

@@ -10,7 +10,7 @@ ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=config
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
blackboxsw (Collaborator):

Yes looking great.

Python interpreter initialization and module import time is a
significant portion of cloud-init's total runtime when the
default configuration is used, and in all cases it contributes
a significant amount of wall clock time to cloud-init's runtime.

This commit significantly improves cloud-init time to completion
by eliminating redundant interpreter starts and module loads.
Since multiple cloud-init processes sit in the critical chain of
the boot order, this significantly reduces cloud-init's time to
ssh and time to completion.

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system.

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This makes possible many additional improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability
to preserve backwards compatibility. Many additional
improvements will be possible once this merges.

Synchronization protocol
========================
- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol
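The context manager described above can be sketched roughly like this (illustrative and simplified; these names are not the real socket.py API):

```python
import os
import socket


class SocketSync:
    """Sketch of a multi-socket sync protocol: one AF_UNIX datagram
    socket per stage. Entering the context blocks until the init
    system's shim sends b"start"; exiting replies on the shim's bound
    return socket with a snippet for the shim's trailing `sh` to run."""

    def __init__(self, rundir: str, *stages: str):
        self.stage = ""
        self.remote = ""
        self.sockets = {}
        for stage in stages:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            sock.bind(os.path.join(rundir, f"{stage}.sock"))
            self.sockets[stage] = sock

    def __call__(self, stage: str) -> "SocketSync":
        self.stage = stage
        return self

    def __enter__(self) -> "SocketSync":
        # Block until the shim service for this stage sends b"start".
        # recvfrom() also yields the shim's (bound) return address.
        chunk, self.remote = self.sockets[self.stage].recvfrom(5)
        assert chunk == b"start", chunk
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Reply with a shell snippet so the shim unit exits with a
        # code representing this stage's outcome.
        code = 1 if exc_type else 0
        self.sockets[self.stage].sendto(
            f"echo 'done'; exit {code};".encode(), self.remote
        )
```

Because AF_UNIX datagram sockets queue messages, the shim can send "start" before the main process reaches `__enter__` without losing the message.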

cloud-init-single.service (new)
-------------------------------

 - use service type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate single process mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-{local,init,config,final}.services
----------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat
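That pure-Python equivalent could look roughly like this (a hypothetical sketch, not shipped code; names are illustrative):

```python
import os
import socket
import subprocess


def notify_and_wait(stage: str, rundir: str = "/run/cloud-init/share") -> int:
    """Hypothetical stand-in for the nc.openbsd one-liner: send "start"
    on the stage socket, wait for one reply datagram on our bound
    return socket, then run the reply through sh (like the `| sh`)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Bind the return socket, like nc's -s option.
    sock.bind(os.path.join(rundir, f"{stage}-return.sock"))
    sock.sendto(b"start", os.path.join(rundir, f"{stage}.sock"))
    reply = sock.recv(4096)  # like -W1: one datagram, then done
    return subprocess.run(["sh", "-c", reply.decode()]).returncode
```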

cloud-final.services
--------------------
- drop KillMode

cloud-init-local.services
-------------------------
- drop dependencies made redundant by ordering after
  cloud-init-single.service

Performance Results
===================

An integration test [1] on a Noble lxd container comparing POC to current
release showed significant improvement. In the POC, cloud-config.service
didn't register in the critical-chain (~340ms improvement),
cloud-init.service added ~378ms to total boot time (~400ms improvement),
and cloud-init-local.service had a marginal improvement (~90ms) which was
likely in the threshold of noise. The total improvement in this (early
stage) test showed a 0.83s improvement to total boot time with 0.66s of
boot time remaining due to cloud-init. This 0.83s improvement roughly
corresponds to the reduction in total boot time: systemd-analyze
critical-chain reports 2.267s to reach graphical.target, a 0.8s
improvement over the current release.

Note: The numbers quoted above gathered from only one series (Noble),
one platform (lxc), one host machine (Ryzen 7840U), and run environment
was not controlled. I ran the test multiple times to ensure that the
results were repeatable, but not enough times to be considered
statistically significant. I verified that cloud-init worked as expected,
but given the limited scope of this integration test, this is still very
much a proof of concept.

[1] test_logging.py

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
Make cloud-network.service map to the cloud-init network stage.
Make cloud-init.service map to all of cloud-init.

BREAKING CHANGE: Changes the semantics of the cloud-init.service files
- make it such that if one stage fails, the next stage isn't blocked
  indefinitely
- notify the init system of per-stage exit codes and failure messages
- make parent process (cloud-init.service) exit with representative exit code
Add a new flag to the argparse Namespace which is used to disable
logging.

This isn't elegant, but fixing logging is going to be a large refactor,
so this gets logging "working" for now while minimizing the number of
lines changed.

Skip the sync protocol when stdin is a tty.
- remove logs duplicated across stages
- send the single line traceback to systemd
- fix a minor string format in user output
@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch from 0de6016 to cb7bb25 Compare August 2, 2024 20:24
holmanb (Member Author) commented Aug 2, 2024:

The cc_mounts PR that we just landed introduced a merge conflict with this PR, so I just pushed a change which resolves that.

@holmanb holmanb merged commit 143bc9e into canonical:main Aug 2, 2024
23 checks passed
holmanb added a commit to holmanb/cloud-init that referenced this pull request Aug 2, 2024
Python interpreter initialization and module import time 
contributes a significant amount of wall clock time to
cloud-init's runtime (and therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system. Since multiple cloud-init processes
sit in the critical chain of starting the system, this reduces
boot time (including time to ssh login and time to cloud-init
completion).

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability
to preserve backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its
systemd units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This
stage is documented as the "network" stage, so this service will
be renamed to "cloud-init-network.service". A new notify service
is added as part of this implementation which contains the
cloud-init process. This unit is named "cloud-init-main.service".
 
Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-----------------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat

cloud-init-main.service (new)
-----------------------------

 - use service type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate "all stages" mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock start-up time as measured with
`time python3 -c 'import cloudinit.cmd.main'` on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change eliminates 1x this start-up time from time to ssh.
This change eliminates 3x this start-up time from cloud-init's total
time to completion.
completion. Total benefit varies based on the platform that the
instance is hosted by, but all platforms will measurably benefit from
this change.
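The interpreter and import cost quoted above can be broken down per module with CPython's standard `-X importtime` option (shown here against a stdlib module as a stand-in, since `cloudinit.cmd.main` is only importable where cloud-init is installed):

```shell
# Print a per-module import time table to stderr; on a system with
# cloud-init installed, substitute 'cloudinit.cmd.main' for 'json'.
python3 -X importtime -c 'import json' 2> importtime.log
tail -n 3 importtime.log
```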

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
holmanb added a commit that referenced this pull request Aug 6, 2024
holmanb mentioned this pull request Aug 8, 2024.