
feat: single process optimization #5489

Merged: 22 commits, Aug 2, 2024

Conversation

holmanb
Member

@holmanb holmanb commented Jul 4, 2024

Proposed Commit Message

feat: Single process optimization

Python interpreter initialization and module import time
contribute a significant amount of wall clock time to
cloud-init's runtime (and therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system. Since multiple cloud-init processes
sit in the critical chain of starting the system, this reduces
boot time (including time to ssh login and time to cloud-init
completion).

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of the
primary performance savings while preserving correctness and
backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its
systemd units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This
stage is documented as the "network" stage, so this service will
be renamed to "cloud-init-network.service". A new notify service
is added as part of this implementation which contains the
cloud-init process. This unit is named "cloud-init-main.service".
 
Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket
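As a sketch, the handshake above maps naturally onto a Python context manager over Unix datagram sockets. The names below (sync, the socket paths, the error handling) are illustrative assumptions, not the PR's actual cloudinit/socket.py code:

```python
import contextlib
import os
import socket


@contextlib.contextmanager
def sync(stage, run_dir="/run/cloud-init/share"):
    """Block until the init system says "start", run the stage body,
    then answer "done" on the same Unix datagram socket."""
    sock_path = os.path.join(run_dir, f"{stage}.sock")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(sock_path)
    try:
        # Sent by the shim service (e.g. the netcat one-liner).
        msg, sender = sock.recvfrom(4096)
        if msg != b"start":
            raise RuntimeError(f"unexpected message on {stage} socket: {msg!r}")
        yield  # the stage's actual work happens here
        # Unblocks the shim, letting systemd mark the unit finished.
        sock.sendto(b"done", sender)
    finally:
        sock.close()
        os.unlink(sock_path)
```

The shim service plays the other side of this exchange: it sends "start" and blocks until "done" arrives, so systemd's view of per-stage completion is preserved.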

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-----------------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat
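As a rough illustration of such a pure Python equivalent (the shim function name and the *-return.sock convention here are assumptions modeled on the ExecStart lines in this PR, not shipped code):

```python
"""Hypothetical pure-Python stand-in for the netcat shim one-liner."""
import socket
import sys


def shim(stage, run_dir="/run/cloud-init/share"):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Bind a return address so the main cloud-init process can reply.
    sock.bind(f"{run_dir}/{stage}-return.sock")
    # Ask the long-running cloud-init process to run this stage...
    sock.sendto(b"start", f"{run_dir}/{stage}.sock")
    # ...then block until it reports completion.
    msg, _ = sock.recvfrom(4096)
    return msg.decode()


if __name__ == "__main__":
    print(shim(sys.argv[1]))
```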

cloud-init-main.service (new)
-----------------------------

 - set the service Type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate "all stages" mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock startup time, as measured with
`time python3 -c 'import cloudinit.cmd.main'`, on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change eliminates 1x this startup time from time to ssh login
and 3x this startup time from time to cloud-init completion. Total
benefit varies based on the platform that the instance is hosted
on, but all platforms will measurably benefit from this change.

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.

(if you made it this far, don't worry the implementation code is only 2x longer than this commit message!)

Design

Why use Unix sockets?

The implementation requires a fast, event-driven, lossless, bi-directional, cross-platform IPC mechanism that is easily limited to root-only send/receive. Alternatives obviously exist, but Unix sockets seem like a natural fit.

Why not just share one socket?

A socket per stage keeps the implementation simple and makes failure modes easy to reason about.

Why openbsd netcat?

It supports Unix sockets and it's 7x faster than a bare-bones pure Python implementation. A C / Rust / Go implementation would be trivial to write, but netcat already exists and does the job sufficiently fast.

Why a new service?

  1. any external service that is ordered Before=cloud-init-local.service can now run in parallel to the interpreter loading

  2. by decoupling interpreter loading from running cloud-init logic, systemd can potentially start loading cloud-init earlier in boot (note the difference in ordering between cloud-init-single.service vs cloud-init-local.service)

  3. changing all four services to be shims that use an identical synchronization protocol to trigger and wait for the cloud-init stages keeps the design symmetrical - this makes it easier to debug and describe

Security implications

Dependencies

No additional dependencies required - this change implements our own sd_notify.
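For reference, the sd_notify protocol is small: write a state string such as READY=1 to the Unix datagram socket named by $NOTIFY_SOCKET. A minimal sketch of such a helper (not the PR's actual implementation):

```python
import os
import socket


def sd_notify(message: str) -> None:
    """Minimal sd_notify(3)-style helper: send a state string to the
    datagram socket systemd advertises via $NOTIFY_SOCKET."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return  # not running under a Type=notify unit
    if path.startswith("@"):  # abstract-namespace socket
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message.encode())
```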

File permissions

The directory which contains the unix sockets is limited to root-only (see cloudinit/socket.py::67):

os.makedirs(f"{DEFAULT_RUN_DIR}/share", mode=0o700, exist_ok=True)

On Ubuntu DEFAULT_RUN_DIR resolves to /run/cloud-init/ which has the following access:

$ stat /run/cloud-init/
...
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)

Currently (before this PR), writes to this directory could already modify cloud-init's configuration; cloud-init depends on POSIX permissions for security.

Todo (this PR)

  • unittest - this code gets exercised by every integration test, but a unittest
    would benefit development and could better stress test the synchronization code
    for race conditions
  • fix integration test tests/integration_tests/test_multi_part_user_data_handling.py::test_cloud_config_archive_boot_hook_logging
  • update performance results in the above commit message

Future work (followup PRs, required pre-24.3 release)

  • update man page
  • update CLI docs
  • update debugging docs per feat: single process optimization #5489 (comment)
  • fix logging
    • cloud-init pipes all output to tee by default (which is completely unnecessary for stream handling - we can probably ditch this dependency). This leaves processes hanging around after cloud-init signals completion, and extraneous log messages in the output of systemctl status cloud-init-single.service (see Appendix B).
[RFC] Names (closed)

The cloud-init --single-process flag and cloud-init-single.service are the working names I've used, but I think that we can do better.

  • cloud-init - this naming proposal is the spiritual successor to this spec: we could rename cloud-init.service to cloud-init-network.service and repurpose the service name cloud-init.service to be the service that, you know, "runs cloud-init". This would probably be the best option in the long run, but would come at the cost of short-term pain for distributions. Since we won't backport this change to old releases this might be feasible. I'd be willing to write a blog post and release notes regarding the choice, and the changes that are visible to the user/distributor should be pretty straightforward: any package that has an ordering dependency on cloud-init.service could just `sed -i 's/cloud-init.service/cloud-init-network.service/g'` its unit files.
  • runner / run - the names cloud-init --run and cloud-init-runner.service make it clear that a thing is running, but some might still see a cloud-init.service and incorrectly assume that this is the name of the service that runs cloud-init (many have made this incorrect assumption)

Rejected names

  • daemon - this isn't a long-lived process. While this lives longer than the previous four processes, it simply combines their four lifespans into one. This name would be misleading.
  • all - cloud-init --all and cloud-init-all.service convey the right meaning, but require context to understand - the only reason to call this "all" is that it replaces multiple parts.

Appendix

A. systemctl status cloud-init-local.service

 cloud-init-local.service - Cloud-init: Local Stage (pre-network)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-local.service; enabled; preset: enabled)
     Active: active (exited) since Tue 2024-07-16 00:51:39 CEST; 16h ago
   Main PID: 254 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb systemd[1]: Starting cloud-init-local.service - Cloud-init: Local Stage (pre-network)...
Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb nc.openbsd[254]: done
Jul 16 00:51:39 cloudinit-0715-225129iftdf1yb systemd[1]: Finished cloud-init-local.service - Cloud-init: Local Stage (pre-network).

B. systemctl status cloud-init-single.service

○ cloud-init-single.service - Cloud-init: Single Process
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-single.service; enabled; preset: enabled)
     Active: inactive (dead) since Tue 2024-07-16 00:52:11 CEST; 16h ago
   Duration: 31.569s
    Process: 76 ExecStart=/usr/bin/cloud-init --single-process (code=exited, status=0/SUCCESS)
   Main PID: 76 (code=exited, status=0/SUCCESS)
     Status: "Complete"
        CPU: 10.096s



Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb cloud-init[257]: | ... . o =oO.    |
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb cloud-init[257]: +----[SHA256]-----+
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Deactivated successfully.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 256 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 257 (tee) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 273 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 274 (tee) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 369 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Unit process 1161 (sh) remains running after unit stopped.
Jul 16 00:52:11 cloudinit-0715-225129iftdf1yb systemd[1]: cloud-init-single.service: Consumed 10.095s CPU time, 228.9M memory peak, 0B memory swap peak.

C. Example log from instance booted with latest commit:

single-process.log

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from 5920138 to bdc76a4 Compare July 5, 2024 03:24
@a-dubs
Collaborator

a-dubs commented Jul 5, 2024

(if you made it this far, don't worry the implementation code is only 2x longer than this commit message!)

🤣

@aciba90 aciba90 self-requested a review July 5, 2024 16:02
Member

@TheRealFalcon TheRealFalcon left a comment


Great work here! I have some comments but they're all very minor. We should take some extra time to think about any additional testing that may need to happen, but overall +1 to the approach.

[RFC] Names

--single-process is fine, but I'm -1 to cloud-init-single.service simply because we have the cloud-init single command that could potentially be confusing.

What about cloud-init boot and cloud-init-boot.service?

Long term, I like the idea of making this cloud-init.service, but I'm not sure it's worth the churn of doing it at the same time as this PR. Ideally, this PR won't introduce any backwards incompatibility, but given the scope of the change, I'm sure there will be some effects we need to deal with, so I think I'd rather keep intentional breakage separate.

Why openbsd netcat?

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@holmanb
Member Author

holmanb commented Jul 15, 2024

Great work here! ... overall +1 to the approach.

Thanks!

We should take some extra time to think about any additional testing that may need to happen

Agreed. I have an idea for an integration test that would support verifying the behavior more easily.

[RFC] Names

...

Long term, I like the idea of making this cloud-init.service, but I'm not sure it's worth the churn of doing it at the same time as this PR. Ideally, this PR won't introduce any backwards incompatibility, but given the scope of the change, I'm sure there will be some effects we need to deal with, so I think I'd rather keep intentional breakage separate.

This definitely breaks backwards compatibility and will require a patch, and will require downstream maintainers to update their vendored service files if they want to benefit from the performance improvements here. If we do this change separately from a rename, that means two breaking changes rather than one. Since we already require downstream changes as a result of this PR, I'd prefer to do the rename now - the justification being that a cleanup alongside a performance improvement is less painful than two breaking changes. This will require a community communication email / post to communicate the changes required of downstreams, so this can be included in the "what this means for you" section for maintainers. Thoughts?

is nc.openbsd installed everywhere?

I can't promise that it is used everywhere, but I do know that it is pretty widely used, and is already a dependency of ubuntu-minimal:

$ apt rdepends netcat-openbsd                                        
netcat-openbsd
Reverse Depends:
  Recommends: libvirt-daemon
  Suggests: mariadb-server
  Recommends: libvirt-daemon
  Depends: xletters
  Depends: tomcat10-user
 |Depends: rtpengine-utils
  Suggests: mariadb-server
 |Depends: ifupdown-extra
  Depends: freedombox
  Depends: ubuntu-minimal

If not, is the difference meaningful such that we couldn't just use nc?

Like I mentioned before, openbsd's netcat supports Unix sockets. I don't see any other implementations that do.

If we'd prefer to avoid more dependencies, a C / Rust / Go implementation of the behavior that we need would be trivial to write using minimal dependencies, but that comes with the complexity of cloud-init packages becoming arch-dependent - not a huge burden, but another packaging burden nonetheless (though it's probably worth considering since this would likely improve performance further).

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from 5692c50 to c6e9e4b Compare July 15, 2024 23:11
Contributor

@aciba90 aciba90 left a comment


Nice work here, thanks for the effort!

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@TheRealFalcon: AFAIK the standard netcat does not implement unix sockets.

@@ -38,12 +37,13 @@ ExecStartPre=/bin/mkdir -p /run/cloud-init
ExecStartPre=/sbin/restorecon /run/cloud-init
ExecStartPre=/usr/bin/touch /run/cloud-init/enabled
{% endif %}
ExecStart=/usr/bin/cloud-init init --local
ExecStart=nc.openbsd -Uu -W1 /run/cloud-init/share/local.sock -s /run/cloud-init/share/local-return.sock
Contributor


This requires an explicit downstream Ubuntu runtime dependency on netcat-openbsd.

Member Author

@holmanb holmanb Jul 18, 2024


This requires an explicit downstream Ubuntu runtime dependency on netcat-openbsd.

Yes, it would be best to add this dependency to the packaging. However, since ubuntu-minimal already includes it, this won't functionally change anything.

I plan to add that to the Ubuntu packaging in a followup PR.

Contributor


Sweet! Even if it's a noop, I think explicitly expressing the dependency is the way to go, as cloud-init may be installed in environments where ubuntu-minimal is not, or ubuntu-minimal might drop the dependency on netcat-openbsd in the future.

@holmanb
Member Author

holmanb commented Jul 18, 2024

Nice work here, thanks for the effort!

@aciba90 thanks for the review! I'll ping you when I've addressed your points.

I don't know the history or difference between implementations, but is nc.openbsd installed everywhere? If not, is the difference meaningful such that we couldn't just use nc?

@TheRealFalcon: AFAIK the standard netcat does not implement unix sockets.

@TheRealFalcon also worth noting that on Ubuntu, nc is nc.openbsd - on Noble, at least.

@blackboxsw blackboxsw added packaging Supplemental package review requested security labels Jul 18, 2024
@TheRealFalcon
Member

This definitely breaks backwards compatibility and will require a patch, and will require downstream maintainers to update their vendored service files if they want to benefit from the performance improvements here. If we do this change separately from a rename, that means two breaking changes rather than one. Since we already require downstream changes as a result of this PR, I'd prefer to do the rename now - the justification being that a cleanup alongside a performance improvement is less painful than two breaking changes. This will require a community communication email / post to communicate the changes required of downstreams, so this can be included in the "what this means for you" section for maintainers. Thoughts?

Makes sense. +1. We're just changing filenames though, right? Not also the CLI as you had initially spec'd out?

You and Alberto also worked through my nc concerns.

@holmanb
Member Author

holmanb commented Jul 19, 2024

Makes sense. +1. We're just changing filenames though, right? Not also the CLI as you had initially spec'd out?

Correct, I'm not going to touch pre-existing cli names.

For now I'm going to rename it to cloud-init --all-stages (named for what it is doing) rather than cloud-init --single-process (named for how it is different from the past). If there are any objections or suggestions over this name please let me know.

@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch 3 times, most recently from ac9e590 to 49bebbf Compare July 23, 2024 21:36
@holmanb
Member Author

holmanb commented Jul 23, 2024

Requesting re-review @TheRealFalcon. I added another unit test. I think the test coverage does a pretty good job of exercising the ordering requirements and error conditions of the protocol in just the unit test suite, and it uses only the standard library, so I'm not sure what integration tests would add beyond what is already covered - every integration test in the suite already exercises the netcat / systemd service portion of this.

If dedicated integration tests are desired I'm happy to add them (with expectations laid out for what is desired). Note that the unit tests added in this PR use the same operating system primitives (Unix sockets) as any integration test would - with only sd_notify mocked, plus a couple of calls that were added to allow running cloud-init --all-stages under a debugger.

I'm currently working on a community notice to advertise the impacts to users and packagers.


@setharnold setharnold left a comment


This sounds like an awesome performance improvement.

I'm concerned about reusing a systemd unit name for a completely different thing. I'd rather no names are re-used: someone somewhere either has fingers that know the names or documentation that refers to the names or (maybe not, for cloud-init?) service files and configurations on disk, or terraform plans, or ansible recipes, etc, that refer to the old units with the old meanings.

So: I suggest not using an old name for a new purpose.

One common problem with "multiple-stage" software is when something, or someone, causes the stages to execute out of order. This might not be quite as vulnerable as a TLS implementation in a webserver, but if the different phases have strict ordering requirements, we should have some simple checks for that ordering here.

Thanks

# notify systemd that this stage has completed
socket.sd_notify("READY=1")
# wait for cloud-init-local.service to start
with sync("local"):


Are there ordering requirements inherent to cloud-init's different stages? If local must be run before network, for example, this would be an ideal place to encode these requirements.

Member Author


Are there ordering requirements inherent to cloud-init's different stages?

Yes, these ordering requirements are inherent requirements. This order is already encoded in the init system. Currently cloud-init-local.service is ordered before cloud-init.service which is before cloud-config.service which is before cloud-final.service.

local must be run before network, for example, this would be an ideal place to encode these requirements.

@setharnold Maybe you had something specific in mind that I'm missing?


If these have to happen in lockstep order, maybe a variable to show what stage it's on:

stage = 0
with sync("local"):
    if stage != 0:
        error_and_exit("unexpected stage error")
    stage += 1
    # stuff
with sync("network"):
    if stage != 1:
        error_and_exit("network must follow local")
    stage += 1
    # stuff
with sync("config"):
    if stage != 2:
        error_and_exit("config must follow network")
    stage += 1
    # stuff
# ...and so on for "final"

If the ordering is more vague, like "config" requires "local" and doesn't care about "network", then boolean variables like local_done, network_done, and config_done could mark these completed in turn, and each stage could check the status of the phases it depends upon.

Member Author


If these have to happen in lockstep order, maybe a variable to show what stage it's on:

stage = 0
with sync("local"):
    if stage != 0:
        error_and_exit("unexpected stage error")
    stage += 1
    # stuff
with sync("network"):
    if stage != 1:
        error_and_exit("network must follow local")
    stage += 1
    # stuff
with sync("config"):
    if stage != 2:
        error_and_exit("config must follow network")
    stage += 1
    # stuff
# ...and so on for "final"

If the ordering is more vague, like "config" requires "local" and doesn't care about "network", then boolean variables like local_done, network_done, and config_done could mark these completed in turn, and each stage could check the status of the phases it depends upon.

We could add code like this, but it wouldn't be possible for error_and_exit() to ever be called, regardless of the order in which the messages are received on the Unix sockets. This is single-threaded code and it only has a single entry point, so in order for the network stage to run, the local stage has to have run.


I think I've just got less faith than you in the systemd-enforced ordering, that's all. If you're sure I'm over-thinking it, feel free to leave it out.

Member Author


I think I've just got less faith than you in the systemd-enforced ordering, that's all.

Systemd is not the only thing enforcing ordering - it is already enforced in the Python code as well. If systemd triggers stages out of order, earlier stages will not get skipped - if "network" is triggered before "local", this Python code will wait until systemd triggers "local".

The reason is that the ordering of these stages isn't enforced in the definition of the context manager; it is enforced by the order in which the context manager is invoked. Since with sync("local") is called before with sync("network"), the network stage cannot start until the local stage has completed. No parallelism / async / threading / coroutines / etc. are in use; the code simply runs one stage after the next, but not until systemd has signaled that it is time for the stage to run.

If you're sure I'm over-thinking it, feel free to leave it out.

I'm pretty sure that's what is happening, but I want to make sure you're comfortable with / understand how it works too. Hopefully the following annotated code helps.

with sync("local"):
   # "start" must have been received on the local socket to enter the context manager
   # (triggered by the one-liner in cloud-init-local.service)

# once the code gets to this point, the "local" stage has completed (and a response was
# sent in the context manager to the shim process in cloud-init-local.service)
with sync("network"):
   # "start" must have been received on the network socket to enter the context manager
   # (triggered by the one-liner in cloud-init-network.service) - in addition to
   # all of the code above being completed

# once the code gets to this point, the "network" stage has completed (and a response 
# was sent in the context manager to the shim process in cloud-init-network.service)
with sync("config"):
   # "start" must have been received on the config socket to enter the context manager
   # (triggered by the one-liner in cloud-config.service) - in addition to
   # the code above being completed

# once the code gets to this point, the "config" stage has completed (and a response 
# was sent in the context manager to the shim process in cloud-config.service)
with sync("final"):
   # "start" must have been received on the final socket to enter the context manager
   # (triggered by the one-liner in cloud-final.service) - in addition to
   # the code above being completed

So even if systemd sends its messages out of order, cloud-init won't run the stages out of order; it will just wait until the preceding stages in the order have completed.

Please let me know if you have any questions.


Oh! now I can see why you're confident :) thanks!

@@ -10,7 +10,7 @@ ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=config
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
Collaborator


This is a fairly complex invocation for an ExecStart with no surrounding commentary or documentation to aid in explaining the intent of the socket setup and pipe chain. It also makes typical triage entrypoints harder to read when folks jump into systemctl status cloud-*.service, as it pushes visibility of exit codes off screen because our ExecStart is so long.

It may aid readability/maintainability if cloud-init delivered a simple wrapper script in /usr/lib/cloudinit/ that would take a single command line param for the stage name (local|network|config|final) and create these separate named sockets. We can then document in that script the intent of opening the socket and the act of passing through socket payloads to sh.

Member Author


This is a fairly complex invocation for an ExecStart with no surrounding commentary or documentation to aid in explaining the intent of the socket setup and pipe chain.

Agreed, inline-documentation would help a lot. I'm happy to add that.

It also makes typical triage entrypoints harder to read when folks jump into systemctl status cloud-*.service, as it pushes visibility of exit codes off screen because our ExecStart is so long.

You're referring to systemd's pager truncating the message on the Process: line, right?

$ systemctl status cloud-init-local.service
● cloud-init-local.service - Cloud-init: Local Stage (pre-network)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init-local.service; enabled; preset: enabled)
     Active: active (exited) since Wed 2024-07-24 19:53:46 UTC; 2min 25s ago
    Process: 254 ExecStart=sh -c echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/local.sock -s /run/cloud-init/share/local-return.sock | sh (code=exited, status=0/S>
   Main PID: 254 (code=exited, status=0/SUCCESS)
        CPU: 5ms

I don't think that this is really a concern. This off-screen information is duplicated on the following line, and even if it wasn't, one can always use --no-pager or pipe to cat / less / etc.

It may aid in readability/maintainability if cloud-init delivered a simple wrapper script in /usr/lib/cloudinit/ that would take a single command line param for the stage name (local|network|config|final) and create these separate named sockets. We can then document in that script the intent of opening the socket and the act of passing through socket payloads to sh.

I'd really rather not hide the implementation in a script. The original implementation was dead simple - just a single nc invocation - but at the request of getting error codes into the shim services I compromised on that simplicity. I'm very averse to making this any more complex than it already is; I don't want another ds-identify. Frankly, I'd sooner drop the error codes in the shim services or rewrite this one-liner in C/Rust/Go (but in preference of shipping code that we already have packaging for, I've proposed it like this for now, with the intent of potentially rewriting it in a different language eventually).

Also, putting this in a file would add a file load and read to the critical chain per-stage.
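For context, one possible shape of the inline documentation being discussed is sketched below as a commented unit-file fragment (flag descriptions taken from the OpenBSD netcat manual; the socket paths are the ones used in this PR):

```
# Ask cloud-init-main.service to run this stage, wait for its reply,
# and execute the reply (an echo/exit snippet) so that this shim unit
# exits with the stage's real exit code.
#
#   echo "start"   - the "start" message of the sync protocol
#   nc.openbsd -U  - use an AF_UNIX socket
#              -u  - datagram mode
#              -W1 - terminate after receiving one datagram (the reply)
#              -s  - bind the local (return) socket address
#   | sh           - run the reply, propagating the stage's exit code
[Service]
Type=oneshot
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
```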

@blackboxsw (Collaborator) replied, Jul 24, 2024:

Agreed, inline-documentation would help a lot. I'm happy to add that.

This meets the majority of my concern; I'm good with your point that a wrapper layer would add complexity or obscurity where it may not be needed.

If we have comments within this unit that describe the intent of this interaction, then I can back down on the "please make systemctl status cloud-init.service more readable/simple" point.

holmanb (Member Author) replied:

See the latest commit, which I think addresses the remaining concerns in this PR.

blackboxsw (Collaborator) replied:

Yes looking great.

# the second value contains the path to a unix socket on which to
# reply, which is expected to be /path/to/{self.stage}-return.sock
sock = self.sockets[self.stage]
chunk, self.remote = sock.recvfrom(5)
blackboxsw (Collaborator) commented:

Historically, systemd units have been kicked out of systemd boot goals due to misconfigured ordering cycles (unit dependency chain issues) with other systemd units. Do we want to implement some sort of long-timeout exception handling here so as not to block indefinitely, and hard-error that "cloud-init didn't start expected X stage in X mins" to unblock the rest of boot?

holmanb (Member Author) replied:

Historically, systemd units have been kicked out of systemd boot goals due to misconfigured ordering cycles (unit dependency chain issues) with other systemd units.
Do we want to implement some sort of long-timeout exception handling here so as not to block indefinitely, and hard-error that "cloud-init didn't start expected X stage in X mins" to unblock the rest of boot?

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

I really don't think that we want that. What you are referring to is a runtime symptom of a problem that, in almost all cases, is introduced at build time by the combination of packages installed on the system. Services in systemd can take an unknowable amount of time to complete, so such a timeout isn't a correct solution for this problem - it could break real users of cloud-init who have to deal with long boot times, just to introduce a broken workaround for packagers. The correct solution is to fix the ordering cycle.

Yes, cloud-init will behave differently from before if kicked out due to an ordering cycle. However, I don't think that this is something that an end user should ever see unless they themselves are messing with the boot order or their OS vendor made a mistake - and even in those cases it still doesn't seem like a problem for cloud-init to attempt to solve at the expense of real users with slow boot times.

blackboxsw (Collaborator) replied:

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

This does happen sometimes to (image creator) end-users who install supplemental systemd services in their golden images, which has resulted in bogus bugs/questions about cloud-init not running right (because it was ejected from the boot goals due to systemd unit dependency conflicts). I was wondering if there are ways we could make that condition a bit more conspicuous, but the symptoms below are probably "good enough" for those image creators to triage and determine what is causing the block in boot.

Here are the symptoms a user would encounter if we don't instrument a timeout of some sort. In images where cloud-network.service were to be ejected from boot goals due to cyclic dependency ordering, the system would be in the following state:

  • systemctl status will sit indefinitely at "starting" state.
  • systemctl list-jobs will not show the missing/disabled cloud-network.service, but instead show cloud-final.service and cloud-init.target in "waiting" state, and cloud-config.service sitting indefinitely in "running" state with no logs.
  • systemctl status cloud-init.service will show "Waiting on external services to complete (network stage)".
  • cloud-init status --format=yaml will show extended_status: running and no start times for the init stage in the resulting YAML output.

These are probably reasonable enough breadcrumbs to look into on systems which exhibit this configuration problem and may not warrant further arbitrary timeouts which could adversely affect slow system startup.

@holmanb (Member Author) replied, Jul 25, 2024:

This is a condition which should be discovered before the image ever gets shipped, not something that should bite end users.

This does happen sometimes to (image creator) end-users who install supplemental systemd services in their golden images, which has resulted in bogus bugs/questions about cloud-init not running right (because it was ejected from the boot goals due to systemd unit dependency conflicts). I was wondering if there are ways we could make that condition a bit more conspicuous, but the symptoms below are probably "good enough" for those image creators to triage and determine what is causing the block in boot.

This can easily be checked with systemd-analyze verify default.target and checking the output, or by checking the journal for messages like: "Breaking ordering cycle by deleting job cloud-init.service".

I can add that to the docs in the appropriate place for anyone that runs into this. I'll make it a separate PR. I've added it to the "future work" list in the PR description.

holmanb (Member Author) commented Jul 24, 2024:

Thanks for the review @setharnold!

So: I suggest not using an old name for a new purpose.

You make good points. I was hopeful that this would be a good name for it, and I still think it is; however, you're right that this may make the transition unnecessarily painful. I've changed the name of the primary service to cloud-init-main.service.

One common problem with "multiple-stage" software is when something, or someone, causes the stages to execute out of order. This might not be quite as vulnerable as a TLS implementation in a webserver, but if the different phases have strict ordering requirements, we should have some simple checks for that ordering here.

With these changes, cloud-init can only run these stages in the required order. Cloud-init uses systemd ordering to guarantee that a stage is not started before it is ready, and furthermore will only run each stage once the previous one has run and once it has received the "start" message for the next stage. This PR also includes tests that exercise this ordering functionality, so I think that we already have in place what is being suggested, unless I missed something.

@github-actions github-actions bot added the documentation This Pull Request changes documentation label Jul 24, 2024
@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch from 40992ea to ae7a4da Compare July 24, 2024 23:12
holmanb (Member Author) commented Jul 24, 2024:

I just rebased on tip of main to fix a unit test that failed due to service rename.

@holmanb holmanb requested a review from blackboxsw August 1, 2024 23:42
@blackboxsw (Collaborator) left a review:

LGTM!
Minor requests:

  • a more informative message than "done" when we are done talking to the socket for a given boot stage
  • dropping the 2nd handle = declaration
  • possibly removing socket files once we've close()d our communication

Comment on lines 862 to 866
handler = next(
    filter(
        lambda h: isinstance(h, log.LogExporter), root_logger.handlers
    )
).export_logs()
blackboxsw (Collaborator):

Since we are declaring handler before the leading try: block above, we don't need to redeclare this here. I think you can drop these 4 lines now.


def __exit__(self, exc_type, exc_val, exc_tb):
    """Notify the socket that this stage is complete."""
    message = "done"
blackboxsw (Collaborator):

Seeing this terse message doesn't look very informative in each unit's logs, like systemctl show cloud-config.service

... sh[280]: done
... systemd[1]: Finished cloud-init-network.service - Cloud-init Network stage

How about something a bit more specific?

Suggested change
message = "done"
message = f"Completed socket interaction for boot stage {self.stage}"

sock.sendall(
    f"echo '{message}'; exit {self.systemd_exit_code};".encode()
)
sock.close()
blackboxsw (Collaborator):

Once we finish our socket interaction, shouldn't we be removing the associated file paths to avoid leaving *.sock files behind in /run/cloud-init/share/?

Suggested change
sock.close()
sock.close()
for socket_path in (self.remote, self.remote.replace("-return", "")):
    with suppress(FileNotFoundError):
        os.remove(socket_path)

holmanb (Member Author) replied:

Once we finish our socket interaction, shouldn't we be removing the associated file paths to avoid leaving *.sock files behind in /run/cloud-init/share/?

No. Doing this would introduce a race into the socket protocol.

If we were to do that, there would be no guarantee that the receiving process was scheduled in time by the kernel to receive the message before it gets deleted.

We could "wait until the end", and then delete them all, but again we would require some way to ensure that the message was received before deletion.

These files take up very little space and are harmless, so why not just leave them?

@@ -10,7 +10,7 @@ ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=config
ExecStart=sh -c 'echo "start" | nc.openbsd -Uu -W1 /run/cloud-init/share/config.sock -s /run/cloud-init/share/config-return.sock | sh'
blackboxsw (Collaborator):

Yes looking great.

Python interpreter initialization and module import time is a
significant portion of cloud-init's total runtime when the
default configuration is used, and in all cases it contributes
a significant amount of wall clock time to cloud-init's runtime.

This commit significantly improves cloud-init time to completion
by eliminating redundant interpreter starts and module loads.
Since multiple cloud-init processes sit in the critical chain of
the boot order, this significantly reduces cloud-init's time to
ssh and time to completion.

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system.

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This makes possible many additional improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability
to preserve backwards compatibility. Many additional
improvements will be possible once this merges.

Synchronization protocol
========================
- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol
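The context manager described above can be sketched roughly like this (illustrative and simplified; these names are not the real socket.py API):

```python
import os
import socket


class SocketSync:
    """Sketch of a multi-socket sync protocol: one AF_UNIX datagram
    socket per stage. Entering the context blocks until the init
    system's shim sends b"start"; exiting replies on the shim's bound
    return socket with a snippet for the shim's trailing `sh` to run."""

    def __init__(self, rundir: str, *stages: str):
        self.stage = ""
        self.remote = ""
        self.sockets = {}
        for stage in stages:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            sock.bind(os.path.join(rundir, f"{stage}.sock"))
            self.sockets[stage] = sock

    def __call__(self, stage: str) -> "SocketSync":
        self.stage = stage
        return self

    def __enter__(self) -> "SocketSync":
        # Block until the shim service for this stage sends b"start".
        # recvfrom() also yields the shim's (bound) return address.
        chunk, self.remote = self.sockets[self.stage].recvfrom(5)
        assert chunk == b"start", chunk
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Reply with a shell snippet so the shim unit exits with a
        # code representing this stage's outcome.
        code = 1 if exc_type else 0
        self.sockets[self.stage].sendto(
            f"echo 'done'; exit {code};".encode(), self.remote
        )
```

Because AF_UNIX datagram sockets queue messages, the shim can send "start" before the main process reaches `__enter__` without losing the message.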

cloud-init-single.service (new)
-------------------------------

 - use service type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate single process mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-{local,init,config,final}.services
----------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat
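That pure-Python equivalent could look roughly like this (a hypothetical sketch, not shipped code; names are illustrative):

```python
import os
import socket
import subprocess


def notify_and_wait(stage: str, rundir: str = "/run/cloud-init/share") -> int:
    """Hypothetical stand-in for the nc.openbsd one-liner: send "start"
    on the stage socket, wait for one reply datagram on our bound
    return socket, then run the reply through sh (like the `| sh`)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Bind the return socket, like nc's -s option.
    sock.bind(os.path.join(rundir, f"{stage}-return.sock"))
    sock.sendto(b"start", os.path.join(rundir, f"{stage}.sock"))
    reply = sock.recv(4096)  # like -W1: one datagram, then done
    return subprocess.run(["sh", "-c", reply.decode()]).returncode
```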

cloud-final.services
--------------------
- drop KillMode

cloud-init-local.services
-------------------------
- drop dependencies made redundant by ordering after
  cloud-init-single.service

Performance Results
===================

An integration test [1] on a Noble lxd container comparing POC to current
release showed significant improvement. In the POC, cloud-config.service
didn't register in the critical-chain (~340ms improvement),
cloud-init.service added ~378ms to total boot time (~400ms improvement),
and cloud-init-local.service had a marginal improvement (~90ms) which was
likely in the threshold of noise. The total improvement in this (early
stage) test showed a 0.83s improvement to total boot time with 0.66s of
boot time remaining due to cloud-init. This 0.83s improvement roughly
corresponds to the reduction in total boot time: systemd-analyze
critical-chain reports 2.267s to reach graphical.target, a 0.8s
improvement over the current release.

Note: The numbers quoted above gathered from only one series (Noble),
one platform (lxc), one host machine (Ryzen 7840U), and run environment
was not controlled. I ran the test multiple times to ensure that the
results were repeatable, but not enough times to be considered
statistically significant. I verified that cloud-init worked as expected,
but given the limited scope of this integration test, this is still very
much a proof of concept.

[1] test_logging.py

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
Make cloud-network.service map to the cloud-init network stage.
Make cloud-init.service map to all of cloud-init.

BREAKING CHANGE: Changes the semantics of the cloud-init.service files
- make it such that if one stage fails, the next stage isn't blocked
  indefinitely
- notify the init system of per-stage exit codes and failure messages
- make parent process (cloud-init.service) exit with representative exit code
Add a new flag to the argparse Namespace which is used to disable
logging.

This isn't elegant, but fixing logging is going to be a large refactor,
so this gets logging "working" for now while minimizing the number of
lines changed.

Skip the sync protocol when stdin is a tty.
- remove logs duplicated across stages
- send the single line traceback to systemd
- fix a minor string format in user output
@holmanb holmanb force-pushed the holmanb/fastpath-single-proc branch from 0de6016 to cb7bb25 Compare August 2, 2024 20:24
holmanb (Member Author) commented Aug 2, 2024:

The cc_mounts PR that we just landed introduced a merge conflict with this PR, so I just pushed a change which resolves that.

@holmanb holmanb merged commit 143bc9e into canonical:main Aug 2, 2024
23 checks passed
holmanb added a commit to holmanb/cloud-init that referenced this pull request Aug 2, 2024
Python interpreter initialization and module import time 
contributes a significant amount of wall clock time to
cloud-init's runtime (and therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system. Since multiple cloud-init processes
sit in the critical chain of starting the system, this reduces
boot time (including time to ssh login and time to cloud-init
completion).

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability
to preserve backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its
systemd units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This
stage is documented as the "network" stage, so this service will
be renamed to "cloud-init-network.service". A new notify service
is added as part of this implementation which contains the
cloud-init process. This unit is named "cloud-init-main.service".
 
Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-----------------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat

cloud-init-main.service (new)
-----------------------------

 - use service type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate "all stages" mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock start-up time as measured with
`time python3 -c 'import cloudinit.cmd.main'` on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change eliminates 1x this start-up time from time to ssh.
This change eliminates 3x this start-up time from cloud-init's total
time to completion.
completion. Total benefit varies based on the platform that the
instance is hosted by, but all platforms will measurably benefit from
this change.
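The interpreter and import cost quoted above can be broken down per module with CPython's standard `-X importtime` option (shown here against a stdlib module as a stand-in, since `cloudinit.cmd.main` is only importable where cloud-init is installed):

```shell
# Print a per-module import time table to stderr; on a system with
# cloud-init installed, substitute 'cloudinit.cmd.main' for 'json'.
python3 -X importtime -c 'import json' 2> importtime.log
tail -n 3 importtime.log
```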

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
holmanb added a commit that referenced this pull request Aug 6, 2024
holmanb mentioned this pull request Aug 8, 2024.