-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move disable checks out of systemd generator #4399
Conversation
4b6acf1
to
6b082b9
Compare
@smoser @raharper @ani-sinha @esposem Given the invasive nature this change, I am requesting feedback from you directly on this pull request. A few specific things I'd like to draw attention to:
|
This needs to be tested. We are going to test this internally.
|
Thanks for looking into this; it was certainly non-obvious why adding In the open issue w.r.t Azure+Packer -- I think we've come to the conclusion The documentation works AFAICT in the absence of that packer scenario: Using the touch method, for example:
Yes.
I don't think so. The policy is primarily about controlling when cloud-init
Generally, the additional conditions in the Unit file is a workaround for not Historically there were some Distros which moved to cloud-init releases with IIUC, all of the systemd files only become added as systemd Jobs if Distro variants may yet still want to render in the conditions into each unit;
+1
+1 Generally making the generator take less time is valuable since |
This change does not work. This is the feedback from our QE:
|
For the doc issue I sent the PR #4406 . |
Do you have cloud-init logs from this run?
You're suggesting that it should be 'disable', present tense, not 'disabled', past tense? Where in the cloud-init repo does it check for value 'disable' versus 'disabled'? https://github.com/search?q=repo%3Acanonical%2Fcloud-init+%27disable%27&type=code suggests only the 'byobu' module looks for A longer look for "disable" has more hits[2] but none of them look like kernel command expects the present tense, 'disable'. |
You got it all wrong. Please see my PR #4406 |
Your comment quoted,
includes a typo W.r.t to the other requests that are relevant to this PR, do you have any logs from those runs? |
|
@raharper @ani-sinha Thanks for the feedback and discussion on this PR. It seems sentiment so far is that the changes are agreeable, with the exception being the relocation of the service conditionals to the target file. Regarding that:
This was my understanding as well, but testing this now I see that the PR as it is fails to disable, as @ani-sinha reported. Re-adding these lines to the service files disables cloud-init as expected. I'll re-add the conditions to the service files as distro-independent conditions. It should be functionally equivalent, so I'll leave the conditions in the target file so that users can see why cloud-init is disabled in the output of Output with conditionals in target file:
Output without conditionals in target file:
|
From what I recall, systems which did not use ds-identify would fully install the services; that is the generator was effectively disabled and cloud-init*.service always ran no matter what; which prompted the inclusion of the conditionals. With this style of install, ds-identify/generator doesn't matter as systemd itself is going to run these units on its own ( If a distro does not A few things to confirm. In @ani-sinha test, were the cloud-init units already activated (via the systemctl enable like in the specfile I link)? If so, then I believe the cloud-init.target won't affect whether the cloud-init services run since in the multi-user.wants dir, it isn't just cloud-init.target, but rather all 4 services linked. To have cloud-init.target control the execution of the service units, cloud-init.target and only cloud-init target can be enabled (dynamically, or statically). That is, the upstream spec file can instead do:
Or allow the generator to check if cloud-init.target should be added to multi-user.target. In either case here, if cloud-init.target conditionals fail, then the other cloud-init services should not run either. If that is the case, then the conditional tests will need to remain in the generator to avoid adding cloud-init.target to multi-user.target.wants to prevent the services from running. |
I don't think this is a concern. See the enabled cloud-init.target is disabled on reboot below due to the condition check.
|
Yes. That is correct. My changes are here: I think we need to keep the check on individual unit files. Otherwise, with only in |
Excellent! That was what I expecting but wasn't 100% sure. Thank you for testing. |
This is my plan for now with this PR.
The failing unit tests are a result of fixing a bug that I uncovered while testing this code out: Currently cloud-init supports disabling cloud-init via environment variable
@ani-sinha Many thanks for the engagement and feedback! |
49c56bb
to
d854314
Compare
d854314
to
07965a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a few comments to help reviewers understand the generator changes.
@@ -42,105 +42,47 @@ debug() { | |||
echo "$@" >> "$LOG" | |||
} | |||
|
|||
etc_file() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now handled by ConditionPathExists.
return 0 | ||
} | ||
|
||
read_proc_cmdline() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both container and non-container cases are handled by ConditionKernelCommandline
|
||
kernel_cmdline() { | ||
local cmdline="" tok="" | ||
if [ -n "${KERNEL_CMDLINE+x}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KERNEL_CMDLINE
is now handled by ConditionEnvironment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. It is not handled from a generator perspective. If KERNEL_CMDLINE=cloud-init=disabled
exists there is really nothing for the generator to do and it should just exit as systemd will handle the rest via the ConditionEnvironment setting. But this does not apply to the generator and as such dsidentify is still run, which we shouldn't really do if the user disabled cloud-init.
I think the generator should specifically look for KERNEL_CMDLINE=cloud-init=disabled
and if found exit and do nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is addressed in the latest commit.
_RET="$ENABLE" | ||
} | ||
|
||
check_for_datasource() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to define a function that gets called once. Function definition and subsequent lookup at invocation time have costs. Inline it.
|
||
debug 1 "$0 normal=$normal_d early=$early_d late=$late_d" | ||
debug 2 "$0 $*" | ||
|
||
local search result="error" ret="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting result
to $ENABLE
or $DISABLE
is unnecessary to accomplish this logic and adds a variable (which adds lookup time) and unnecessary branches (more time cost) and IMO is harder to read. Shift code around to avoid these conditionals and variable usage.
TL;DR: Do the same thing, with less.
systemd/cloud-init-generator.tmpl
Outdated
debug 1 "ds-identify rc=$ds" | ||
|
||
if [ "$ds" = "1" ]; then | ||
debug 1 "cloud-init is enabled but no datasource found, disabling" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a cloud-init is installed but disabled, the misleading message cloud-init is enabled but no datasource found, disabling
will now be printed by the generator on boot, and ds-identify will output misleading messages attempting to detect clouds.
ds-identify will likely write incorrect /run/cloud-init/cloud.cfg configuration as if it was enabling datasources.
$RUN_ENABLED_FILE
(/run/cloud-init/cloud-init.enabled
/.disabled
) will also be set to the wrong state. I'm not sure if this file is used for anything since dhclient hooks were removed in #2015, but it could be used as an indicator for other external scripts. cloud-init.enabled also doesn't appear to be created when not using the generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message will only be printed if no datasource is found. This is still misleading, no question, but it is not unconditional. In the case of a "happy path", i.e. dsidentify runs and finds a data source the message will not be printed.
That said if cloud-init is disabled, why run dsitentify at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this text in /run/cloud-init/cloud-init-generator.log is not machine readable, and I don't think we have scripts parsing and relying on the output of this file, I'd like us to specialize that message when cloud-init is disabled. Then we better represent the actual state during generator time-frame when we see cloud-init disabled and don't have to fallback to querying cloud-init status or systemctl status cloud-init.target etc to get that info.
Having the cloud-init package installed but disabled will become very expensive, as ds-identify will now still run and attempt to detect clouds. On every boot ds-identify will run expensive hardware DMI detection logic as a systemd generator that synchronously blocks all of boot. This performance impact will likely dwarf any small changes from checking a few flags on each boot. Did the benchmarking include deleting the cloud-init cached configuration from /run, as it would not be present on boot? ds-identify skips cloud detection checks when the result is cached. Generators also run any time systemctl daemon-reload is triggered, though in that case ds-identify should at least use the cached result. I think the changes to cloud-init-generator should be reverted, or ds-identify needs to check the disabled state flags before it tries to detect clouds. I also suspect there might be more to be gained by conditionally skipping some of the logic in collect_info from ds_identify than changes to the generator itself. Another option might be to have cloud-init-generator (and ds-identify) just run as a unit before cloud-init-local.service and check the conditions as part of the unit file? Enqueuing additional services during boot using systemctl appears to work, so long as they aren't waited on. This would also allow ds-identify to read configuration from other filesystems. This wouldn't cover new cloud-init package installation having the generator run by systemctl daemon-reload though. |
Thanks @holmanb for the refactors here and insights.
Reducing the number of What I suggest is the following: Per @antcodd's suggestion, ds-identify can perform it's initial disabled check earlier than DMI reads. I think we can do that with with a different exit code(2) that will also resolve @antcodd's other concern about misleading messages coming from the generator, as the generator can check exit code 2 and report "cloud-init disabled due to kernel commandline or etc_file". Here's a suggested proposal for discussion, what his does is gives ds-identify a quick exit when disabled, leaves the generator "simple" without the duplicated kernel cmdline parsing based on container/non-container environments and give avoid enabling systemd units/services in a system when cloud-init disabled by the environment. From b40986fb29c7177e589517a6f6350b8f5fe4ccd2 Mon Sep 17 00:00:00 2001
From: Chad Smith <[email protected]>
Date: Tue, 26 Sep 2023 12:49:16 -0600
Subject: [PATCH] ds-identify: exit 2 on disabled state from marker or cmdline
The cloud-init-generator no longer checks disabled state of
cloud-init from the kernel command line or marker file in
/etc/cloud/cloud-init.disabled.
This means ds-identify will be called during systemd generator
even in disabled cases.
Given that ds-identify parses the kernel commandline for datasource
discovery, allow it to check whether cloud-init=disabled is present
in kernel command line, KERNEL_CMDLINE or
/etc/cloud/cloud-init.disabled.
Add a new exit code(2) from ds-identify to allow
cloud-init-generator to allow differentation of disabled state due
to environment artifacts versus disabled due to no detected
datasources.
Additionally, factor the minimum operations needed for detecting
if cloud-init is disabled to earlier in ds-identify to allow an early
exit before more costly DMI read operations in collect_info.
There is no need to perform all collect_info in ds-identify when
cloud-init is fully disabled by commandline or
/etc/cloud/cloud-init.disabled.
---
systemd/cloud-init-generator.tmpl | 8 ++++++--
tests/unittests/test_ds_identify.py | 4 ++++
tools/ds-identify | 26 +++++++++++++++++++++++---
3 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/systemd/cloud-init-generator.tmpl b/systemd/cloud-init-generator.tmpl
index 5999fc5d2..3c9ca1695 100644
--- a/systemd/cloud-init-generator.tmpl
+++ b/systemd/cloud-init-generator.tmpl
@@ -64,8 +64,12 @@ main() {
ds=$?
debug 1 "ds-identify rc=$ds"
- if [ "$ds" = "1" ]; then
- debug 1 "cloud-init is enabled but no datasource found, disabling"
+ if [ "$ds" = "1" -o "$ds" = "2" ]; then
+ if [ "$ds" = "1" ]; then
+ debug 1 "cloud-init is enabled but no datasource found, disabling"
+ else
+ debug 1 "cloud-init is disabled by kernel commandline or etc_file"
+ fi
if [ -f "$link_path" ]; then
if rm -f "$link_path"; then
debug 1 "disabled. removed existing $link_path"
diff --git a/tests/unittests/test_ds_identify.py b/tests/unittests/test_ds_identify.py
index 66dae60e4..200c488c6 100644
--- a/tests/unittests/test_ds_identify.py
+++ b/tests/unittests/test_ds_identify.py
@@ -329,6 +329,10 @@ class DsIdentifyBase(CiTestCase):
"ret": 1,
"err": "No dmidecode program. ERROR.",
},
+ {
+ "name": "is_disabled",
+ "ret": 1,
+ },
{
"name": "get_kenv_field",
"ret": 1,
diff --git a/tools/ds-identify b/tools/ds-identify
index 8b5229d8e..babd1f43b 100755
--- a/tools/ds-identify
+++ b/tools/ds-identify
@@ -1039,6 +1039,23 @@ has_ovf_cdrom() {
return 1
}
+is_disabled() {
+ if [ -f /etc/cloud/cloud-init.disabled ]; then
+ debug 1 "disabled by marker file /etc/cloud-init.disabled"
+ return 0
+ fi
+ if [ "${KERNEL_CMDLINE:-}" = "cloud-init=disabled" ]; then
+ debug 1 "disabled by KERNEL_CMDLINE environment variable"
+ return 0
+ fi
+ case "$DI_KERNEL_CMDLINE" in
+ *cloud-init=disabled*)
+ debug 1 "disabled by kernel command line cloud-init=disabled"
+ return 0
+ esac
+ return 1
+}
+
dscheck_OVF() {
check_seed_dir ovf ovf-env.xml && return "${DS_FOUND}"
@@ -1517,10 +1534,7 @@ dscheck_VMware() {
}
collect_info() {
- read_uname_info
- read_virt
read_pid1_product_name
- read_kernel_cmdline
read_config
read_datasource_list
read_dmi_sys_vendor
@@ -1795,6 +1809,12 @@ _main() {
read_uptime
debug 1 "[up ${_RET}s]" "ds-identify $*"
+ read_uname_info
+ read_virt
+ read_kernel_cmdline
+ if is_disabled; then
+ return 2
+ fi
collect_info
if [ "$DI_LOG" = "stderr" ]; then
--
2.39.2
|
Also note on Ubuntu systems we are generally seeing 15 calls to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking request changes to discuss whether we can push disable awareness from kernel commandline in ds-identify since it has to parse kernel cmdline anyway in the case that cloud-init is enabled.
ConditionPathExists=!/etc/cloud/cloud-init.disabled | ||
ConditionKernelCommandLine=!cloud-init=disabled | ||
ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these additional condition checks for visibility at systemctl status cloud-init.target level, but this duplicates conditions we've decided to keep on each subordinate cloud-*.service
file and we don't strictly need them if we are going to preserve the checks on the individual units. Can we measure if there is a cost to these duplicated conditions? It does make systemd dependency resolver do a little but more as far as checks for this target unit file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll measure and report back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blackboxsw see the section in the PR description titled "Measuring with and without conditionals in .target" for the numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @holmanb. This looks good to me as well as your assessment above. Also I triggered those compound conditionals directly on the systemd-analyze conditional subcommand with comparable results. The avg time seems to be +- one millisecond with or without the 3 extra checks.
root@schema-m:~# time systemd-analyze condition 'ConditionKernelCommandLine=!cloud-init=disabled'
test.service: ConditionKernelCommandLine=!cloud-init=disabled succeeded.
Conditions succeeded.
real 0m0.007s
user 0m0.000s
sys 0m0.007s
root@schema-m:~# time systemd-analyze condition 'ConditionKernelCommandLine=!cloud-init=disabled' 'ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled' 'ConditionPathExists=!/etc/cloud/cloud-init.disabled' 'ConditionKernelCommandLine=!cloud-init=disabled' 'ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled'
test.service: ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled succeeded.
test.service: ConditionKernelCommandLine=!cloud-init=disabled succeeded.
test.service: ConditionPathExists=!/etc/cloud/cloud-init.disabled succeeded.
test.service: ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled succeeded.
test.service: ConditionKernelCommandLine=!cloud-init=disabled succeeded.
Conditions succeeded.
real 0m0.007s
user 0m0.000s
sys 0m0.007s
systemd/cloud-init-generator.tmpl
Outdated
debug 1 "ds-identify rc=$ds" | ||
|
||
if [ "$ds" = "1" ]; then | ||
debug 1 "cloud-init is enabled but no datasource found, disabling" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this text in /run/cloud-init/cloud-init-generator.log is not machine readable, and I don't think we have scripts parsing and relying on the output of this file, I'd like us to specialize that message when cloud-init is disabled. Then we better represent the actual state during generator time-frame when we see cloud-init disabled and don't have to fallback to querying cloud-init status or systemctl status cloud-init.target etc to get that info.
Agreed, DMI reading is probably a fair amount of cost in ds-identify that we should probably avoid if we know we are disabled.
Most of ds-identify's caching is really just checks on whether variables have been defined and if so, return the cached var name. It's not disk caching that we are talking about here. The benchmarking is going to be highly platform dependent as it does go though and source DMI vars on the host in which the benchmark it is running
I disagree with the first statement as long as we can conditionally skip logic down in |
This seems like a fair trade-off. Since ds-identify already gathers this information it shouldn't be a measurable extra cost to the non-disabled case, and a clear improvement for the disabled case. This is a reasonable tradeoff.
This is a slight UI change to
@blackboxsw If you can push your branch up to Github or share the formatted patch (
Nice instrumentation! Now I'm curious now which other processes are using it. |
I've updated the inline patch above w/ a more specific commit msg and also pushed https://github.com/blackboxsw/cloud-init/tree/holmanb/do-less-in-generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor nit on integration test but you don't have to hold up the PR on that. Also, please provide the (#4399)
suffix to each commit if rebase merging.
def restart_cloud_init(c): | ||
client = c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:non-blocking why not just pass in a variable named client and use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since client
is a fixture, I didn't want to use that name because 1) I don't know if a fixture would be invoked (non-test functions can get fixtures invoked, for example a fixture that invokes other fixtures) and 2) even if 1 doesn't occur, it might confuse a reader who might assume that it does. This is more clear in my mind, though I guess I could have just picked a different variable name.
…onical#4399) This feature was implemented in df2d690.
- Make all distros use systemd's kernel commandline and file conditionals - Add ConditionEnvironment to all services and targets - Add conditions to cloud-init.target cloud-init-generator implements these conditional checks, but these can be implemented in cloud-init service and target files at lower cost.
- inline function check_for_datasource() - eliminate unnecessary variable assignments and branches - eliminate enablement checks (made redundant by systemd conditionals)
…al#4399) The cloud-init-generator no longer checks disabled state of cloud-init from the kernel command line or marker file in /etc/cloud/cloud-init.disabled. This means ds-identify will be called during systemd generator even in disabled cases. Given that ds-identify parses the kernel commandline for datasource discovery, allow it to check whether cloud-init=disabled is present in kernel command line, KERNEL_CMDLINE or /etc/cloud/cloud-init.disabled. Add a new exit code(2) from ds-identify to allow cloud-init-generator to allow differentation of disabled state due to environment artifacts versus disabled due to no detected datasources. Additionally, factor the minimum operations needed for detecting if cloud-init is disabled to earlier in ds-identify to allow an early exit before more costly DMI read operations in collect_info. There is no need to perform all collect_info in ds-identify when cloud-init is fully disabled by commandline or /etc/cloud/cloud-init.disabled.
1b5d9b4
to
8ab927a
Compare
Add missing cloud-init disabled state which fixes `cloud-init status --wait` never completing when cloud-init is disabled via KERNEL_CMDLINE.
This feature was implemented in df2d690.
- Make all distros use systemd's kernel commandline and file conditionals - Add ConditionEnvironment to all services and targets - Add conditions to cloud-init.target cloud-init-generator implements these conditional checks, but these can be implemented in cloud-init service and target files at lower cost.
- inline function check_for_datasource() - eliminate unnecessary variable assignments and branches - eliminate enablement checks (made redundant by systemd conditionals)
Please do not squash merge this PR.
The commits in this PR are written separately with the intent of being added to the git tree individually.
Old squashed commit message
Behavior changes:
Additional Details
Motivations: eliminate redundant code, performance improvements (do less in systemd generator, which blocks all of boot), standardize behavior across distros
The current cloud-init docs that describe how to disable cloud-init are incomplete, since ds-identify policy may take precedence over the disable file, the kernel commandline and environment variable arguments. This is not ideal, as ds-identify.cfg is an undocumented and less user-friendly user interface for a common user operation.
Assuming that we want cloud-init to behave as the docs describe (i.e.:
touch /etc/cloud/cloud-init.disabled
actually disables cloud-init), this PR fixes that discrepancy between docs and behavior.Performance Implications
As @blackboxsw mentioned out of band:
and some of my own thoughts:
Benchmarking the systemd generator shows a 2-3x performance improvement. However, this appears negligible since
systemd-analyze critical-chain
shows identical times for both (below) - and some work does get shifted from the generator to systemd itself in this change (later in boot).Generator Benchmark: this branch
Generator Benchmark: tip of main with just benchmark commit applied (6cf53172a)
systemd-analyze critical chain
note: I think that the time is since systemd starts, but if a better methodology exists, feedback is welcome
with this change:
from tip of main:
Since the root slice starts at 147ms for both tip of main and this branch, this change doesn't appear to provide a performance regression.
Measuring with and without conditionals in .target
From a different (slower) machine (don't compare the times below with the times above), I compare this branch:
against an otherwise identical branch that drops f310f9a:
I'd be surprised if this conditional to the target is actually an optimization that would save 0.2 seconds, so I assume that the measurable difference due to the target conditional is within the noise - not something to be concerned about.
Test Steps
Systemd environment conditional test:
Integration Tests: