Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(set_passwords): Run module in Network stage #5395

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Jun 8, 2024

Proposed Commit Message

perf(set_passwords): Run module in Network stage

Cloud-init blocks login until Config stage completes[1] to prevent users
from connecting to the instance via ssh prior to user configuration[2].
To enable faster ssh, cloud-init can simply move the set_passwords
module sooner in boot. This follows precedent of previous races of this
kind[3], and it effectively reverts b3c9b6a79c85e and while moving the set_passwords
module into Network stage.

The snap and grub_dpkg modules take a significant amount of time due to
runtime of external commands. This should improve time to ssh for all
users of those modules.

This will make both chpasswd and passwd run earlier in boot than before.
This should be safe, since chpasswd and passwd both use PAM, which to my
knowledge just requires r/w access to /etc/.


Probably the biggest noticeable effect will be for snapd users, which will
no longer have to wait an extra ~13s for snapd to start before they can ssh
into the instance.

graphical.target @27.796s
└─multi-user.target @27.796s
  └─snapd.seeded.service @14.658s +13.135s
    └─basic.target @14.207s
      └─sockets.target @14.197s
        └─snapd.socket @14.121s +61ms
          └─sysinit.target @14.006s
            └─cloud-init.service @11.135s +2.506s
              └─systemd-networkd-wait-online.service @9.670s +1.448s
                └─systemd-networkd.service @9.575s +71ms
                  └─network-pre.target @9.561s
                    └─cloud-init-local.service @5.564s +3.983s
                      └─systemd-remount-fs.service @1.581s +150ms
                        └─systemd-fsck-root.service @1.299s +195ms
                          └─systemd-journald.socket @1.011s
                            └─-.mount @880ms
                              └─-.slice @880ms.

[1] https://github.com/canonical/cloud-init/pull/2111
[2] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2013403
[3] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/781101

Additional details

Test Steps

tox -e integration-tests -- tests/integration_tests/modules/test_set_password.py                          

Cloud-init blocks login until Config stage completes[1] to prevent users
from connecting to the instance via ssh prior to user configuration[2].
To enable faster ssh, cloud-init can simply move the set_passwords
module sooner in boot. This follows precedent of previous races of this
kind[3].

The grub_dpkg module sometimes takes a long time to run, so this
should improve time to ssh in that case and some others.

This will make both chpasswd and passwd run earlier in boot than before.
This should be safe, since chpasswd and passwd both use PAM, which to my
knowledge just requires r/w access to /etc/.

This effectively reverts b3c9b6a and moves the set_passwords
module into Network stage.

[1] canonical#2111
[2] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2013403
[3] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/781101
@TheRealFalcon
Copy link
Member

LGTM, but since this is reordering services we should get a review from an SRU reviewer

@holmanb
Copy link
Member Author

holmanb commented Jun 8, 2024

@rbasak would you please review? I'm not sure about whether this would be SRU-able. This is a change in (service ordering)behavior, but the change is gated by a configuration setting change which the user may choose to react to.

Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

This change would improve boot performance across all clouds, and as such, I would vote to SRU it.

@toabctl
Copy link
Member

toabctl commented Jun 12, 2024

@holmanb I think you wanted to ping @basak (not rbasak)

@basak
Copy link
Contributor

basak commented Jun 12, 2024

IMHO this is a feature change—the feature being "performance". Feature changes are explicitly permitted under the current exception, and users are expected to be protected by the existing QA process. Wearing my SRU hat then I think this is fine.

It's certainly worth thinking about how to ensure its correctness and minimising regression risk though. Given that reordering modules carries more risk of unexpected interactions than normal, it probably also falls under the "major changes should be called out in the SRU template" requirement from the exception.

I suppose there's a risk that an existing user is relying on a window prior to a password change occurring using old credentials, but I think that's a race condition anyway and not reasonable for us to maintain behaviour for. Perhaps that's also worth mentioning in the SRU documentation.

This will make both chpasswd and passwd run earlier in boot than before. This should be safe, since chpasswd and passwd both use PAM, which to my knowledge just requires r/w access to /etc/.

I agree but am happy that you thought about and documented this! If you haven't already it's worth checking that there's no expected interaction with pam_nologin by manually activating it and checking password changes still work (and separately for root normal user cases), but I see nothing in the configuration that suggests that it would.

@basak
Copy link
Contributor

basak commented Jun 12, 2024

I forgot to mention that therefore my conclusion is that most of the regression risk is in the specifics within cloud-init (eg. how modules interact or what's available to modules in which stage, etc) rather than in the interaction with the system.

@catmsred
Copy link
Collaborator

I agree with @aciba90 -- this will be a significant performance improvement that should be SRU'd and I can't think of a situation where one would be relying on cloud-init running set_passwords later.

@holmanb
Copy link
Member Author

holmanb commented Jun 17, 2024

Thanks for the review @basak.

This will make both chpasswd and passwd run earlier in boot than before. This should be safe, since chpasswd and passwd both use PAM, which to my knowledge just requires r/w access to /etc/.

I agree but am happy that you thought about and documented this! If you haven't already it's worth checking that there's no expected interaction with pam_nologin by manually activating it and checking password changes still work (and separately for root normal user cases), but I see nothing in the configuration that suggests that it would.

Both before an after this change, cloud-init's use of chpasswd and passwd is ordered before systemd-user-sessions.service. Since systemd-user-sessions.service controls pam_nologin (via /run/nologin), I wouldn't expect this to fail. That said, I'll do a test to double check this behavior before merging.

@holmanb
Copy link
Member Author

holmanb commented Jun 17, 2024

I'll do a test to double check this behavior before merging.

Results below:

lxc init ubuntu:noble c1 --vm
lxc start c1
lxc shell c1
$ passwd ubuntu
# set passwd...

in another terminal confirm that login is possible

lxc console c1
c1 login: ubuntu
Password: 
Welcome to Ubuntu 24.04 LTS (GNU/Linux 6.8.0-35-generic x86_64)
...

now set pam_nologin

root@c1:~# touch /run/nologin

and confirm that pam_nologin is set:

c1 login: ubuntu

Login incorrect
c1 login: 

now that we've confirmed no_login is set, try modifying shadow via pam:

root@c1:~# passwd ubuntu
New password: 
Retype new password: 
passwd: password updated successfully
root@c1:~# echo "ubuntu:Ubuntu" | chpasswd
ubuntu@c1:/root$ sudo -s
root@c1:~# exit
exit
ubuntu@c1:/root$ passwd
Changing password for ubuntu.
Current password: 
New password: 
Retype new password: 
passwd: password updated successfully

All appears to work as expected with pam_nologin set

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@blackboxsw
Copy link
Collaborator

I also just exercised locale setting both C, C.utf8, ja_JP.utf8 etc with a password provided to ensure there was no side-effect passwords with and without unicode characters with set-passwords was ordered before - locale set or generation. Just trying to poke around to ensure that system locale afterwards doesn't affect provided password being in system config or user-data.

@blackboxsw
Copy link
Collaborator

With something like this landing in tip of main, we'll also have to update our quilt patches in stable downstream which currently revert the upstream patch which changed this behavior to cope with this changeset in tip of main.
https://github.com/canonical/cloud-init/blob/ubuntu/jammy/debian/patches/revert-551f560d-cloud-config-after-snap-seeding.patch

Before we decide to SRU this ordering change, let's also reflect this changeset to CC: @enr0n as well to ensure he is aware of changesets to systemd ordering in case something interacts with ongoing foundations work.

@enr0n
Copy link

enr0n commented Jun 19, 2024

With something like this landing in tip of main, we'll also have to update our quilt patches in stable downstream which currently revert the upstream patch which changed this behavior to cope with this changeset in tip of main. https://github.com/canonical/cloud-init/blob/ubuntu/jammy/debian/patches/revert-551f560d-cloud-config-after-snap-seeding.patch

Before we decide to SRU this ordering change, let's also reflect this changeset to CC: @enr0n as well to ensure he is aware of changesets to systemd ordering in case something interacts with ongoing foundations work.

Thanks for the heads up. There is no ongoing work that I'm aware of which would be affected by this.

@holmanb
Copy link
Member Author

holmanb commented Jun 19, 2024

we'll also have to update our quilt patches in stable downstream which

@blackboxsw what updates are you thinking besides a patch apply conflict?

@blackboxsw
Copy link
Collaborator

we'll also have to update our quilt patches in stable downstream which

@blackboxsw what updates are you thinking besides a patch apply conflict?

@holmanb I was only thinking that we previously carried a couple of downstream patches we may need to audit if we decide to SRU this feature related to bootspeed to stable releases as we still have a couple of outstanding systemd ordering reverts in place to "retain original behavior" that would likely impact any bootspped results we hope to see on those stable releases: The patches I'm thinking about under consideration in the jammy branch would be

  • /patches/do-not-block-user-login.patch: which reverted the upstream change which ordered cloud-config.service Before=systemd-user-sessions.service
  • debian/patches/revert-551f560d-cloud-config-after-snap-seeding.patch

It's possible that there are minor quilt patch conflicts with the above patches that can easily be resolved. It's not necessary that we drop these patches, but something we should evaluate if we are pushing towards better boot performance on stable releases.

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the general approach here. We can sort downstream stable behavior related to quilt patches separately when those pull requests are under review.

@holmanb holmanb merged commit 4981ea9 into canonical:main Jun 20, 2024
23 checks passed
holmanb added a commit that referenced this pull request Jun 28, 2024
Cloud-init blocks login until Config stage completes[1] to prevent users
from connecting to the instance via ssh prior to user configuration[2].
To enable faster ssh, cloud-init can simply move the set_passwords
module sooner in boot. This follows precedent of previous races of this
kind[3], and it effectively reverts b3c9b6a and while moving the set_passwords
module into Network stage.

The snap and grub_dpkg modules take a significant amount of time due to
runtime of external commands. This should improve time to ssh for all
users of those modules.

This will make both chpasswd and passwd run earlier in boot than before.
This should be safe, since chpasswd and passwd both use PAM, which to my
knowledge just requires r/w access to /etc/.


Probably the biggest noticeable effect will be for snapd users, which will
no longer have to wait an extra ~13s for snapd to start before they can ssh
into the instance.

graphical.target @27.796s
└─multi-user.target @27.796s
  └─snapd.seeded.service @14.658s +13.135s
    └─basic.target @14.207s
      └─sockets.target @14.197s
        └─snapd.socket @14.121s +61ms
          └─sysinit.target @14.006s
            └─cloud-init.service @11.135s +2.506s
              └─systemd-networkd-wait-online.service @9.670s +1.448s
                └─systemd-networkd.service @9.575s +71ms
                  └─network-pre.target @9.561s
                    └─cloud-init-local.service @5.564s +3.983s
                      └─systemd-remount-fs.service @1.581s +150ms
                        └─systemd-fsck-root.service @1.299s +195ms
                          └─systemd-journald.socket @1.011s
                            └─-.mount @880ms
                              └─-.slice @880ms.

[1] #2111
[2] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2013403
[3] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/781101
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants