-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package snap #4202
Package snap #4202
Conversation
@@ -0,0 +1,213 @@ | |||
import fcntl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this file is taken directly from debian.py
self.environment["DEBIAN_FRONTEND"] = "noninteractive" | ||
|
||
@classmethod | ||
def from_config(cls, runner: helpers.Runners, cfg: Mapping) -> "Apt": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way to make it a little clearer exactly what the class dependencies are in the __init__
rather than just some cfg magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it made me re-read what runners were. I get that we'd want semaphore files around related to some expensive PackageManager interactions. +1 on a clear func signature/factory to represent that.
@@ -134,9 +151,76 @@ def _unpickle(self, ci_pkl_version: int) -> None: | |||
# missing expected instance state otherwise. | |||
self.networking = self.networking_cls() | |||
|
|||
@abc.abstractmethod | |||
def _extract_package_by_manager( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These next two methods still feel overly complicated to me. I'm open to any ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed w/ PackageManager subclasses and registration "global" that gives us the ability to define specialized package managers in the future for flatpak, yum, zypper, dnf apk pkg pacman emerge etc if needed. Given that some of the package managers are distro-specific, it may not make sense to refactor the pkg interaction to this layer in all cases. But, if we are interacting with a package manager that is supported on multiple distributions/platforms I like this common API layer for interacting w/ the available package managers as well as a distributions ability to define which pkg mgr to prioritize by default.
As far as your example yaml I think we'd look to something like the following instead of nested lists of lists. Your code is treating it as a dict where key == pkgmgr so I think it's just a typo.
Do we want the user-data to override default distro package pkg manager priority? Maybe not because folks can explicitly use packages: <preferred_pkg_mgr>: [list, of, pkgs]
to make that opinionated decisions without any supplemental user-data config options.
Please do document and schemify the undoc'd apt_* config options that seem to have slipped through the cracks.
packages:
apt:
- pkg1
- pkg2
snap:
- snap1
- snap2
```
from cloudinit import helpers | ||
|
||
UninstalledPackages = List[str] | ||
known_package_managers: MutableMapping[str, Type["PackageManager"]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed, why would we not hang this as a class attribute maybe name PackageManager.known_managers
? It seems to relate the the abstract class anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it in distros/__init__.py
, and referencing a class attribute directly from outside of the class just feels a little weirder to me. I think that having it outside the class makes it feel a little more publicly accessible.
Not a strong opinion though.
known_package_managers: MutableMapping[str, Type["PackageManager"]] = {} | ||
|
||
|
||
class PackageManager(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed w/ PackageManager subclasses and registration "global" that gives us the ability to define specialized package managers in the future for flatpak, yum, zypper, dnf apk pkg pacman emerge etc if needed. Given that some of the package managers are distro-specific, it may not make sense to refactor the pkg interaction to this layer in all cases. But, if we are interacting with a package manager that is supported on multiple distributions/platforms I like this common API layer for interacting w/ the available package managers as well as a distributions ability to define which pkg mgr to prioritize by default.
self.environment["DEBIAN_FRONTEND"] = "noninteractive" | ||
|
||
@classmethod | ||
def from_config(cls, runner: helpers.Runners, cfg: Mapping) -> "Apt": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it made me re-read what runners were. I get that we'd want semaphore files around related to some expensive PackageManager interactions. +1 on a clear func signature/factory to represent that.
def from_config(cls, runner: helpers.Runners, cfg: Mapping) -> "Apt": | ||
return Apt( | ||
runner, | ||
apt_get_wrapper_command=get_apt_wrapper(cfg.get("apt_get_wrapper")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa are we missing schema defintions and docs on these config options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On closer insepction, this can actually only be specified in the system_info
section of /etc/cloud/cloud.cfg, so I updated the docs accordingly.
|
||
def install_packages(self, pkglist: Iterable[str]) -> UninstalledPackages: | ||
self.update_package_sources() | ||
unavailable = self.get_unavailable_packages(pkglist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not thrilled about the cost of this lookup function I get that it improves our error handling and fallback to other package managers... Given that apt-get install X Y NOT_PRESENT doesn't give us the ability to optionally install X and Y if NOT_PRESENT isn't available, this is the best option to look before we leap.
to_install = [p for p in pkglist if p not in unavailable] | ||
if to_install: | ||
self.run_package_command("install", pkgs=to_install) | ||
LOG.debug("Apt cannot install %s", unavailable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all caps on APT in messaging everywhere I think
) | ||
# As of this writing, the only use of `package_command` outside of | ||
# distros calling it within their own classes is calling "upgrade" | ||
if command != "upgrade": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-missing dist-upgrade too? ... as well as any config overrides in apt_get_upgrade_subcommand looks like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't dist-upgrade called full-upgrade these days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-missing dist-upgrade too?
Not as far as I can see. There's no calling code that calls package_command()
with anything other than upgrade
.
e8a7d84
to
9e3ea3e
Compare
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
9e3ea3e
to
6070e82
Compare
Commenting so the stale-pr bot doesn't get me 😄 |
6070e82
to
05aa7c8
Compare
@@ -0,0 +1,87 @@ | |||
from itertools import count, cycle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is just the tests removed from test_debian.py
|
||
from cloudinit import distros, subp, util | ||
from cloudinit.distros.debian import APT_GET_COMMAND, APT_GET_WRAPPER | ||
from cloudinit import distros, util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests removed here were moved to test_apt.py
.
@@ -147,6 +147,18 @@ Both keys will be processed independently. | |||
* ``network-manager``: For ``nmcli connection load``/ | |||
``nmcli connection up``. | |||
* ``networkd``: For ``ip link set up``/``ip link set down``. | |||
- ``apt_get_command``: Command used to interact with APT repositories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't new, but were never documented.
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @TheRealFalcon for pulling this together and patience on the review. I hit a couple of snags will schema annotation on invalid schema now that we are representing nested dicts in lists for this schema. I'll provide a separate PR for the schema annotation fixes, but I wanted to capture the review comments I have so far before pushing a separate PR for schema annotation fixes.
I have a number of questions and suggestion inline. Biggest question though is how we are handling "unknown package managers" from a distro perspective when user-data requests "snap" yet the image itself may not contain such a package manager by default.
}, | ||
"minItems": 2, | ||
"maxItems": 2 | ||
"$ref": "#/$defs/package_item_definition" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to order this oneOf schema item after the object item below otherwise JSON schema primary error message generated when "is not valid under any given schemas", is the schema failure from the first schema item in the oneOf list.
This means that when someone typos the package manager key you get the following error:
#cloud-config
packages:
- apta: [sl] # E1,E2
# Errors: -------------
# E1: {'apta': ['sl']} is not of type 'array'
# E2: {'apta': ['sl']} is not valid under any of the given schemas
If we order the object with additonalPropeties: false before the list of strings, we get a more helpful error message:
#cloud-config
packages:
- apta: [sl] # E1,E2
# Errors: -------------
# E1: Additional properties are not allowed ('apta' was unexpected)
# E2: {'apta': ['sl']} is not valid under any of the given schemas
unavailable = self.get_unavailable_packages( | ||
[x.split("=")[0] for x in pkglist] | ||
) | ||
LOG.debug( | ||
"The following packages were not found by APT so APT will " | ||
"not attempt to install them: %s", | ||
unavailable, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes default behavior in the case where we had packages: [pkg1, pkg2]
in that we now pay the cost of get_unavailable_packages
for any previous #cloud-config which provides a generic pkg list.
On distributions like debian where only one package_manager is defined, do we want to avoid this optional unavailable check and just attempt to install all packages provided and "fail hard" if some don't exist?
Additionally, this behavior to install all generic packages without an initial get_unavailable_packages
check may be desirable on stable Ubuntu releases if we don't want to incur extra boot time costs when no packages: - snap:
configuration is present.
But, for stable ubuntu releases, an apt-cache check is relatively inexpensive compared to the cost of running apt update and waiting for the package install. So, maybe that small cost is worth the better handling and support of snaps on stable releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes default behavior in the case where we had packages: [pkg1, pkg2] in that we now pay the cost of get_unavailable_packages for any previous #cloud-config which provides a generic pkg list.
This doesn't actually change behavior though, right? It just adds some search time?
On distributions like debian where only one package_manager is defined, do we want to avoid this optional unavailable check and just attempt to install all packages provided and "fail hard" if some don't exist?
Our inheritance rules make this a little tricky. I'm also not sure I see a super compelling reason to.
So, maybe that small cost is worth the better handling and support of snaps on stable releases.
This was my thinking. If you're installing packages, you're already adding more than a handful of seconds to your boot, so this should be fairly trivial compared to that. It doesn't impact the "fast boot" use case. Also, package install also comes after ssh, so it shouldn't impact time to ssh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually change behavior though, right? It just adds some search time?
Correct, yes no config changes or artifacts from cloud-init other than potential upfront cost to parse the apt-cache. Which again is not bad relative to the actual apt update we just performed and the software we need to install and configure during first boot anyway.... We've also added more error handling/coping if we want to fallback to snap'd packages in cases where the deb isn't available in apt, so I think it's a defensible cost to pay here.
Also, package install also comes after ssh, so it shouldn't impact time to ssh.
+1 to this too
failed: List[str] = [] | ||
for pkg in pkglist: | ||
try: | ||
subp.subp(["snap", "install"] + pkg.split("=")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that snap install may want to provide specific channels like [lxd, --channel=5.2/beta]
this split breaks down as our secondary parameters for snap can contain =
.
We should either change util.expand_package_list("%s=%s"
to a character we won't see in snap install parameters and split on that, or just split the first occiurrence of =
here
subp.subp(["snap", "install"] + pkg.split("=")) | |
subp.subp(["snap", "install"] + pkg.split("=", 1)) |
subp.subp(["snap", "install"] + pkg.split("=")) | ||
except subp.ProcessExecutionError: | ||
failed.append(pkg) | ||
LOG.info("Snap failed to install package: %s", pkg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add any additional params supplied to the snap install cmd in the logs.
known_package_managers[package_manager] | ||
].add(definition) | ||
except KeyError: | ||
LOG.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we have unit test or integration test coverage of this case. Either would be fine.
packages_by_manager.get(manager.__class__, set()) | ||
| generic_packages | ||
) | ||
uninstalled = manager.install_packages(to_try) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only have generic_packages and len (distro.package_managers) == 1. We could provide a strict=True
or force=True
flag to install_packages to force using this package manager instead of try/fallback on unavailable. This could give us the ability to avoid checks on unavailable packages if we think that is something worth avoiding on debian or stable ubuntu.
cloudinit/distros/__init__.py
Outdated
# by distro | ||
for manager, packages in packages_by_manager.items(): | ||
if manager.name not in known_package_managers: | ||
LOG.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we have unittest/integration coverage for this error, either would be fine.
I think we actually want to raise an error or continue here otherwise we continue after this and fail below with an ugly traceback on 234 because presumably manager
is not known/supported in this image.
if manager.name in [p.name for p in self.package_managers]: | ||
# We already installed/attempt these; don't try again | ||
continue | ||
uninstalled.extend( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Nothing is covering this in tests either. Maybe we want something like:
--- a/tests/unittests/config/test_cc_package_update_upgrade_install.py
+++ b/tests/unittests/config/test_cc_package_update_upgrade_install.py
@@ -75,6 +75,17 @@ class TestMultiplePackageManagers:
assert mock.call(["snap", "install", "pkg1"]) in m_subp.call_args_list
assert mock.call(["snap", "install", "pkg2"]) in m_subp.call_args_list
+ @mock.patch("cloudinit.subp.subp")
+ def test_explicit_snap_when_not_distro_default(self, m_subp, common_mocks):
+ import pdb; pdb.set_trace()
+ cloud = get_cloud("debian")
+ cfg = {"packages": [{"snap": ["pkg1", "pkg2"]}]}
+ handle("", cfg, cloud, [])
+
+ assert len(m_subp.call_args_list) == 2
+ assert mock.call(["snap", "install", "pkg1"]) in m_subp.call_args_list
+ assert mock.call(["snap", "install", "pkg2"]) in m_subp.call_args_list
+
@mock.patch("cloudinit.subp.subp")
def test_explicit_snap_version(self, m_subp, common_mocks):
cloud = get_cloud("ubuntu")
- One thought this brings up is, if debian doesn't have snap command installed should the package_manager have a
available/viable
check to determine if it is even compatible? If not we will traceback with a called process error here oncommand not found
. We can either add aPackageManager.is_applicable/available
check or better handle a ProcessExecutionError when the PackageManager primary command is absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(original comment deleted).
This will be refactored in the next push
@blackboxsw , I believe I addressed all of your comments. The "big" new one is the |
Also, I'll push a rebase to fix the conflicts |
6773f3d
to
4b245c0
Compare
} | ||
] | ||
}, | ||
"additionalProperties": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheRealFalcon I see you placed this additionalProperties:false here this doesn't end up applying to the oneOf object:
With your existing schema we still allow the following:
cat > pkg.yaml <<EOF
#cloud-config
packages:
- apta: ["sl"]
EOF
$ PYTHONPATH=. python3 -m cloudinit.cmd.main schema -c pkg.yaml
Valid cloud-config: pkg.yaml
I think instead we need to put the additionalProperties up at line 1953
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I'm seeing:
root@me:~/a# cat some-config
#cloud-config
packages:
- apta: ["sl"]
root@me:~/a# cloud-init schema --config-file some-config
Invalid cloud-config some-config
Error: Cloud config schema errors: packages.0: Additional properties are not allowed ('apta' was unexpected), packages.0: {'apta': ['sl']} is not valid under any of the given schemas
Error: Invalid cloud-config schema: user-data
AFAIK, additionalProperties
isn't valid inside the properties
definition. It goes outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @TheRealFalcon, I also evaluated cost of install from generic package list for a known snap-only package and saw that our costs of apt cache check are < 100 ms to determine that we don't want to try/fail to install the uninstallable package before falling back to snap install. This is an ok price to pay when we have to fallback from generic packages that are not found in APT.
One last thing we need to address is only specific secondary package mangement sources are provided, we probably shouldn't be calling the primary PackageManager.install_packages in the first place. This leads to wasted apt update
being called on user-data like the following:
#cloud-config
packages:
- snap: [mattermost-desktop]
to avoid this I think we need a trivial check to avoid wasted calls on primary package_manager:
diff --git a/cloudinit/distros/__init__.py b/cloudinit/distros/__init__.py
index b272ff73a..67b66940b 100644
--- a/cloudinit/distros/__init__.py
+++ b/cloudinit/distros/__init__.py
@@ -210,6 +210,8 @@ class Distro(persistence.CloudInitPickleMixin, metaclass=abc.ABCMeta):
packages_by_manager.get(manager.__class__, set())
| generic_packages
)
+ if not to_try:
+ continue
uninstalled = manager.install_packages(to_try)
failed = {
pkg for pkg in uninstalled if pkg not in generic_packages
Otherwise +1
Add snap class Split apt stuff out of debian class Update ubuntu class to use both apt and snap
available as system_info. Move documentation accordingly
…ensive than the specific python file, so I added '_functionality' to the filename
Previously, a package manager would only be "known" if it was initialized, but this won't happen if it isn't a default for the distro. E.g., Debian never initializes Snap, so it is never seen as a known package manager. Instead, be explicit about which package managers are known. I moved it into a util file to avoid a circular import. The removed code in `install_packages` (lines 225-231) were redundant because we already make that check in `_extract_package_by_manager`
35060bf
to
4e4c7bc
Compare
When packages is non-empty, no longer call distros.update_package_sources because this loops through all package managers, calling a costly apt update even the packages provided specify only snaps to install. Also, distro.install_packages will specifically call the update_package_sources if necessary when installing packages so there is no gap if we avoid calling upfront.
@@ -96,7 +102,7 @@ def handle(name: str, cfg: Config, cloud: Cloud, args: list) -> None: | |||
pkglist = util.get_cfg_option_list(cfg, "packages", []) | |||
|
|||
errors = [] | |||
if update or len(pkglist) or upgrade: | |||
if update or upgrade: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this supplemental commit to avoid calling update_package_sources on all package managers. We don't need/want this in the event that not all package managers are represented in the user-data packages:
list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. one minor commit added to avoid calling update_package_sources based on a non-empty packages list. We will be calling that anyway for specific package_managers based on processing the packages later. No loss in doing this apt update later if/when needed.
…onical#4202) This also includes a major refactoring in how the packagement management code is handled. Changes include: * Backwards compatible change to cc_package_update_upgrade_install schema to allow explicitly specifying the package manager to use * Create PackageManager base class that new package manager classes can inherit from * Allow distros to specify the package managers they support with generic install code to install from any of the supported package managers * Create new snap.py and apt.py classes for anything snap and APT related respectively * Move all APT functionality out of debian.py and into apt.py and update callers appropriately * Pull the packaging related calls out of child distro classes and into `distros/__init__.py` so distro code continues to work once package management code is factored out. * Add and update tests Note that this currently only affects debian and ubuntu, along with apt and snap. Migrating other package managers should be straightforward enough, but can be done in later PRs.
Proposed Commit Message
Additional Context
Even though
distros/__init__.py
has method implementations where it didn't before, these should be safe because all child classes had to create their own implementations/overrides.