-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: load deb822 sources; use add-apt-repository to add #137
feat: load deb822 sources; use add-apt-repository to add #137
Conversation
Ubuntu 24.04 adopts the deb822 style source specification. Such files are listed in /etc/apt/sources.list.d/*sources, and allow the specification of sources in a multi-line format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! 🤩
I just have a couple comments/suggestions around how deb822 style sources are being handled in tandem with one-line style sources.
@@ -1198,7 +1199,7 @@ class RepositoryMapping(Mapping): | |||
""" | |||
|
|||
def __init__(self): | |||
self._repository_map = {} | |||
self._repository_map: Dict[str, DebianRepository] = {} | |||
# Repositories that we're adding -- used to implement mode param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the block of code below also needs to be updated to account for the case where /etc/apt/sources.list
is just comments telling you "hey use ubuntu.sources
instead", otherwise you will still see the InvalidSourcesError
reported in #135 as self.load
fails to find any valid sources in /etc/apt/sources.list
:
# Repositories that we're adding -- used to implement mode param
self.default_file = "/etc/apt/sources.list"
# read sources.list if it exists
if os.path.isfile(self.default_file):
self.load(self.default_file)
The easiest thing I think we can do here to handle this case, where /etc/apt/sources.list
only contains comments, is to first check for the existence of /etc/apt/sources.list.d/ubuntu.sources
and then fall back to /etc/apt/sources.list
if */ubuntu.sources
doesn't exist:
default_list = "/etc/apt/sources.list"
default_sources = "/etc/apt/sources.list.d/ubuntu.sources"
if os.path.isfile(default_sources):
self.default_file = default_sources
else:
logger.debug("%s not found. defaulting to %s", default_sources, default_list)
self.default_file = default_list
self.load(self.default_file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point, will add a fix like the one you've suggested
gpg_key_filename=gpg_key, # TODO: gpg_key can be a literal key, not just a filename | ||
options=options, | ||
) | ||
for repotype, uri, suite in itertools.product(repotypes, uris, suites) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we should convert deb822 style sources into one-line style format? Reason saying is that the disable
methods from the DebianRepository
and RepositoryMapping
classes assumes that the defined repositories in the sources files are one line format. When you call disable
from either class, the method goes into the relevant sources file and comments out the repository line.
This won't work for deb822 style sources since you need to comment out all lines in the stanza for the source to be disabled, however, the sources.list
manpage recommends that you just add Enabled: no
to the stanza. disable
will also be unable to map its generated one-line style representation of the repository to what's in the .sources
file as the generated search expression won't map to the deb822 stanza.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point, I didn't look into how DebianRepository
objects are used or the methods they support -- like disabling a source. The repository object we end up with will have to be aware of the format. My initial inclination is a Debian822Repository
inheriting from DebianRepository
or at least implementing the same 'protocol'.
EDIT: though maybe backwards compatibility in this way isn't possible since a deb822 source can have multiple repotypes, uris, and releases, which a DebianRepository is expected to just have one of each.
This also makes me wonder if we do need to parse in commented out deb822 style entries and treat them as disabled, which could be a pain. More on parsing in next reply.
@classmethod | ||
def _parse_deb822_lines( | ||
cls, | ||
lines: Iterable[str], | ||
filename: str = "", | ||
) -> Tuple[List[DebianRepository], List[InvalidSourceError]]: | ||
"""Parse lines from a deb822 file into a list of repos and a list of errors.""" | ||
repositories: List[DebianRepository] = [] | ||
errors: List[InvalidSourceError] = [] | ||
for paragraph in cls._iter_deb822_paragraphs(lines): | ||
try: | ||
repos = cls._parse_deb822_paragraph(paragraph, filename=filename) | ||
except InvalidSourceError as e: | ||
errors.append(e) | ||
else: | ||
repositories.extend(repos) | ||
return repositories, errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know regexes invoke a lot of opinions among software engineers, but I found this regex to be pretty useful for parsing deb822 stanzas. You can see it in action on regex101 - https://regex101.com/r/UriO7l/1:
_deb822_matcher = re.compile(
r"""
(?:Enabled:\s*)?(?P<enabled>.+)?\s?
Types:\s*(?P<repo_types>.{3,})\s
URIs:\s*(?P<uris>.+)\s
Suites:\s*(?P<suites>.+)\s
(?:Components:\s*)?(?P<components>.+)?\s?
(?:Signed-By:\s*)?(?P<gpg_key>.+)?\s?
(?P<options>((.*:\s*)(.+)\s?)*)?
""",
re.VERBOSE,
)
Then you can glom all the stanzas in a .sources
file like so:
for stanza in _deb822_matcher.finditer(content):
groups = stanza.groupdict()
enabled = groups.pop("enabled")
repo_types = groups.pop("repo_types").split()
uris = groups.pop("uris").split()
suites = groups.pop("suites").split()
components = groups.pop("components").split()
gpg_key = groups.pop("gpg_key")
raw_options = groups.pop("options")
if enabled not in ["yes", "no", None]:
raise InvalidSourceError("...")
else:
enabled = True if enabled == "yes" or enabled is None else False
if len(suites) == 1 and suites[0].endswith("/") and components is not None:
raise InvalidSourceError("...")
elif components is None:
raise InvalidSourceError("...")
options = {}
for option in raw_options.splitlines():
k, v = option.split(":", maxsplit=1)
options[k] = v.strip()
You do lose some of the granularity you're getting such as the line numbers that you're parsing, but there is the added benefit of not needing to maintain a custom file parser. Trade one form of complexity for another. The one benefit of the deb822 format is that it's easier for both humans to understand and for machines to manipulate 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, regexes ... that's how the one-line-style entries are parsed. I certainly wouldn't feel super happy writing a regex like that one from scratch, but maybe it's better to put the complexity in a standard (if controversial) format than using a custom parser.
That said, there are a couple of cases that this PR's implementation handles differently.
- regex enforces a specific order of lines in an entry, which afaict deb822 doesn't require
- deb822 allows an inline gpg key to be provided instead of a file path, which this regex doesn't capture
- how to handle comments? This PR just throws them away, but the regex seems to match on them
- the deb822 format allows you to specify arbitrary extra key-value pair options in an entry, which apt just ignores (so we could too?), but other tools might care about. This PR captures them wherever they are and, but it looks like the regex wants them to be only at the end (and they mess things up if they appear elsewhere)
I don't bring these up to nitpick the regex, but because I'm not sure I'd feel super comfortable adding those features to it, and the more features you add, the more the regex scares me haha
There's also the need to consider what to do with fully commented out entries ... in my implementation I just went with stripping out all comments entirely. I think allowing a commented out entry(paragraph) to be read in as a disabled entry opens up a lot of complexity even with a custom parser.
I've added some extra cases for the regex here if you want to take a look https://regex101.com/r/7fPKrM/1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note in case you read the previous comment in emails, I've edited it a bit
for repo in repos: | ||
repo_identifier = "{}-{}-{}".format(repo.repotype, repo.uri, repo.release) | ||
self._repository_map[repo_identifier] = repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there should be a new class - e.g. something like Debian822Repository
- or a new attribute on DebianRepository
- e.g. DebianRepository(...).source_format
- for sources that are written in deb822 style format 🤔
Reason saying is that I think the current DebianRepository
class is well suited for one-line style sources, but not deb822 style. If a particular DebianRepository
generated from the deb822 source, how does that get mapped back correctly into the *.sources
file? Even though we're able to convert the deb822 style source to multiple one-lines, the RepositoryMapping
cache loses how the source is actually represented in the *.sources
file it was read from. Perhaps if we had some tag or differentiation between the two formats on DebianRepository
, we can signal that the source should be treated different when dumping a source into a file under /etc/apt/sources.list.d/
or modifying an existing source in a *.source
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually a good case for inheritance for maximum backwards compatibility. A Debian822Repository
is a DebianRepository
, just with different implementation details of some methods.
EDIT: though maybe backwards compatibility in this way isn't possible since a deb822 source can have multiple repotypes, uris, and releases, which a DebianRepository
is expected to just have one of each.
This is perhaps a good point to think about how to handle a Signed-by
key that was provided inline. It's not clear to me how best to handle the existing gpg_key
property in this case, which is expected to be a string filename. There doesn't seem to be a property for getting the key file contents directly -- the other key related methods are all about adding a key to a repository.
Here are some options for how we could handle the inline key case.
- return an empty string (default value), and document that users should check
gpg_key_inline
in this case- old code won't instantly break
- old code will silently assume that inline key entries aren't signed
- maybe ok if they raise an error in this case, since they can then be fixed (but bad to break them)
- bad if instead you get a silent behaviour change (seems more likely to me, but idk)
- new code will have to do
isinstance
orhasattr
or catchAttributeError
when checking forgpg_key_inline
- return a new sentinel value to indicate that you should check
gpg_key_inline
- instantly breaks old code expecting a string
- doesn't hide any errors with treating the entry as unsigned
- new code has to switch based on the return value (but probably is already to handle the empty string case indicating no key)
- just return the inline key instead of a filename
- instantly breaks old code expecting a filename
- doesn't hide any errors with treating the entry as unsigned
- annoying for all users to check if it's a filename or key
- try to do something clever by writing the inline key to a temporary file (in memory?) and returning the path to that
- old code works perfectly
- added complexity? what could go wrong here?
TL;DR: a deb822 source can specify multiple one-line-style repositories. This PR addressed this by parsing into multiple EDIT: if calling
EDIT: I just noticed that calling Apologies for the giant wall of text below. Currently, a A deb822 source entry can specify multiple of each of those components of the identifier. This PR iterates over them to create multiple @NucciTheBoss pointed out that this is problematic when it comes to With the one-line-style source definition, disabling a source is as simple as commenting the line. An entire deb822 source entry can be enabled or disabled by setting the value of an In response to @NucciTheBoss's comments, I was leaning towards parsing a source into a single I think this makes parsing into multiple I'd thought that ideally users of this library wouldn't even need to know that multiple source formats exist (one-line-style and deb822), but the semantics of enable/disable make this tricky. One option would be to make a Here's my pitch on how to try for backwards compatibility. I think the best option is actually backwards incompatible, and the options that do maintain backwards compatibility of the python api need to mess with the underlying files too much to be reasonable. I'm open to better ideas.
|
Thank you for your in-depth replies to my comments @james-garner-canonical! I appreciate it! We already chatted a bit over Matrix, but I also wanted to reply to your replies more in-depth as well 😆 Re. your comments on the regex, I agree that it isn't perfect - there's tradeoffs to both approaches - I do personally find with regexes it's easier to figure out what's going on by slapping it into a regex tool with some provided test input to see how parsing text works compared to having to mentally walk through a custom parser. However, I do agree with you that they get ugly fast and can become more trouble than they're worth after a certain point. However, even though the deb822 style is rather flexible, I do think we should draw the line somewhere with how the charm library expects it to be formatted to make our lives easier. Ideally charms are headless systems, so folks shouldn't be logging in and manually modifying the Re. backwards compatibility with the I think trying to maintain an internal register/reference that tracks a bunch of one-line sources in order to recombine them into a deb822 I my opinion it's fair to say "hey if you want support for third-party repositories on Noble, you need to use v1 of this charm library because v0 doesn't support deb822 style sources" if it simplifies the implementation and reduces the maintenance burden of supporting newer Ubuntu releases that use the deb822 style source format rather than the legacy one-line style source format. |
add: add ability to write a deb822 format file disable: raise NotImplementedError for deb822 format files gpg_key: use existing import_key functionality to provide keys specified in the stanza itself as a file for compatibility Also refactor Deb822 functionality to a separate class. Also move deb822 unit tests to the more appropriate test_repo.py
"""Remove this repository from consideration. | ||
"""Remove this repository by disabling it in the source file. | ||
|
||
WARNING: This method does NOT alter the `self.enabled` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it? I didn't find any uses of disable
in the charms I know about, but if we intend to remove this method in future, maybe we don't want to extend its effects now.
WARNING: the default_filename keyword argument is provided for backwards compatibility | ||
only. It is not used, and was not used in the previous revision of this library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just remove it? This method is called in one charm, and the default_filename argument isn't used there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say just leave it in to avoid unnecessary changes / type checker issues.
WARNING: in the one-per-line format case, will mutate repo.options to add the 'signed-by' | ||
key if both options and gpg_file are truthy (for example if a gpg_key_filename and | ||
non-empty options were provided at DebianRepository initialisation time). If options were | ||
not provided or are empty, but a gpg_key is available, will silently fail to include it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug. Potentially easy to fix. Doesn't affect the one user of this method, as the repository they pass doesn't use either gpg_key
or options
.
WARNING: if repo.filename is falsey, the new filename is calculated only from the repo's | ||
uri and release, but repos are assumed to be uniquely identified by the combination of | ||
repotype, uri, and release. Adding two repos with differing repotypes but identical uris | ||
and releases will result in the second repo clobbering the file written by the first. | ||
In this case, repo.filename must be set appropriately to avoid data loss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be improved by including the repotype in the filename. Any concerns with compatibility when making this change? The lack of users points towards it being safe enough.
WARNING: if repo.filename is truthy, and that file exists, this method will clobber that | ||
file with a single entry for the repo being added. Set it to a falsey value to have a new | ||
filename derived from its uri and release (which will also clobber any existing file), or | ||
set the filename that you want to write to -- or construct a new DebianRepository object | ||
with the filename you want to write to. | ||
For example: repo.filename = my_filename | ||
For example: DebianRepository(uri=repo.uri, filename=my_filename, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this behaviour. It seems especially bad that any DebianRepository
repo
created via a RepositoryMapping
mapping
will clobber the entire file it was loaded from with a single entry if you call mapping.add(repo)
.
WARNING: if repo.enabled is falsey, the repository will be added in the disabled state. | ||
Note that repo.disable() does not affect this value. If repo.enabled does not match the | ||
value you expect, construct a new DebianRepository object with the appropriate value. | ||
For example: DebianRepository(uri=repo.uri, enabled=my_value, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably ok, but a little weird, and potentially annoying since there's no api to enable a repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut feeling is that this PR is too big for what it does. Part of that is adding data files for unit tests (sources files, in an apt directory structure), and integration tests (key files). Part of that is adding unit and integration tests -- perhaps some redundancy can be eliminated here? But a lot of it is the changes to apt.py ... worth cutting this down if possible?
e.stdout.decode(), | ||
e.stderr.decode(), | ||
) | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding some useful logging
@james-garner-canonical @benhoyt can confirm that these changes work on our end for enabling Noble support in our HPC charms. Thanks for working on this 🤩 |
That's great to hear, @NucciTheBoss! Merged now, thanks for your help with this |
Ubuntu 24.04 uses deb288 format sources, which python-libjuju does not currently support (as pointed out by @yanksyoon and @NucciTheBoss in #135). This PR adds support for reading
*sources
files.RepositoryMapping
now automatically findes them in/etc/apt/sources.list.d
when initialised. Additionally, to avoid anInvalidSourceError
if/etc/apt/sources.list
exists but only includes a comment about the new sources location (as it does on noble), we ignore this error if/etc/apt/sources.list.d/ubuntu.sources
exists.This PR also cleans up the
add
method by callingadd-apt-repository
internally instead of writing directly to a file. Note that this means that adding a disabled repository no longer has any effect (previously it would write a a file containing a single, commented-out one-line repository definition).Original PR content below.
Ubuntu 24.04 uses deb288 format sources, which python-libjuju does not currently support (as pointed out by @yanksyoon and @NucciTheBoss in #135). This PR adds support for reading
*sources
files from/etc/apt/sources.list.d
.While there are existing deb288 parsing implementations, such as python-apt and python-debian, this charm-lib currently has no external dependencies, nor should it have any, so I've taken a stab at implementing deb288 parsing in the
RepositoryMapping
class.In
__init__
, if files matching/etc/apt/sources.list.d/*sources
exist, they are loaded using the newload_deb822
method. Loading and parsing is broken down into collecting the lines that make up a paragraph/entry (_iter_deb822_paragraphs
), parsing the lines into key/value pairs (_get_deb822_options
), and validating these entries to constructDebianRepository
s (_parse_deb822_paragraph
).Unit tests cover this logic with text from a good deb822 format file, and a malformed version of this file. However, there are no doubt a number of edge cases that haven't been covered here.
I haven't made any changes to integration tests, but with the functionality added they currently load a
.sources
file intests/integration/test_apt.py
'stest_install_package_external_repository
andtest_list_file_generation_external_repository
. However it might be good to explicitly test this in integration tests too.I'd welcome suggestions for test cases, as well as pointing out any deb822 features that are missing or not implemented correctly.