Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: load deb822 sources; use add-apt-repository to add #137

Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
7472c51
feat: support deb822 style source specification
james-garner-canonical Nov 20, 2024
66f42cd
refactor: separate file reading and object updating from parsing lines
james-garner-canonical Nov 20, 2024
20b0f9e
refactor: preserve line nos when parsing deb822 lines into paragraphs
james-garner-canonical Nov 20, 2024
0e6c8b3
refactor: move turning numbered lines into options dict to helper
james-garner-canonical Nov 20, 2024
8dbd158
feat: support inline comments in deb822 source files
james-garner-canonical Nov 21, 2024
9d9c9fa
tests: add some unit tests for _iter_deb822_paragraphs
james-garner-canonical Nov 21, 2024
5a7c22e
fix: raise an InvalidSourceError on missing required deb822 keys
james-garner-canonical Nov 21, 2024
bf26ce0
tests: add some unit tests for _get_deb822_options
james-garner-canonical Nov 21, 2024
d43074b
fix: correctly handle newline terminated lines in paragraph iteration
james-garner-canonical Nov 21, 2024
484b106
tests: add some unit tests for _iter_deb822_paragraphs
james-garner-canonical Nov 21, 2024
86e78b5
tests: add some tests for _parse_deb822_lines
james-garner-canonical Nov 21, 2024
2dd9dcd
feat: add the number of errors to the debug output in load_deb822
james-garner-canonical Nov 21, 2024
6c39d86
tests: add some unit tests for load_deb822
james-garner-canonical Nov 21, 2024
246bd56
tests: add a unit test for initialiasing RepositoryMapping with deb822
james-garner-canonical Nov 21, 2024
8717652
style: tox -e fmt tests/unit/test_apt.py
james-garner-canonical Nov 21, 2024
9dfadc1
fix: correct docstring
james-garner-canonical Nov 21, 2024
c44a4a5
fix: python3.8 context managers and clean up linter directives
james-garner-canonical Nov 21, 2024
2df185e
feat: make DebianRepository aware of deb822 format for some operations
james-garner-canonical Nov 27, 2024
c963280
fix: remove extraneous information from docstring
james-garner-canonical Nov 27, 2024
7fa6295
feat: use apt-add-repository for RepositoryMapping.add
james-garner-canonical Nov 28, 2024
86ebdd1
refactor: logic and testing cleanup
james-garner-canonical Dec 1, 2024
c364111
style: type annotate integration test helper function
james-garner-canonical Dec 2, 2024
71c5f90
feat: support apt autoremove in remove_package
james-garner-canonical Dec 2, 2024
afc85b6
feat: debug useful information on error in apt.update
james-garner-canonical Dec 2, 2024
3b7872a
feat: log when writing to a sources file in from_repo_line
james-garner-canonical Dec 3, 2024
e24001a
feat: make the apt directories (private) attributes (for testing)
james-garner-canonical Dec 3, 2024
497bf0e
style: correct RepositoryMapping.__iter__ annotation and add FIXME
james-garner-canonical Dec 3, 2024
7c70fa8
feat: log the filename as well as the number of repos when parsing
james-garner-canonical Dec 3, 2024
10df1e8
feat: refactor RepositoryMapping.add to possible support remove as well
james-garner-canonical Dec 3, 2024
2b726c4
feat: allow _Deb822Stanza from an empty set of lines
james-garner-canonical Dec 3, 2024
9d3230a
tests: add unit tests that use files on disk
james-garner-canonical Dec 3, 2024
620cb15
tests: update integration tests
james-garner-canonical Dec 3, 2024
eafe869
feat: use sourceslist (one-per-line) format for add-apt-repository
james-garner-canonical Dec 3, 2024
2945765
feat: call _add_apt_repository in from_repo_line (don't reimplement l…
james-garner-canonical Dec 3, 2024
1232175
tests: re-enable package cleanup in integration tests
james-garner-canonical Dec 3, 2024
569c39e
tests: add test case from hardware-observer charm and cleanup keys
james-garner-canonical Dec 3, 2024
b081253
tests: refactor deb822 unit tests
james-garner-canonical Dec 3, 2024
da057f5
tests: iterate on deb822 tests
james-garner-canonical Dec 3, 2024
08aa90d
tests: fully refactor and expand deb822 unit tests
james-garner-canonical Dec 3, 2024
8008a97
style: make linting happy
james-garner-canonical Dec 4, 2024
58c61be
test: fix an erroneous assert in integration tests
james-garner-canonical Dec 4, 2024
89014aa
refactor: clean up diff, signatures, etc
james-garner-canonical Dec 4, 2024
d0674e9
refactor: move keys to files
james-garner-canonical Dec 4, 2024
3745426
fix: don't warn about key file on remove
james-garner-canonical Dec 4, 2024
865aad8
style: use tuple for endswith
james-garner-canonical Dec 4, 2024
b993c54
style: paragraph -> stanza, repositories -> repos
james-garner-canonical Dec 4, 2024
9ce3e48
test: switch from inkscape to terminator ppa for ci
james-garner-canonical Dec 4, 2024
2b7cb4e
test: switch from terminator to fish ppa for ci
james-garner-canonical Dec 5, 2024
709e2c7
chore: remove ValueError from RepositoryMapping.add
james-garner-canonical Dec 5, 2024
2820497
test: remove old tests from test_repo.py
james-garner-canonical Dec 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 207 additions & 6 deletions lib/charms/operator_libs_linux/v0/apt.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,14 +102,15 @@

import fileinput
import glob
import itertools
import logging
import os
import re
import subprocess
from collections.abc import Mapping
from enum import Enum
from subprocess import PIPE, CalledProcessError, check_output
from typing import Iterable, List, Optional, Tuple, Union
from typing import Dict, Iterable, Iterator, List, Optional, Tuple, Union
from urllib.parse import urlparse

logger = logging.getLogger(__name__)
Expand All @@ -122,7 +123,7 @@

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 14
LIBPATCH = 15


VALID_SOURCE_TYPES = ("deb", "deb-src")
Expand Down Expand Up @@ -1198,7 +1199,7 @@ class RepositoryMapping(Mapping):
"""

def __init__(self):
self._repository_map = {}
self._repository_map: Dict[str, DebianRepository] = {}
# Repositories that we're adding -- used to implement mode param
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the block of code below also needs to be updated to account for the case where /etc/apt/sources.list is just comments telling you "hey use ubuntu.sources instead", otherwise you will still see the InvalidSourcesError reported in #135 as self.load fails to find any valid sources in /etc/apt/sources.list:

        # Repositories that we're adding -- used to implement mode param
        self.default_file = "/etc/apt/sources.list"

        # read sources.list if it exists
        if os.path.isfile(self.default_file):
            self.load(self.default_file)

The easiest thing I think we can do here to handle this case, where /etc/apt/sources.list only contains comments, is to first check for the existence of /etc/apt/sources.list.d/ubuntu.sources and then fall back to /etc/apt/sources.list if */ubuntu.sources doesn't exist:

default_list = "/etc/apt/sources.list"
default_sources = "/etc/apt/sources.list.d/ubuntu.sources"
if os.path.isfile(default_sources):
    self.default_file = default_sources
else:
    logger.debug("%s not found. defaulting to %s", default_sources, default_list)
    self.default_file = default_list
    self.load(self.default_file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point, will add a fix like the one you've suggested

self.default_file = "/etc/apt/sources.list"

Expand All @@ -1210,6 +1211,9 @@ def __init__(self):
for file in glob.iglob("/etc/apt/sources.list.d/*.list"):
self.load(file)

for file in glob.iglob("/etc/apt/sources.list.d/*.sources"):
self.load_deb822(file)

def __contains__(self, key: str) -> bool:
"""Magic method for checking presence of repo in mapping."""
return key in self._repository_map
Expand All @@ -1231,13 +1235,13 @@ def __setitem__(self, repository_uri: str, repository: DebianRepository) -> None
self._repository_map[repository_uri] = repository

def load(self, filename: str):
"""Load a repository source file into the cache.
"""Load a one-line-style format repository source file into the cache.

Args:
filename: the path to the repository file
"""
parsed = []
skipped = []
parsed: List[int] = []
skipped: List[int] = []
with open(filename, "r") as f:
for n, line in enumerate(f):
try:
Expand Down Expand Up @@ -1314,6 +1318,203 @@ def _parse(line: str, filename: str) -> DebianRepository:
else:
raise InvalidSourceError("An invalid sources line was found in %s!", filename)

def load_deb822(self, filename: str) -> None:
"""Load a deb822 format repository source file into the cache.

Args:
filename: the path to the repository file

In contrast to one-line-style, the deb822 format specifies a repository
using a multi-line paragraph. Paragraphs are separated by whitespace,
and each definition consists of lines that are either key: value pairs,
or continuations of the previous value.

Read more about the deb822 format here:
https://manpages.ubuntu.com/manpages/noble/en/man5/sources.list.5.html
For instance, ubuntu 24.04 (noble) lists its sources using deb822 style in:
/etc/apt/sources.list.d/ubuntu.sources

The semantics of `load_deb822` slightly different to `load`:
`load` calls `_parse`, which reads a commented out line as an entry that is not enabled
`load_deb822` strips out comments entirely when parsing a file into paragraphs, and
assumes that comments have been removed when parsing individual paragraphs/entry,
instead only reading the 'Enabled' key to determine if an entry is enabled
"""
with open(filename, "r") as f:
repos, errors = self._parse_deb822_lines(f, filename=filename)

for repo in repos:
repo_identifier = "{}-{}-{}".format(repo.repotype, repo.uri, repo.release)
self._repository_map[repo_identifier] = repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should be a new class - e.g. something like Debian822Repository - or a new attribute on DebianRepository - e.g. DebianRepository(...).source_format - for sources that are written in deb822 style format 🤔

Reason saying is that I think the current DebianRepository class is well suited for one-line style sources, but not deb822 style. If a particular DebianRepository generated from the deb822 source, how does that get mapped back correctly into the *.sources file? Even though we're able to convert the deb822 style source to multiple one-lines, the RepositoryMapping cache loses how the source is actually represented in the *.sources file it was read from. Perhaps if we had some tag or differentiation between the two formats on DebianRepository, we can signal that the source should be treated different when dumping a source into a file under /etc/apt/sources.list.d/ or modifying an existing source in a *.source file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a good case for inheritance for maximum backwards compatibility. A Debian822Repository is a DebianRepository, just with different implementation details of some methods.

EDIT: though maybe backwards compatibility in this way isn't possible since a deb822 source can have multiple repotypes, uris, and releases, which a DebianRepository is expected to just have one of each.


This is perhaps a good point to think about how to handle a Signed-by key that was provided inline. It's not clear to me how best to handle the existing gpg_key property in this case, which is expected to be a string filename. There doesn't seem to be a property for getting the key file contents directly -- the other key related methods are all about adding a key to a repository.

Here are some options for how we could handle the inline key case.

  1. return an empty string (default value), and document that users should check gpg_key_inline in this case
    • old code won't instantly break
    • old code will silently assume that inline key entries aren't signed
      • maybe ok if they raise an error in this case, since they can then be fixed (but bad to break them)
      • bad if instead you get a silent behaviour change (seems more likely to me, but idk)
    • new code will have to do isinstance or hasattr or catch AttributeError when checking for gpg_key_inline
  2. return a new sentinel value to indicate that you should check gpg_key_inline
    • instantly breaks old code expecting a string
    • doesn't hide any errors with treating the entry as unsigned
    • new code has to switch based on the return value (but probably is already to handle the empty string case indicating no key)
  3. just return the inline key instead of a filename
    • instantly breaks old code expecting a filename
    • doesn't hide any errors with treating the entry as unsigned
    • annoying for all users to check if it's a filename or key
  4. try to do something clever by writing the inline key to a temporary file (in memory?) and returning the path to that
    • old code works perfectly
    • added complexity? what could go wrong here?


if errors:
logger.debug(
"the following %d error(s) were encountered when reading deb822 format sources:\n%s",
len(errors),
"\n".join(str(e) for e in errors),
)

if repos:
logger.info("parsed %d apt package repositories", len(repos))
else:
raise InvalidSourceError("all repository lines in '{}' were invalid!".format(filename))

@classmethod
def _parse_deb822_lines(
cls,
lines: Iterable[str],
filename: str = "",
) -> Tuple[List[DebianRepository], List[InvalidSourceError]]:
"""Parse lines from a deb822 file into a list of repos and a list of errors."""
repositories: List[DebianRepository] = []
errors: List[InvalidSourceError] = []
for paragraph in cls._iter_deb822_paragraphs(lines):
try:
repos = cls._parse_deb822_paragraph(paragraph, filename=filename)
except InvalidSourceError as e:
errors.append(e)
else:
repositories.extend(repos)
return repositories, errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know regexes invoke a lot of opinions among software engineers, but I found this regex to be pretty useful for parsing deb822 stanzas. You can see it in action on regex101 - https://regex101.com/r/UriO7l/1:

_deb822_matcher = re.compile(
    r"""
    (?:Enabled:\s*)?(?P<enabled>.+)?\s?
    Types:\s*(?P<repo_types>.{3,})\s
    URIs:\s*(?P<uris>.+)\s
    Suites:\s*(?P<suites>.+)\s
    (?:Components:\s*)?(?P<components>.+)?\s?
    (?:Signed-By:\s*)?(?P<gpg_key>.+)?\s?
    (?P<options>((.*:\s*)(.+)\s?)*)?
    """,
    re.VERBOSE,
)

Then you can glom all the stanzas in a .sources file like so:

for stanza in _deb822_matcher.finditer(content):
    groups = stanza.groupdict()
    enabled = groups.pop("enabled")
    repo_types = groups.pop("repo_types").split()
    uris = groups.pop("uris").split()
    suites = groups.pop("suites").split()
    components = groups.pop("components").split()
    gpg_key = groups.pop("gpg_key")
    raw_options = groups.pop("options")

    if enabled not in ["yes", "no", None]:
        raise InvalidSourceError("...")
    else:
        enabled = True if enabled == "yes" or enabled is None else False

    if len(suites) == 1 and suites[0].endswith("/") and components is not None:
        raise InvalidSourceError("...")
    elif components is None:
        raise InvalidSourceError("...")

    options = {}
    for option in raw_options.splitlines():
        k, v = option.split(":", maxsplit=1)
        options[k] = v.strip()

You do lose some of the granularity you're getting such as the line numbers that you're parsing, but there is the added benefit of not needing to maintain a custom file parser. Trade one form of complexity for another. The one benefit of the deb822 format is that it's easier for both humans to understand and for machines to manipulate 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, regexes ... that's how the one-line-style entries are parsed. I certainly wouldn't feel super happy writing a regex like that one from scratch, but maybe it's better to put the complexity in a standard (if controversial) format than using a custom parser.

That said, there are a couple of cases that this PR's implementation handles differently.

  1. regex enforces a specific order of lines in an entry, which afaict deb822 doesn't require
  2. deb822 allows an inline gpg key to be provided instead of a file path, which this regex doesn't capture
  3. how to handle comments? This PR just throws them away, but the regex seems to match on them
  4. the deb822 format allows you to specify arbitrary extra key-value pair options in an entry, which apt just ignores (so we could too?), but other tools might care about. This PR captures them wherever they are and, but it looks like the regex wants them to be only at the end (and they mess things up if they appear elsewhere)

I don't bring these up to nitpick the regex, but because I'm not sure I'd feel super comfortable adding those features to it, and the more features you add, the more the regex scares me haha

There's also the need to consider what to do with fully commented out entries ... in my implementation I just went with stripping out all comments entirely. I think allowing a commented out entry(paragraph) to be read in as a disabled entry opens up a lot of complexity even with a custom parser.

I've added some extra cases for the regex here if you want to take a look https://regex101.com/r/7fPKrM/1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note in case you read the previous comment in emails, I've edited it a bit


@staticmethod
def _iter_deb822_paragraphs(lines: Iterable[str]) -> Iterator[List[Tuple[int, str]]]:
"""Given lines from a deb822 format file, yield paragraphs.

A paragraph is a list of numbered lines that make up a source entry,
with comments stripped out (but accounted for in line numbering).
"""
current_paragraph: List[Tuple[int, str]] = []
for n, line in enumerate(lines): # 0 indexed line numbers, following `load`
if not line.strip(): # blank lines separate paragraphs
if current_paragraph:
yield current_paragraph
current_paragraph = []
continue
content, _delim, _comment = line.partition("#")
if content.strip(): # skip (potentially indented) comment line
current_paragraph.append((n, content.rstrip())) # preserve indent
if current_paragraph:
yield current_paragraph

@classmethod
def _parse_deb822_paragraph(
cls,
lines: List[Tuple[int, str]],
filename: str = "",
) -> List[DebianRepository]:
"""Parse a list of numbered lines forming a deb822 style repository definition.

Args:
lines: a list of numbered lines forming a deb822 paragraph
filename: the name of the file being read (for DebianRepository and errors)

Raises:
InvalidSourceError if the source type is unknown or contains malformed entries
"""
options, line_numbers = cls._get_deb822_options(lines)

enabled_field = options.pop("Enabled", "yes")
if enabled_field == "yes":
enabled = True
elif enabled_field == "no":
enabled = False
else:
raise InvalidSourceError(
(
"Bad value '{value}' for entry 'Enabled' (line {enabled_line})"
" in file {file}. If 'Enabled' is present it must be one of"
" yes or no (if absent it defaults to yes)."
).format(
value=enabled_field,
enabled_line=line_numbers["Enabled"],
file=filename,
)
)

gpg_key = options.pop("Signed-By", "")

try:
repotypes = options.pop("Types").split()
uris = options.pop("URIs").split()
suites = options.pop("Suites").split()
except KeyError as e:
[key] = e.args
raise InvalidSourceError(
"Missing key '{key}' for entry starting on line {line} in {file}.".format(
key=key,
line=min(line_numbers.values()),
file=filename,
)
)

components: List[str]
if len(suites) == 1 and suites[0].endswith("/"):
if "Components" in options:
raise InvalidSourceError(
(
"Since 'Suites' (line {suites_line}) specifies"
" a path relative to 'URIs' (line {uris_line}),"
" 'Components' (line {components_line}) must be ommitted"
" (in file {file})."
).format(
suites_line=line_numbers["Suites"],
uris_line=line_numbers["URIs"],
components_line=line_numbers["Components"],
file=filename,
)
)
components = []
else:
if "Components" not in options:
raise InvalidSourceError(
(
"Since 'Suites' (line {suites_line}) does not specify"
" a path relative to 'URIs' (line {uris_line}),"
" 'Components' must be present in this paragraph"
" (in file {file})."
).format(
suites_line=line_numbers["Suites"],
uris_line=line_numbers["URIs"],
file=filename,
)
)
components = options.pop("Components").split()

return [
DebianRepository(
enabled=enabled,
repotype=repotype,
uri=uri,
release=suite,
groups=components,
filename=filename,
gpg_key_filename=gpg_key, # TODO: gpg_key can be a literal key, not just a filename
options=options,
)
for repotype, uri, suite in itertools.product(repotypes, uris, suites)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should convert deb822 style sources into one-line style format? Reason saying is that the disable methods from the DebianRepository and RepositoryMapping classes assumes that the defined repositories in the sources files are one line format. When you call disable from either class, the method goes into the relevant sources file and comments out the repository line.

This won't work for deb822 style sources since you need to comment out all lines in the stanza for the source to be disabled, however, the sources.list manpage recommends that you just add Enabled: no to the stanza. disable will also be unable to map its generated one-line style representation of the repository to what's in the .sources file as the generated search expression won't map to the deb822 stanza.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point, I didn't look into how DebianRepository objects are used or the methods they support -- like disabling a source. The repository object we end up with will have to be aware of the format. My initial inclination is a Debian822Repository inheriting from DebianRepository or at least implementing the same 'protocol'.

EDIT: though maybe backwards compatibility in this way isn't possible since a deb822 source can have multiple repotypes, uris, and releases, which a DebianRepository is expected to just have one of each.

This also makes me wonder if we do need to parse in commented out deb822 style entries and treat them as disabled, which could be a pain. More on parsing in next reply.

]

@staticmethod
def _get_deb822_options(
lines: Iterable[Tuple[int, str]]
) -> Tuple[Dict[str, str], Dict[str, int]]:
parts: Dict[str, List[str]] = {}
line_numbers: Dict[str, int] = {}
current = None
for n, line in lines:
assert "#" not in line # comments should be stripped out
if line.startswith(" "): # continuation of previous key's value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are tabs acceptable too? If so, line.startswith((" ", "\t"))

assert current is not None
parts[current].append(line.rstrip()) # preserve indent
continue
raw_key, _, raw_value = line.partition(":")
current = raw_key.strip()
parts[current] = [raw_value.strip()]
line_numbers[current] = n
options = {k: "\n".join(v) for k, v in parts.items()}
return options, line_numbers

def add(self, repo: DebianRepository, default_filename: Optional[bool] = False) -> None:
"""Add a new repository to the system.

Expand Down
Loading
Loading