Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: require keyword arguments for most dataclasses #137

Merged
merged 5 commits into from
Jul 8, 2024

Conversation

tonyandrewmeyer
Copy link
Collaborator

@tonyandrewmeyer tonyandrewmeyer commented May 31, 2024

Makes the arguments to the various dataclasses require keyword args wherever there isn't an obvious positional order.

  • Action - name as first positional argument, and everything else as kw-only (maybe params as second would be ok?)
  • ActionOutput - all args are kw-only - it seems unlikely anyone would be instantiating this themselves
  • Address - all args are kw-only
  • Container - name as first positional argument, and everything else as kw-only
  • Network - all args are kw-only
  • Port - protocol and port are both kw-only
  • Secret - contents as first positional argument (previously id was first) and everything else as kw-only (another 7.0 PR removes the need to manually specify the ID)
  • StoredState - all args are kw-only
  • State - all args are kw-only (it seems like this was the only reasonable way to use it previously anyway)

This is an alternative to #117 - it's a bigger change, but it does seem like the better one, and since it breaks backwards compatibility significantly it'd be better do it now than later if we do want this change.

Compatibility with Python 3.8 is retained - basically, we're sacrificing some cleanliness of implementation in exchange for cleanliness of API. This will pay off, particularly in the longer term when we can move up to Python 3.10+ (likely 3.11, maybe in mid 2025), and all of the subclassing can be replaced with dataclass(kw_only=True) and dataclasses.KW_ARG.

@tonyandrewmeyer
Copy link
Collaborator Author

@benhoyt @PietroPasotti wdyt? I like this over #117, but could be swayed by an argument to just do the smaller #117 and leave this for now. @PietroPasotti with your much larger experience actually writing scenarios I assume you'd have opinions on whether I've picked the right args to be kw-only also.

If we go this way, wdyt about the minimum Python version being higher than ops?

@benhoyt
Copy link
Collaborator

benhoyt commented May 31, 2024

I do prefer this over #117 too. However, I'm not sure about the bump to require Python 3.10. It's fine while ops-scenario is still in a separate repo, but if we're planning to merge this into ops itself, while that's still needs 3.8, that seems like asking for pain.

@tonyandrewmeyer
Copy link
Collaborator Author

However, I'm not sure about the bump to require Python 3.10. It's fine while ops-scenario is still in a separate repo, but if we're planning to merge this into ops itself, while that's still needs 3.8, that seems like asking for pain.

Ah, good point, I hadn't thought about that. Even if it's a pip install ops[testing] that pulls Scenario from a different repo into the ops.testing namespace, that's probably not ideal.

If @PietroPasotti is also on board with this over #117, then I'll rework it to be Python 3.8 compatible.

Is the ops minimum version essentially constrained to be the minimum that's in an Ubuntu LTS that's still in standard support?

@benhoyt
Copy link
Collaborator

benhoyt commented May 31, 2024

Is the ops minimum version essentially constrained to be the minimum that's in an Ubuntu LTS that's still in standard support?

Yeah, or the minimum from the oldest LTS that Juju supports.

@PietroPasotti
Copy link
Collaborator

I like this one more too.

Copy link
Collaborator

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me code-wise.

Deferring to Pietro wrt. the API.

scenario/mocking.py Outdated Show resolved Hide resolved
tests/helpers.py Outdated Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Collaborator Author

@benhoyt @PietroPasotti I've reworked this to be Python 3.8 compatible, so it's ready for review when you have time.

Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the intent of this change, but now that I see the implementation, and how much (error-prone) boilerplate it is, I'm not sure about this! Does it gain us enough for the significant implementation awkwardness?

I think I'd be inclined to go back to the much simpler #117, and wait with this keyword-only thing till we are actually on Python 3.10.

But worth a voice discussion to try to come to consensus.

scenario/context.py Outdated Show resolved Hide resolved
scenario/context.py Outdated Show resolved Hide resolved
tests/test_consistency_checker.py Outdated Show resolved Hide resolved
scenario/context.py Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Collaborator Author

I like the intent of this change, but now that I see the implementation, and how much (error-prone) boilerplate it is, I'm not sure about this! Does it gain us enough for the significant implementation awkwardness?

I think I'd be inclined to go back to the much simpler #117, and wait with this keyword-only thing till we are actually on Python 3.10.

I lean towards implementation pain now in order to have the backwards incompatibility land now, rather than have to do a Scenario 8.0 (or ops 3.0, depending on how things end up working) when Python 3.10 comes along.

But worth a voice discussion to try to come to consensus.

Sounds good :)

@benhoyt
Copy link
Collaborator

benhoyt commented Jun 6, 2024

I lean towards implementation pain now in order to have the backwards incompatibility land now, rather than have to do a Scenario 8.0 (or ops 3.0, depending on how things end up working) when Python 3.10 comes along.

We had the voice discussion, and I was forgetting that this was a backwards-incompatible change (not just a usability improvement). So yeah, after discussion I agree that this is probably worth it.

It's annoying and error-prone, but simple enough to understand.

@tonyandrewmeyer's going to do one more thing before we commit to this implementation: look at the code for kw_only=True in the Python 3.10 dataclasses implementation to see if it's easy/easier to reuse that instead of doing it by hand. But if not, or if we're on the fence, let's just keep doing it by hand.

@tonyandrewmeyer
Copy link
Collaborator Author

The PR that introduced keyword-only arguments to dataclasses isn't huge (particularly if you ignore the tests and docs - there are a couple of small bug fixes later, but this PR seems to have most of the code).

I could either factor that out to have a dataclass sub-class that had the extra functionality, or just lift the 3.10 dataclasses functionality entirely, into some sort of _dataclasses_compat.py module that we could use until we have 3.10 as a minimum.

This would contain the changes to a single place, keeping scenario.state clean, but it does feel like a more substantial change, compared to a bunch of __init__ that, while noisy and boilerplate, are pretty trivial to understand.

@benhoyt
Copy link
Collaborator

benhoyt commented Jun 6, 2024

Yep, fair enough -- let's KISS and put up with the implementation messiness for now. (And be careful we've double-checked all the types and defaults.) I do like all your "Python 3.10+" reminder comments though, that'll help us easily search and clean it up later.

@tonyandrewmeyer tonyandrewmeyer marked this pull request as draft June 6, 2024 07:52
@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review June 6, 2024 20:48
@tonyandrewmeyer tonyandrewmeyer requested a review from benhoyt June 6, 2024 20:48
README.md Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Collaborator Author

@PietroPasotti this has been a pretty noisy PR, sorry (it probably would have been better in an issue and then a couple of draft PRs where the approaches were compared). But it's ready from my point of view and @benhoyt is happy with it. Would you mind doing a review when you have some time? Thanks!

Copy link
Collaborator

@PietroPasotti PietroPasotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the way it's implemented honestly, let's see whether it makes sense to vendor our own dataclass-like decorator or use pydantic instead.

scenario/context.py Outdated Show resolved Hide resolved
scenario/context.py Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
scenario/state.py Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Collaborator Author

tonyandrewmeyer commented Jun 14, 2024

Not a fan of the way it's implemented honestly, let's see whether it makes sense to vendor our own dataclass-like decorator or use pydantic instead.

I'm not a fan of the implementation either (and @benhoyt already pointed out he wasn't, above), although I am still very keen on getting the API change into 7.0.

I took another look at vendoring in the stdlib functionality. I think this would be ok if we can lift it mostly unchanged (I've had success doing that before elsewhere). I think if there's a lot of changes to make then it ends up being too brittle to maintain and the risk of introducing a bug too high.

Unfortunately, the stdlib implementation isn't all contained in the class - there are a bunch of private module-level functions that are involved. So vendoring by just subclassing, or really anything clean, doesn't seem viable.

The current main module has a lot of changes that don't work in 3.8 (f-strings that only work with the new parser, match-args, new typing stuff, etc).

The latest version in the 3.10 branch:

  • Uses types.GenericAlias
    • For InitVar.__repr__ which we could probably have broken if needed.
    • To test for ClassVar (which we are not currently using, and can still use explicitly, just not if ClassVar is aliased)
    • In is_dataclass() - if the object is a GenericAlias then it's not a dataclass. I think we could lose that, although I'm not 100% clear on the consequences.
    • Field.__class_getitem__ is set to classmethod(GenericAlias). I'm moderately sure that as long as we don't have any dataclass fields that are generic then we don't need this.
  • Calls abc.update_abstractmethods() on the dataclass. I'm moderately sure that as long as none of the dataclasses are ABCs, this is not required.

Other than that, it seems to work with some limited manual testing. To be more confident I'd want to really use it and at minimum make sure the Scenario tests pass, maybe actually copy the dataclass tests from the stdlib and check those too. Even with if does only require the 5 changes above, that still leaves me -0 on this approach.

I'm not necessarily opposed to pydantic, and I agree that it seems likely that some of the checking could be moved as a result. It feels like a more substantial change, though. I'm -0 on this.

@PietroPasotti also suggested just implementing a basic "these have to be keywords" checker ourselves (I think this also came up before somewhere, but can't find where - maybe it was the voice discussion). For example:

def _kwarg_only(*keywords):
    def decorator(cls):
        @wraps(cls)
        def call(*args, **kwargs):
            if diff := set(keywords).difference(set(kwargs)):
                raise TypeError(
                    f"{cls.__name__} requires {diff} as keyword argument(s)"
                )
            return cls(*args, **kwargs)

        return call

    return decorator


def _dc_base(kw_only_fields):
    deco_1 = dataclass(frozen=True)
    deco_2 = _kwarg_only(*kw_only_fields)
    return lambda cls: deco_2(deco_1(cls))

@_dc_base(kw_only_fields=('foo', 'bar'))
class SomeStateThing:
    ...

Or with __new__ rather than a decorator:

class _RequireKeywordArgs:
    def __new__(cls, *args, **kwargs):
        if diff := set(cls._kwargs).difference(set(kwargs)):
            raise TypeError(
                f"{cls.__name__} requires {', '.join(diff)} as keyword argument(s)"
            )
        return super().__new__(cls)


@dataclass(frozen=True)
class SomeStateThing(_RequireKeywordArgs):
    _kwargs = frozenset({'foo', 'bar'})
    ...

It would be more work to migrate this to the native support (than vendoring; less than the current approach), but it is vastly simpler, and does achieve the actual API goal I'm going for. I'm +1 on this approach.

@benhoyt
Copy link
Collaborator

benhoyt commented Jun 14, 2024

Thanks, Tony. Appreciate the push-back, Pietro. Yeah, we hate the current implementation too. I don't think we should move to Pydantic because of this; if we do that, we should consider it separately. And back-porting the stdlib's dataclasses module seems complicated and error-prone.

I think the approaches Tony suggested do seem cleaner; of the two, I think the _RequireKeywordArgs parent class seems simplest. However, I would want to make sure IDEs can still autocomplete the state class fields (and typecheckers work) with the overridden __new__.

@dimaqq
Copy link
Collaborator

dimaqq commented Jun 14, 2024

After thinking on this for a bit, how about this approach:

For the interim, where we've got to support Py 3.8 and 3.9, let's add an explicit custom __new__ to each leaf dataclass.

if Py 3.10 code would ultimately look like this:

@dataclasses.dataclass
class Example:
    a: int
    b: int
    _: dataclasses.KW_ONLY
    c: int
    d: int = 4

Then interim code would look like this:

@dataclasses.dataclass
class Example:
    a: int
    b: int
    c: int
    d: int = 4
    
    def __new__(cls, a: int, b: int, *, c: int, d: int = 4) -> Self:
        i = super().__new__(cls)
        i.__init__(a, b, c=c, d=d)
        return i

In other words, take the hit for supporting older python, write out things by hand, don't create magical base classes or metaclasses.

WDYT?

@tonyandrewmeyer
Copy link
Collaborator Author

The example above doesn't do the right job, because it requires that all of the specified arguments are present (and passed as keywords). We want them to be ok if they are missing and there's a default value as well.

However, this does work:

class _MaxPositionalArgs:
    """Raises TypeError when instantiating objects if arguments are not passed as keywords.
    
    Looks for a `_max_positional_args` class attribute, which should be an int
    indicating the maximum number of positional arguments that can be passed to
    `__init__` (excluding `self`). If not present, no limit is applied.
    """
    def __new__(cls, *args, **_):
        if len(args) > getattr(cls, "_max_positional_args", float("inf")):
            raise TypeError(
                f"{cls.__name__}.__init__() takes {cls._max_positional_args + 1} "
                f"positional arguments but {len(args) + 1} were given"
            )
        return super().__new__(cls)


@dataclasses.dataclass(frozen=True)
class Container(_MaxPositionalArgs):
    _max_positional_args = 1
    name: str

    can_connect: bool = False
    ...

The question about IDEs and type checkers is a good one, thanks! The decorator approach does seem to break both the IDE (language server) info and type checking.

With __new__, the hinting in the IDE isn't broken, although it's not quite as correct as with the explicit (super ugly) implementation.

Correct:

image

__new__:

image

However, I can still use auto-complete.

As expected, pyright can't detect as many issues. For example, with this code:

c = scenario.Container('foo')
c2 = scenario.Container('bar', can_connect=True)
c3 = scenario.Container(name='baz')
c4 = scenario.Container(name='qux', can_connect=True)
c5 = scenario.Container('foo', True)  # bad
c6 = scenario.Container()  # bad
c7 = scenario.Container('foo', rubbish=False)  # bad

The ugly but correct implementation gives:

$ pyright charm_test.py 
/tmp/kwarg-scenario-poc/charm_test.py
  /tmp/kwarg-scenario-poc/charm_test.py:7:32 - error: Expected 1 positional argument (reportCallIssue)
  /tmp/kwarg-scenario-poc/charm_test.py:8:6 - error: Argument missing for parameter "name" (reportCallIssue)
  /tmp/kwarg-scenario-poc/charm_test.py:9:32 - error: No parameter named "rubbish" (reportCallIssue)
3 errors, 0 warnings, 0 informations 

And with __new__ I get:

$ pyright charm_test.py 
/tmp/kwarg-scenario-poc/charm_test.py
  /tmp/kwarg-scenario-poc/charm_test.py:8:6 - error: Argument missing for parameter "name" (reportCallIssue)
  /tmp/kwarg-scenario-poc/charm_test.py:9:32 - error: No parameter named "rubbish" (reportCallIssue)
2 errors, 0 warnings, 0 informations 

So it's working in the sense that there aren't false positives, and it can still detect problems outside of the keyword requirement.

I think the much simpler implementation, without all the boilerplate and object.__setattr__ noise, outweighs the small advantage of the IDE and type checker also checking the keyword requirement. It's still there and any tests that are not using keywords will fail at runtime, so the API change is there, which is (imo) most important.

I played around a bit with having Python 3.10+ use the native stdlib functionality and 3.8 & 3.9 use this, but that got messy pretty quickly (particularly with the classes that aren't entirely keyword only), and when it gets too messy pyright and the IDE start losing the ability to follow along.

So: @benhoyt and @PietroPasotti I say we go with the simple __new__ approach. Fairly clean, gives the desired API change, IDE & checkers work just without picking up the requirement. What say you?

@dimaqq
Copy link
Collaborator

dimaqq commented Jun 14, 2024

I'm not sure whether Tony's comment about "the example above" applies to my __new__ proposal or something else... I've updated my __new__ crud to show how a default value would be handled.

@tonyandrewmeyer
Copy link
Collaborator Author

tonyandrewmeyer commented Jun 14, 2024

For the interim, where we've got to support Py 3.8 and 3.9, let's add an explicit custom __new__ to each leaf dataclass.

This is still a reasonable amount of boilerplate. It avoids the object.__setattr__ but we still have to write out a constructor for each class.

    def __new__(cls, a: int, b: int, *, c: int, d: int) -> Self:
        i = super().__new__(cls)
        i.__init__(a, b, c=c, d=d)
        return i

I'm pretty sure this calls __init__ twice, because __new__ doesn't normally call __init__, so after this __new__ calls __init__ explicitly, it will be called again. Probably that's just wasteful and not dangerous with a dataclass default __init__ I guess?

It actually gets simpler if that's fixed. For example, this would be Container (with the dicts typed more exactly):

    def __new__(cls, name, *, can_connect: bool=False, _base_plan: dict=None, layers: dict=None, service_status: dict=None, mounts: Dict=None, exec_mock: dict=None):
        return super().__new__(cls)

In my IDE (VS Code, with the default language server) and with pyright it has the same behaviour as the other __new__, ie. nothing breaks, but they also can't tell that there's a requirement (even though it's much more explicit in this case).

It's a reasonable suggestion, thanks! If it had made either the IDE or pyright detect the requirement to use a keyword arg, then I would lean towards it. However, since it's giving the same behaviour, but we still have to have all of the arguments written out, I'd still prefer to just centralise it in a common base class. It's not that magical, and it's not a metaclass (I did play with those to see if I could get this nicer, but had no luck so gave up).

I'm not sure whether Tony's comment about "the example above" applies to my __new__ proposal or something else...

The comment above was about my original __new__ proposal (properly implemented). I was writing the (long) comment above when yours appeared, so didn't see it before posting.

@benhoyt
Copy link
Collaborator

benhoyt commented Jun 14, 2024

IDE and type checking behaviour is not perfect but seems okay. I'm on board with the __new__ method; let's see what @PietroPasotti thinks.

@dimaqq
Copy link
Collaborator

dimaqq commented Jun 14, 2024

TIL

If __new__() is invoked during object construction and it returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.

If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.

@PietroPasotti
Copy link
Collaborator

Yes, the __new__ approach sounds better to me.
Thanks for indulging me in my dislike of object.__setattr__ :)

scenario/state.py Outdated Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Collaborator Author

@PietroPasotti I've cleaned up all the commit history (and it's clean against current 7.0 HEAD) but it's otherwise unchanged. Do you have time for a final (:crossed_fingers:) review before you head off? If not I could get someone in Charm-Tech to do it if you're still ok with the approach.

@dataclasses.dataclass(frozen=True)
class Secret:
class Secret(_MaxPositionalArgs):
# mapping from revision IDs to each revision's contents
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could turn this into an attribute docstring instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I realized I had not submitted a couple of random comments from 2 weeks ago... here they are)


_max_positional_args: Final = 1

def __post_init__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we put this validation in _Port?

Copy link
Collaborator

@PietroPasotti PietroPasotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of suggestions regarding the maxpositionalargs class, for the rest I don't have enough time (before my holiday) to give another in-depth review but in general I remain OK with the approach and all the changes.

README.md Outdated Show resolved Hide resolved

def __new__(cls, *args, **_):
if len(args) > getattr(cls, "_max_positional_args", float("inf")):
raise TypeError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we can/should be more helpful here. We can make a pretty good guess of what attributes should be passed as keyword-only using inspect.

Not sure this is the best way, but:

    def __new__(cls, *args, **_):
        if (n_posargs:=len(args)) > (max_posargs := getattr(cls, "_max_positional_args", float("inf"))):
            kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in 
                       cls.__annotations__.items() if _ann is not Final][max_posargs: n_posargs]
            raise TypeError(
                f"Error instantiating {cls.__name__}. "
                f"The following arguments are keyword-only: {kw_only}",
            )
        return super().__new__(cls)

This gets you:

image

Which is a bit more helpful imho than a generic typeerror?

WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just replicating what you get in Python if you use regular kwonly args (either just in a normal function or with the dataclasses.KW_ONLY thing; I assume it's the same under the hood).

But yes, we could improve on what the language does here. I'll work on that.

scenario/state.py Outdated Show resolved Hide resolved
@@ -120,6 +122,26 @@ class MetadataNotFoundError(RuntimeError):
"""Raised when Scenario can't find a metadata.yaml file in the provided charm root."""


# This can be replaced with the KW_ONLY dataclasses functionality in Python 3.10+.
class _MaxPositionalArgs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I like this approach a little bit better:

def max_posargs(n: int = None):
    class _MaxPositionalArgs:
        """Raises TypeError when instantiating objects if arguments are not passed as keywords.
        Looks for a `_max_positional_args` class attribute, which should be an int
        indicating the maximum number of positional arguments that can be passed to
        `__init__` (excluding `self`). If not present, no limit is applied.
        """
        _max_positional_args = n or float("inf")

        def __new__(cls, *args, **_):
            if (n_posargs := len(args)) > (max_n_posargs := cls._max_positional_args):
                kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in
                           cls.__annotations__.items() if _ann is not Final][max_n_posargs: n_posargs]
                raise TypeError(
                    f"Error instantiating {cls.__name__}. "
                    f"The following arguments are keyword-only: {kw_only}",
                )
            return super().__new__(cls)
    return _MaxPositionalArgs


@dataclasses.dataclass
class Foo(max_posargs(2)):
    a: int
    b: int
    c: Dict[str, str]
    d: int = 1

instead of having to assign the _max_positional_args by hand every time, pass it to the superclass (factory).

If you don't like the idea of a max_posargs class factory, we could do the slightly more convoluted:

class _MaxPositionalArgs:
    def __new__(cls, n: int = None):
        class _MaxPositionalArgsInner:
            """Raises TypeError when instantiating objects if arguments are not passed as keywords.
            Looks for a `_max_positional_args` class attribute, which should be an int
            indicating the maximum number of positional arguments that can be passed to
            `__init__` (excluding `self`). If not present, no limit is applied.
            """
            _max_positional_args = n or float("inf")

            def __new__(cls, *args, **_):
                if (n_posargs := len(args)) > (max_n_posargs := cls._max_positional_args):
                    kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in
                               cls.__annotations__.items() if _ann is not Final][max_n_posargs: n_posargs]
                    raise TypeError(
                        f"Error instantiating {cls.__name__}. "
                        f"The following arguments are keyword-only: {kw_only}",
                    )
                return super().__new__(cls)
        return _MaxPositionalArgsInner

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class factory seems good to me, thanks!

Also fix doing copy/deepcopy of the objects, and provide more information when the args are wrong (also per review).
@tonyandrewmeyer tonyandrewmeyer merged commit c3b5eba into canonical:7.0 Jul 8, 2024
2 checks passed
tonyandrewmeyer added a commit to canonical/operator that referenced this pull request Dec 17, 2024
The Scenario code that limits the number of positional arguments (needed
in Python 3.8 since the dataclasses there do not have `KW_ONLY`) is
quite expensive. We have a method of providing this functionality in
3.8, but it is unfortunately expensive, and therefore a regression from
older versions of Scenario.

I considered 4 options here:
* Manually write out the dozen or so `__init__` signatures (as we had at
one point in the [original
PR](canonical/ops-scenario#137)). I'd rather not
for the same reasons we made the decision at that time.
* Figure out a way to have a syntax that allows using regular
`dataclasses.KW_ONLY` with modern Python and only use our custom code on
older Python. I spent some time looking into this, and couldn't figure
out a way to do it where the code still seemed readable and maintainable
enough.
* Cache the most expensive work (what's in this PR).
* Don't bother doing anything now, and eagerly wait for the ability to
drop 3.8 support. A roughly 5% performance regression felt worth fixing
as long as the change is fairly straightforward (although this change
only gets about half of that back).

The most expensive parts of the `__new__` method are the ones that work
with `inspect`: `inspect.signature` in particular, but also getting the
default value of the parameters. If we assume that no-one is run-time
altering the signature of the class (I believe no-one should be) then
the values of these never actually change, but we are currently
calculating them every time an instance of the class is created. This PR
changes that to cache those three values the first time they are needed.

There's one additional performance tweak in the branch that doesn't make
a significant difference, but is trivial to do: when checking if the
YAML files exist, skip the filesystem `exists()` call if we just created
an empty temporary directory a few lines above, since we know that it
will never exist.

A drive-by: I happened to notice while working on this branch Harness
referring to `options.yaml`, which does not exist (any more?) as far as
I know, so a docs tweak to address that.

Timing (best of 3):

|Suite|main|branch|
|-|-|-|
|operator unit tests (no concurrency)|165.25s|161.89s|
|traefik scenario tests|45.49|44.30s|
|kafka-k8s-operator scenario tests|4.48s|4.38s|

Refs #1434
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants