-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: require keyword arguments for most dataclasses #137
feat!: require keyword arguments for most dataclasses #137
Conversation
@benhoyt @PietroPasotti wdyt? I like this over #117, but could be swayed by an argument to just do the smaller #117 and leave this for now. @PietroPasotti with your much larger experience actually writing scenarios I assume you'd have opinions on whether I've picked the right args to be kw-only also. If we go this way, wdyt about the minimum Python version being higher than ops? |
I do prefer this over #117 too. However, I'm not sure about the bump to require Python 3.10. It's fine while ops-scenario is still in a separate repo, but if we're planning to merge this into |
Ah, good point, I hadn't thought about that. Even if it's a If @PietroPasotti is also on board with this over #117, then I'll rework it to be Python 3.8 compatible. Is the ops minimum version essentially constrained to be the minimum that's in an Ubuntu LTS that's still in standard support? |
Yeah, or the minimum from the oldest LTS that Juju supports. |
I like this one more too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me code-wise.
Deferring to Pietro wrt. the API.
@benhoyt @PietroPasotti I've reworked this to be Python 3.8 compatible, so it's ready for review when you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the intent of this change, but now that I see the implementation, and how much (error-prone) boilerplate it is, I'm not sure about this! Does it gain us enough for the significant implementation awkwardness?
I think I'd be inclined to go back to the much simpler #117, and wait with this keyword-only thing till we are actually on Python 3.10.
But worth a voice discussion to try to come to consensus.
I lean towards implementation pain now in order to have the backwards incompatibility land now, rather than have to do a Scenario 8.0 (or ops 3.0, depending on how things end up working) when Python 3.10 comes along.
Sounds good :) |
We had the voice discussion, and I was forgetting that this was a backwards-incompatible change (not just a usability improvement). So yeah, after discussion I agree that this is probably worth it. It's annoying and error-prone, but simple enough to understand. @tonyandrewmeyer's going to do one more thing before we commit to this implementation: look at the code for |
The PR that introduced keyword-only arguments to dataclasses isn't huge (particularly if you ignore the tests and docs - there are a couple of small bug fixes later, but this PR seems to have most of the code). I could either factor that out to have a dataclass sub-class that had the extra functionality, or just lift the 3.10 dataclasses functionality entirely, into some sort of _dataclasses_compat.py module that we could use until we have 3.10 as a minimum. This would contain the changes to a single place, keeping scenario.state clean, but it does feel like a more substantial change, compared to a bunch of |
Yep, fair enough -- let's KISS and put up with the implementation messiness for now. (And be careful we've double-checked all the types and defaults.) I do like all your "Python 3.10+" reminder comments though, that'll help us easily search and clean it up later. |
@PietroPasotti this has been a pretty noisy PR, sorry (it probably would have been better in an issue and then a couple of draft PRs where the approaches were compared). But it's ready from my point of view and @benhoyt is happy with it. Would you mind doing a review when you have some time? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of the way it's implemented honestly, let's see whether it makes sense to vendor our own dataclass-like decorator or use pydantic instead.
I'm not a fan of the implementation either (and @benhoyt already pointed out he wasn't, above), although I am still very keen on getting the API change into 7.0. I took another look at vendoring in the stdlib functionality. I think this would be ok if we can lift it mostly unchanged (I've had success doing that before elsewhere). I think if there's a lot of changes to make then it ends up being too brittle to maintain and the risk of introducing a bug too high. Unfortunately, the stdlib implementation isn't all contained in the class - there are a bunch of private module-level functions that are involved. So vendoring by just subclassing, or really anything clean, doesn't seem viable. The current main module has a lot of changes that don't work in 3.8 (f-strings that only work with the new parser, match-args, new typing stuff, etc). The latest version in the 3.10 branch:
Other than that, it seems to work with some limited manual testing. To be more confident I'd want to really use it and at minimum make sure the Scenario tests pass, maybe actually copy the dataclass tests from the stdlib and check those too. Even with if does only require the 5 changes above, that still leaves me -0 on this approach. I'm not necessarily opposed to pydantic, and I agree that it seems likely that some of the checking could be moved as a result. It feels like a more substantial change, though. I'm -0 on this. @PietroPasotti also suggested just implementing a basic "these have to be keywords" checker ourselves (I think this also came up before somewhere, but can't find where - maybe it was the voice discussion). For example: def _kwarg_only(*keywords):
def decorator(cls):
@wraps(cls)
def call(*args, **kwargs):
if diff := set(keywords).difference(set(kwargs)):
raise TypeError(
f"{cls.__name__} requires {diff} as keyword argument(s)"
)
return cls(*args, **kwargs)
return call
return decorator
def _dc_base(kw_only_fields):
deco_1 = dataclass(frozen=True)
deco_2 = _kwarg_only(*kw_only_fields)
return lambda cls: deco_2(deco_1(cls))
@_dc_base(kw_only_fields=('foo', 'bar'))
class SomeStateThing:
... Or with class _RequireKeywordArgs:
def __new__(cls, *args, **kwargs):
if diff := set(cls._kwargs).difference(set(kwargs)):
raise TypeError(
f"{cls.__name__} requires {', '.join(diff)} as keyword argument(s)"
)
return super().__new__(cls)
@dataclass(frozen=True)
class SomeStateThing(_RequireKeywordArgs):
_kwargs = frozenset({'foo', 'bar'})
... It would be more work to migrate this to the native support (than vendoring; less than the current approach), but it is vastly simpler, and does achieve the actual API goal I'm going for. I'm +1 on this approach. |
Thanks, Tony. Appreciate the push-back, Pietro. Yeah, we hate the current implementation too. I don't think we should move to Pydantic because of this; if we do that, we should consider it separately. And back-porting the stdlib's dataclasses module seems complicated and error-prone. I think the approaches Tony suggested do seem cleaner; of the two, I think the |
After thinking on this for a bit, how about this approach: For the interim, where we've got to support Py 3.8 and 3.9, let's add an explicit custom if Py 3.10 code would ultimately look like this: @dataclasses.dataclass
class Example:
a: int
b: int
_: dataclasses.KW_ONLY
c: int
d: int = 4 Then interim code would look like this: @dataclasses.dataclass
class Example:
a: int
b: int
c: int
d: int = 4
def __new__(cls, a: int, b: int, *, c: int, d: int = 4) -> Self:
i = super().__new__(cls)
i.__init__(a, b, c=c, d=d)
return i In other words, take the hit for supporting older python, write out things by hand, don't create magical base classes or metaclasses. WDYT? |
The example above doesn't do the right job, because it requires that all of the specified arguments are present (and passed as keywords). We want them to be ok if they are missing and there's a default value as well. However, this does work: class _MaxPositionalArgs:
"""Raises TypeError when instantiating objects if arguments are not passed as keywords.
Looks for a `_max_positional_args` class attribute, which should be an int
indicating the maximum number of positional arguments that can be passed to
`__init__` (excluding `self`). If not present, no limit is applied.
"""
def __new__(cls, *args, **_):
if len(args) > getattr(cls, "_max_positional_args", float("inf")):
raise TypeError(
f"{cls.__name__}.__init__() takes {cls._max_positional_args + 1} "
f"positional arguments but {len(args) + 1} were given"
)
return super().__new__(cls)
@dataclasses.dataclass(frozen=True)
class Container(_MaxPositionalArgs):
_max_positional_args = 1
name: str
can_connect: bool = False
... The question about IDEs and type checkers is a good one, thanks! The decorator approach does seem to break both the IDE (language server) info and type checking. With Correct:
However, I can still use auto-complete. As expected, pyright can't detect as many issues. For example, with this code: c = scenario.Container('foo')
c2 = scenario.Container('bar', can_connect=True)
c3 = scenario.Container(name='baz')
c4 = scenario.Container(name='qux', can_connect=True)
c5 = scenario.Container('foo', True) # bad
c6 = scenario.Container() # bad
c7 = scenario.Container('foo', rubbish=False) # bad The ugly but correct implementation gives: $ pyright charm_test.py
/tmp/kwarg-scenario-poc/charm_test.py
/tmp/kwarg-scenario-poc/charm_test.py:7:32 - error: Expected 1 positional argument (reportCallIssue)
/tmp/kwarg-scenario-poc/charm_test.py:8:6 - error: Argument missing for parameter "name" (reportCallIssue)
/tmp/kwarg-scenario-poc/charm_test.py:9:32 - error: No parameter named "rubbish" (reportCallIssue)
3 errors, 0 warnings, 0 informations And with $ pyright charm_test.py
/tmp/kwarg-scenario-poc/charm_test.py
/tmp/kwarg-scenario-poc/charm_test.py:8:6 - error: Argument missing for parameter "name" (reportCallIssue)
/tmp/kwarg-scenario-poc/charm_test.py:9:32 - error: No parameter named "rubbish" (reportCallIssue)
2 errors, 0 warnings, 0 informations So it's working in the sense that there aren't false positives, and it can still detect problems outside of the keyword requirement. I think the much simpler implementation, without all the boilerplate and I played around a bit with having Python 3.10+ use the native stdlib functionality and 3.8 & 3.9 use this, but that got messy pretty quickly (particularly with the classes that aren't entirely keyword only), and when it gets too messy pyright and the IDE start losing the ability to follow along. So: @benhoyt and @PietroPasotti I say we go with the simple |
I'm not sure whether Tony's comment about "the example above" applies to my |
This is still a reasonable amount of boilerplate. It avoids the
I'm pretty sure this calls It actually gets simpler if that's fixed. For example, this would be def __new__(cls, name, *, can_connect: bool=False, _base_plan: dict=None, layers: dict=None, service_status: dict=None, mounts: Dict=None, exec_mock: dict=None):
return super().__new__(cls) In my IDE (VS Code, with the default language server) and with pyright it has the same behaviour as the other It's a reasonable suggestion, thanks! If it had made either the IDE or pyright detect the requirement to use a keyword arg, then I would lean towards it. However, since it's giving the same behaviour, but we still have to have all of the arguments written out, I'd still prefer to just centralise it in a common base class. It's not that magical, and it's not a metaclass (I did play with those to see if I could get this nicer, but had no luck so gave up).
The comment above was about my original |
IDE and type checking behaviour is not perfect but seems okay. I'm on board with the |
TIL
|
Yes, the |
af8c7cd
to
fbec5ad
Compare
@PietroPasotti I've cleaned up all the commit history (and it's clean against current 7.0 HEAD) but it's otherwise unchanged. Do you have time for a final (:crossed_fingers:) review before you head off? If not I could get someone in Charm-Tech to do it if you're still ok with the approach. |
@dataclasses.dataclass(frozen=True) | ||
class Secret: | ||
class Secret(_MaxPositionalArgs): | ||
# mapping from revision IDs to each revision's contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could turn this into an attribute docstring instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I realized I had not submitted a couple of random comments from 2 weeks ago... here they are)
|
||
_max_positional_args: Final = 1 | ||
|
||
def __post_init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we put this validation in _Port
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of suggestions regarding the maxpositionalargs class, for the rest I don't have enough time (before my holiday) to give another in-depth review but in general I remain OK with the approach and all the changes.
scenario/state.py
Outdated
|
||
def __new__(cls, *args, **_): | ||
if len(args) > getattr(cls, "_max_positional_args", float("inf")): | ||
raise TypeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we can/should be more helpful here. We can make a pretty good guess of what attributes should be passed as keyword-only using inspect
.
Not sure this is the best way, but:
def __new__(cls, *args, **_):
if (n_posargs:=len(args)) > (max_posargs := getattr(cls, "_max_positional_args", float("inf"))):
kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in
cls.__annotations__.items() if _ann is not Final][max_posargs: n_posargs]
raise TypeError(
f"Error instantiating {cls.__name__}. "
f"The following arguments are keyword-only: {kw_only}",
)
return super().__new__(cls)
This gets you:
Which is a bit more helpful imho than a generic typeerror?
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just replicating what you get in Python if you use regular kwonly args (either just in a normal function or with the dataclasses.KW_ONLY thing; I assume it's the same under the hood).
But yes, we could improve on what the language does here. I'll work on that.
scenario/state.py
Outdated
@@ -120,6 +122,26 @@ class MetadataNotFoundError(RuntimeError): | |||
"""Raised when Scenario can't find a metadata.yaml file in the provided charm root.""" | |||
|
|||
|
|||
# This can be replaced with the KW_ONLY dataclasses functionality in Python 3.10+. | |||
class _MaxPositionalArgs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I like this approach a little bit better:
def max_posargs(n: int = None):
class _MaxPositionalArgs:
"""Raises TypeError when instantiating objects if arguments are not passed as keywords.
Looks for a `_max_positional_args` class attribute, which should be an int
indicating the maximum number of positional arguments that can be passed to
`__init__` (excluding `self`). If not present, no limit is applied.
"""
_max_positional_args = n or float("inf")
def __new__(cls, *args, **_):
if (n_posargs := len(args)) > (max_n_posargs := cls._max_positional_args):
kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in
cls.__annotations__.items() if _ann is not Final][max_n_posargs: n_posargs]
raise TypeError(
f"Error instantiating {cls.__name__}. "
f"The following arguments are keyword-only: {kw_only}",
)
return super().__new__(cls)
return _MaxPositionalArgs
@dataclasses.dataclass
class Foo(max_posargs(2)):
a: int
b: int
c: Dict[str, str]
d: int = 1
instead of having to assign the _max_positional_args
by hand every time, pass it to the superclass (factory).
If you don't like the idea of a max_posargs class factory, we could do the slightly more convoluted:
class _MaxPositionalArgs:
def __new__(cls, n: int = None):
class _MaxPositionalArgsInner:
"""Raises TypeError when instantiating objects if arguments are not passed as keywords.
Looks for a `_max_positional_args` class attribute, which should be an int
indicating the maximum number of positional arguments that can be passed to
`__init__` (excluding `self`). If not present, no limit is applied.
"""
_max_positional_args = n or float("inf")
def __new__(cls, *args, **_):
if (n_posargs := len(args)) > (max_n_posargs := cls._max_positional_args):
kw_only = [{name: getattr(_ann, "__name__", str(_ann))} for name, _ann in
cls.__annotations__.items() if _ann is not Final][max_n_posargs: n_posargs]
raise TypeError(
f"Error instantiating {cls.__name__}. "
f"The following arguments are keyword-only: {kw_only}",
)
return super().__new__(cls)
return _MaxPositionalArgsInner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class factory seems good to me, thanks!
Also fix doing copy/deepcopy of the objects, and provide more information when the args are wrong (also per review).
The Scenario code that limits the number of positional arguments (needed in Python 3.8 since the dataclasses there do not have `KW_ONLY`) is quite expensive. We have a method of providing this functionality in 3.8, but it is unfortunately expensive, and therefore a regression from older versions of Scenario. I considered 4 options here: * Manually write out the dozen or so `__init__` signatures (as we had at one point in the [original PR](canonical/ops-scenario#137)). I'd rather not for the same reasons we made the decision at that time. * Figure out a way to have a syntax that allows using regular `dataclasses.KW_ONLY` with modern Python and only use our custom code on older Python. I spent some time looking into this, and couldn't figure out a way to do it where the code still seemed readable and maintainable enough. * Cache the most expensive work (what's in this PR). * Don't bother doing anything now, and eagerly wait for the ability to drop 3.8 support. A roughly 5% performance regression felt worth fixing as long as the change is fairly straightforward (although this change only gets about half of that back). The most expensive parts of the `__new__` method are the ones that work with `inspect`: `inspect.signature` in particular, but also getting the default value of the parameters. If we assume that no-one is run-time altering the signature of the class (I believe no-one should be) then the values of these never actually change, but we are currently calculating them every time an instance of the class is created. This PR changes that to cache those three values the first time they are needed. There's one additional performance tweak in the branch that doesn't make a significant difference, but is trivial to do: when checking if the YAML files exist, skip the filesystem `exists()` call if we just created an empty temporary directory a few lines above, since we know that it will never exist. A drive-by: I happened to notice while working on this branch Harness referring to `options.yaml`, which does not exist (any more?) as far as I know, so a docs tweak to address that. Timing (best of 3): |Suite|main|branch| |-|-|-| |operator unit tests (no concurrency)|165.25s|161.89s| |traefik scenario tests|45.49|44.30s| |kafka-k8s-operator scenario tests|4.48s|4.38s| Refs #1434
Makes the arguments to the various dataclasses require keyword args wherever there isn't an obvious positional order.
Action
- name as first positional argument, and everything else as kw-only (maybeparams
as second would be ok?)ActionOutput
- all args are kw-only - it seems unlikely anyone would be instantiating this themselvesAddress
- all args are kw-onlyContainer
- name as first positional argument, and everything else as kw-onlyNetwork
- all args are kw-onlyPort
- protocol and port are both kw-onlySecret
- contents as first positional argument (previously id was first) and everything else as kw-only (another 7.0 PR removes the need to manually specify the ID)StoredState
- all args are kw-onlyState
- all args are kw-only (it seems like this was the only reasonable way to use it previously anyway)This is an alternative to #117 - it's a bigger change, but it does seem like the better one, and since it breaks backwards compatibility significantly it'd be better do it now than later if we do want this change.
Compatibility with Python 3.8 is retained - basically, we're sacrificing some cleanliness of implementation in exchange for cleanliness of API. This will pay off, particularly in the longer term when we can move up to Python 3.10+ (likely 3.11, maybe in mid 2025), and all of the subclassing can be replaced with
dataclass(kw_only=True)
anddataclasses.KW_ARG
.