Recursive dataclasses #2
I think we could fix this (and the problem with annotations defined in control flow structures) by having
from collections.abc import MutableMapping
from contextlib import suppress
class Annotations(MutableMapping):
    def __init__(self):
        self.thunks = {}
        self.items = {}
    def __getitem__(self, key):
        # Evaluate the thunk on first access and cache the result.
        with suppress(KeyError):
            self.items[key] = self.thunks[key]()
            del self.thunks[key]
        return self.items[key]
    def __setitem__(self, key, value):
        with suppress(KeyError):
            del self.thunks[key]
        self.items[key] = value
    def __delitem__(self, key):
        with suppress(KeyError):
            del self.thunks[key]
        del self.items[key]
    def __len__(self):
        return len(self.thunks) + len(self.items)
    def __iter__(self):
        # MutableMapping derives keys() from __iter__.
        return iter(set(self.thunks) | set(self.items))
    ...
# now function annotation
def foo(x: int = 3, y: MyType = None) -> float: ...
# is sugar for
def foo(x=3, y=None): ...
foo.__annotations__ = Annotations()
foo.__annotations__.thunks['x'] = lambda: int
foo.__annotations__.thunks['y'] = lambda: MyType
foo.__annotations__.thunks['return'] = lambda: float
# class annotation
class MyClass:
    a: int
    b: MyType
    if condition:
        c: float
# becomes sugar for
class MyClass:
    __annotations__ = Annotations()
    __annotations__.thunks['a'] = lambda: int
    __annotations__.thunks['b'] = lambda: MyType
    if condition:
        __annotations__.thunks['c'] = lambda: float
As long as |
I don't think that's true, unless I'm misunderstanding. The class is completely defined by the time the decorator is run. |
@ericvsmith the name only gets bound after decorators run. Given this code:
It gets compiled to:
The STORE_FAST storing the name X gets run after the CALL_FUNCTION implementing the decorator. |
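The code and bytecode from this comment are not preserved in this copy. As a rough illustration of the ordering it describes, here is a minimal sketch; the names `check_binding` and `binding_seen_by_decorator` are made up for this example and do not come from the thread.

```python
# Hypothetical helper names; this only illustrates the ordering described above.
binding_seen_by_decorator = []

def check_binding(cls):
    """Record whether the name X is already bound while the decorator runs."""
    try:
        X  # looked up at call time, i.e. while the decorator is executing
    except NameError:
        binding_seen_by_decorator.append(False)
    else:
        binding_seen_by_decorator.append(True)
    return cls

@check_binding
class X:
    pass

# The decorator ran before the name X was stored.
print(binding_seen_by_decorator)
```

Once the class statement has finished, `X` is bound as usual; only code that runs inside the decorator sees the name as missing.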
You learn something new every day! Maybe we should change that. |
That would be a viable option for fixing the issue. We'd have to store the name, run the decorator, then store the name again. It would change observable behavior in cases like this:
Though I suppose we could make it |
Would the proposed solution work for dataclasses defined using a base class, like this one? By the way, I'm very excited about PEP 649. |
That link points to a random line in the file (presumably the file got updated since you took the link). Please use permalinks. Even better, post a simpler example here that shows the issue. I'm guessing you meant this but that class definition doesn't really help me understand your question. Jelle's example above is clear and self-contained, I like that. |
If I understand correctly, one idea here is to bind the name before calling the decorator. The problem with that approach is that the decorator may return a different class. I did that once, in my "bound inner classes" recipe: https://code.activestate.com/recipes/577070-bound-inner-classes/ This makes a class declared inside another class behave more like a function declared inside a class: If we bound the class before calling the descriptor, then in this case we'd have to bind it a second time, binding it to what's returned by the descriptor. Also, annotations for class members that refer to the class itself would retain a reference to the original class, not the class returned by the descriptor, which means they'd fail comparisons--the p.s. credit where credit is due: Alex Martelli wrote the original version of "bound inner classes", in response to my question on StackOverflow. |
I have one idea: dataclasses could inject its own descriptor for |
One workaround for dataclasses going with the lazy approach would be something like:
from __future__ import annotations
from dataclasses import dataclass
def lazy_dataclass(tp: type) -> type:
    class LazyDataclass:
        def __new__(cls, *args: object, **kwargs: object) -> object:
            # Build the real dataclass on first instantiation only.
            if not hasattr(cls, "actual_cls"):
                cls.actual_cls = dataclass(tp)
            return cls.actual_cls(*args, **kwargs)
    return LazyDataclass
Basic usage looks to be fine. The recursive User example also is fine.
@dataclass
class Foo:
    x: int
    y: str
@lazy_dataclass
class Foo2:
    x: int
    y: str
foo1 = Foo(1, "a")
foo2 = Foo2(1, "a")
print(foo1) # Foo(x=1, y='a')
print(foo2) # Foo2(x=1, y='a')
@lazy_dataclass
class User:
    name: str
    friends: list[User] # type: ignore
user1 = User("Alice", friends=[User("Bob", friends=[])])
print(user1) # User(name='Alice', friends=[User(name='Bob', friends=[])])
This definitely has user-visible differences in the type that gets made, although maybe you can keep the differences small enough to preserve the public API. You also don't need to make the whole thing lazy. You could have only the type analysis portions computed lazily the first time they're really needed, while as much else as possible is computed eagerly. I had the same type of issue with a type analysis decorator and ended up moving all introspection logic to internal cached properties, and tried to avoid calling them until really needed. |
The easiest workaround would be to simply not use the decorator syntax, instead calling
from dataclasses import dataclass
class User:
    name: str
    friends: list[User]
User = dataclass(User)
This would work fine with PEP 649 active. |
It mostly doesn't care exactly what is in the annotation, but it uses the keys of the dict to find out what the fields are, and it needs to find out when a variable is annotated with I don't particularly like the various workarounds suggested above. @hmc-cs-mdrissi's |
That would be essentially the same as binding the name before the decorator ran, right? |
Yes, but it works today without changing the language, and it's explicit rather than implicit. |
Solutions for references to the defining class itself (like
@dataclass
class User:
    name: str
    group: Group  # <--- error here
@dataclass
class Group:
    name: str
    admins: list[User]  # Mutual recursion
Existing solutions (without PEP 649) are:
PEP 649 as currently written does not solve this, and if we were to choose it in favor of PEP 563, the solution would be to use string quotes. A modified PEP 649 that returned some kind of error marker (similar to |
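For reference, the string-quote approach mentioned in this comment can be sketched like this for the mutually recursive pair. This is a minimal sketch: the `namespace` dict passed as `localns` is an assumption made so the example does not depend on which module the classes live in.

```python
from dataclasses import dataclass
import typing

@dataclass
class User:
    name: str
    group: "Group"  # quoted: Group does not exist yet at this point

@dataclass
class Group:
    name: str
    admins: "list[User]"  # quoted for symmetry; User already exists here

# Resolve the strings once both classes exist. Passing the names explicitly
# is an assumption of this sketch, made to keep it self-contained.
namespace = {"User": User, "Group": Group}
user_hints = typing.get_type_hints(User, localns=namespace)
group_hints = typing.get_type_hints(Group, localns=namespace)
print(user_hints)
print(group_hints)
```

The decorator itself never needs the real `Group` object, so the quoted annotation is enough to get past class creation; the resolution cost is deferred to whoever calls `typing.get_type_hints`.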
If we are willing to have typing tooling and dataclasses get their hands a bit dirty in Python's introspective flexibility, I think we could have our cake and eat it too. Larry can keep the default behavior of The trick is that if you need more magical behavior, instead of calling Let's say you are a documentation tool, and you want a dict with string representations of the annotations (a la PEP 563), without needing to worry about the real objects referenced in the annotations at all or which namespace they live in or whether they are forward refs. You inspect Dataclasses, which only cares about the keys in I think a similar approach can be used for any typing-related tooling that wants to get annotations at runtime while protecting itself against the possibility of |
Could that magic globals be a defaultdict? |
Thanks @carljm, this is a great idea! @gvanrossum defaultdict works at runtime, but it's not a great fit for many use cases, because defaultdict's default function doesn't have access to the key. Something like this would work well though:
|
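The snippet that followed this comment is not preserved in this copy. As a rough illustration of the general idea (a mapping whose fallback hook does receive the key), here is a minimal sketch. `Placeholder` and `ForwardRefNamespace` are hypothetical names, and the mapping is passed as `locals` rather than `globals`, since that is the documented place for an arbitrary mapping.

```python
import builtins

class Placeholder:
    """Hypothetical stand-in for a name that is not defined yet."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"Placeholder({self.name!r})"

class ForwardRefNamespace(dict):
    """Dict subclass whose __missing__ hook receives the key being looked up."""
    def __missing__(self, key):
        # Let real builtins (list, int, ...) through; only invent
        # placeholders for names that are defined nowhere.
        if hasattr(builtins, key):
            return getattr(builtins, key)
        return Placeholder(key)

namespace = ForwardRefNamespace(known=int)
# Names are looked up in locals first, so the mapping sees every name.
result = eval("list[User]", {}, namespace)
known = eval("known", {}, namespace)
print(result, known)
```

Unlike `defaultdict`, whose factory takes no arguments, `__missing__` is called with the missing key, so the placeholder can remember which name it stands for.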
The one thing I'm not sure about is whether using a non-dict for globals (if it works; afk now and haven't tested) is a language feature or a cpython implementation detail. Docs for eval suggest it has to be a dict? But maybe that's meant to include dict subclasses too. https://docs.python.org/3/library/functions.html#eval I think the idea can work either way given the option to introspect |
In CPython, only locals may be a non-dict mapping. |
Currently the documentation for exec() specifies that globals must be exactly a dict (not a subclass), but the one for eval() just says dict. python/cpython#25039 is proposing to remove the no-subclass requirement. Subclasses work at runtime for both (at least in 3.11 where I tried). |
Arriving from https://discuss.python.org/t/finding-edge-cases-for-peps-484-563-and-649-type-annotations/14314/2 my own thoughts were along the lines of @carljm's suggestion, but thinking at the decorator level (i.e. running the decorators in general with a pseudo-namespace in place that includes a name binding for the class, rather than only doing that for annotation evaluation). However, that isn't really any different in practice from simply doing early binding in the namespace containing the class definition, with the problems @larryhastings described above: it works fine for decorators that preserve the class identity (although they may still mutate the class in place, as What's making me more amenable to the idea is that a variant of this problem already exists, and has existed since we introduced the zero-arg super support:
Given that precedent, the notion of "when a decorator runs, the name in the defining namespace will refer to the same object as is being passed in as the first argument" (rather than leaving it bound to whatever it was previously bound to) seems less radical. I also don't think the notion of trapping |
Does this imply that in a dataclass created with slots=True, zero-arg super
doesn't work?
…On Thu, Apr 14, 2022 at 9:15 PM Nick Coghlan ***@***.***> wrote:
Arriving from
https://discuss.python.org/t/finding-edge-cases-for-peps-484-563-and-649-type-annotations/14314/2
my own thoughts were along the lines of @carljm
<https://github.com/carljm>'s suggestion, but thinking at the decorator
level (i.e. running the decorators in general with a pseudo-namespace in
place that includes a name binding for the class, rather than only doing
that for annotation evaluation).
However, that isn't really any different in practice from simply doing
early binding in the namespace containing the class definition, with the
problems @larryhastings <https://github.com/larryhastings> described
above: it works fine for decorators that preserve the class identity
(although they may mutate the class in place), but is potentially fraught
with confusion if the decorators change the class identity.
What's making me more amenable to the idea is that a variant of this
problem already exists, and has existed since we introduced the zero-arg
super support: __class__ (and zero arg super) always refers to the result
of the "class" statement *prior* to decorator application, so changing
identity in a class decorator already has the potential to introduce
problems.
>>> def replace_the_class(cls):
...     global replaced_cls
...     replaced_cls = cls
...     return object()
...
>>> @replace_the_class
... class Example:
...     @classmethod
...     def get_defining_class(cls):
...         return __class__
...
>>> Example
<object object at 0x0000022496E848C0>
>>> replaced_cls
<class '__main__.Example'>
>>> replaced_cls.get_defining_class()
<class '__main__.Example'>
Given that precedent, the notion of "when a decorator runs, the name in
the defining namespace will refer to the same object as is being passed in
as the first argument" (rather than leaving it bound to whatever it was
previously bound to) seems less radical.
I also don't think the notion of trapping NameError is viable, as there's
no guarantee that NameError will be raised: the class name might already
be defined and refer to something else. That's technically also a backwards
compatibility challenge for changing the name binding behaviour, but lazy
annotations will require a future import anyway, so an eager name binding
change could be tied in to that same feature flag.
--
--Guido van Rossum (python.org/~guido)
|
Indeed, looks like that's a genuine existing bug arising from the pre-decoration/post-decoration discrepancy:
Seems to be an existing issue report for the problem here: python/cpython#90562 |
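The example from this comment is not preserved in this copy. The underlying mechanism can be shown without dataclasses at all. The sketch below uses a hypothetical `recreate` decorator that returns a fresh copy of the class. It is not the dataclasses implementation, only an imitation of the "decorator returns a new class" situation discussed here.

```python
def recreate(cls):
    """Hypothetical decorator: return a fresh copy of cls instead of cls itself."""
    namespace = dict(cls.__dict__)
    # These descriptors belong to the original class and must not be copied.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    return type(cls)(cls.__name__, cls.__bases__, namespace)

class Base:
    def greet(self):
        return "hello from Base"

@recreate
class Child(Base):
    def greet(self):
        # Zero-arg super() reads the __class__ cell, which still holds the
        # class object created before the decorator ran.
        return super().greet()

try:
    Child().greet()
except TypeError as exc:
    outcome = f"TypeError: {exc}"
else:
    outcome = "no error"
print(outcome)
```

The copied class is not a subclass of the original one, so `super()` rejects instances of the copy. That is the pre-decoration/post-decoration discrepancy in its smallest form.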
I think that (and similar) bugs may turn the potential precedent on its head: the existing problems with the That would push more towards approaches like @carljm's suggestion, where work would focus on providing ways to customise the annotation name resolution at runtime for use cases that wanted to implicitly deal with forward references (including references to a class currently being defined). |
Yes. For an example of this, see python/cpython#90055 |
On 4/15/22 02:05, Eric V. Smith wrote:
Does this imply that in a dataclass created with slots=True,
zero-arg super doesn't work?
Yes. For an example of this, see python/cpython#90055
So, if we change Python so that processing of __slots__ is done lazily,
during the creation of the first instance of the class (or a subclass),
dataclass could skip its subclass hack and this problem would cease to
exist?
//arry/
|
Yes. It also occurs to me that dataclasses could create the new class first, then add methods, instead of creating the class after the methods are added, which is what it does now. This wouldn’t help with user methods, but I think would fix dataclasses’s added methods. But maybe there’s another issue with this, I haven’t tried it yet. |
And beware that other implementations would have to follow suit. |
I'm sorry if this was discussed previously, but wouldn't the following solve the issues while being simpler than the current proposals? The idea is to have Python automatically replace undefined closures in annotations by some So the user code above would be automatically transformed by Python to:
@dataclasses.dataclass
class User:
    name: str
    group: ForwardRef(lambda: Group)
@dataclasses.dataclass
class Group:
    name: str
    admins: list[ForwardRef(lambda: User)]
Then Here is an example: https://gist.github.com/Conchylicultor/d41f3637e44be7dcd361af37b597451d
print(type_hints(User)) # {'name': <class 'str'>, 'group': <class '__main__.Group'>}
print(type_hints(Group)) # {'name': <class 'str'>, 'admins': list[__main__.User]}
And it also works when classes are defined within functions:
User, Group = user_group_class_factory() # class defined inside the function
print(type_hints(User)) # {'name': <class 'str'>, 'group': <class '__main__.user_group_class_factory.<locals>.Group'>}
print(type_hints(Group)) # {'name': <class 'str'>, 'admins': list[__main__.user_group_class_factory.<locals>.User]}
What would be the drawbacks/limitations? Am I missing something obvious? |
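To make the shape of this proposal concrete without relying on the linked gist, here is a minimal sketch written by hand. `LazyRef`, `resolve_hint` and this `type_hints` are hypothetical stand-ins (named `LazyRef` to avoid confusion with `typing.ForwardRef`), and the wrapping is done manually rather than by the interpreter, which is the part the proposal would automate.

```python
import typing

class LazyRef:
    """Hypothetical marker wrapping a zero-argument callable."""
    def __init__(self, thunk):
        self._thunk = thunk
    def resolve(self):
        return self._thunk()

def resolve_hint(hint):
    """Replace LazyRef markers, including those nested in generic aliases."""
    if isinstance(hint, LazyRef):
        return hint.resolve()
    origin = typing.get_origin(hint)
    if origin is not None:
        args = tuple(resolve_hint(arg) for arg in typing.get_args(hint))
        return origin[args]
    return hint

def type_hints(cls):
    return {name: resolve_hint(hint) for name, hint in cls.__annotations__.items()}

class User:
    name: str
    group: LazyRef(lambda: Group)  # Group is only looked up on resolve()

class Group:
    name: str
    admins: list[LazyRef(lambda: User)]

print(type_hints(User))
print(type_hints(Group))
```

Calling `type_hints(User)` before `Group` exists would raise `NameError` from inside the lambda, so in this sketch resolution still has to wait until every referenced name is defined.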
The obvious way to write the type for |
Yes, this would still be the case. Did I misunderstand you?
I think it does: Compared to PEP 563, it solves all the limitations pointed out by https://peps.python.org/pep-0649/#motivation
Compared to current PEP 649, I think the lazy
class A:
    x: B
# Can inspect typing annotations even when `B` is not yet defined
print(A.__annotations__) # {'x': ForwardRef(lambda: B)}
# Mutate works as expected
A.__annotations__['y'] = int
class B:
    pass
# Only works after B has been resolved
type_hints(A) # {'x': B, 'y': int}
|
@Conchylicultor I think your proposal is effectively similar to what @JelleZijlstra proposed in #3 , with the added wrinkle of replacing the If your proposal aims to be simpler than PEP 649 by not wrapping the overall evaluation of annotations in a function, but eagerly executing annotations and only wrapping undefined names in a |
One small wrinkle I noticed with the "eval the code object" trick is that currently I think this should be pretty easily resolvable; in principle I don't think there's any reason |
I still feel that the code objects computing the annotations are sufficiently special to allow whatever special-casing we need to give it the best user experience. |
So I tried to implement lazy To make
@dataclasses.dataclass
class User:
    def __co_annotations__():
        return {"name": str, "group": Group}
# Get incomplete type hints
incomplete = type_hints(User) # {"name": str, "group": ForwardRef(Group)}
@dataclasses.dataclass
class Group:
    def __co_annotations__():
        return {
            "name": str,
            "group": list[User],
            "inexisting": DoesNotExists, # e.g. a `if TYPE_CHECKING:` import
        }
# Once `Group` is created, ForwardRef can be resolved
assert incomplete['group'].resolve() is Group
# Calling the annotations after the forward refs are resolved by default.
type_hints(User) # {"name": str, "group": Group}
type_hints(Group) # {"name": str, "group": list[User], "inexisting": ForwardRef(DoesNotExists)}
And this also works if There's some edge cases I'm missing (e.g. This would also fix #1 (at least at the same level as PEP 563) But overall, once all edge cases are fixed, maybe |
The accepted version of PEP 649 covers this. |
Another problem brought up by Joseph Perez on python-dev:
This will break because the implementation of the dataclass decorator will access the class's __annotations__, but at that point User is not defined yet, so the user will get a NameError.
This is a similar problem to #1, and possible solutions are similar:
list["User"].