Replies: 4 comments 5 replies
-
This is really interesting! I was glossing over until I got to the Python <-> native type interop. Will there be some way for a DataFrame library to specify a mapping/conversion of Python to native types? In particular, it would be nice if something like this would type check:

```python
def int_to_str(x: int) -> str:
    return str(x)

def str_to_int(x: str) -> int:
    return int(x)

Series([1, 2, 3, 4], dtype=int64).apply(int_to_str).dtype  # utf8
Series([1, 2, 3, 4], dtype=int64).apply(str_to_int)  # error
```

This also has interplay with dynamic assignment of series/creation of a derived TD. An operation that is very common with data frames but uncommon with TypedDict is re-assigning a field to a different type:

```python
df = DataFrame(dtypes={"a": int64, "b": utf8})
df["b"] = Series(dtype=float)
```
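Today's generics already capture the value-level half of this (the element type of `apply`'s result follows the mapped function), though not the Python-to-native dtype mapping asked about above. A minimal runnable sketch, with a purely illustrative `Series` stub:

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
S = TypeVar("S")

class Series(Generic[T]):
    """Illustrative stub: apply's result element type follows the function."""
    def __init__(self, values: list[T]) -> None:
        self.values = values

    def apply(self, fn: Callable[[T], S]) -> "Series[S]":
        return Series([fn(v) for v in self.values])

s = Series([1, 2, 3, 4]).apply(str)  # a type checker infers Series[str]
```

Mapping the checked Python types back to native dtypes (`int` -> `int64`, `str` -> `utf8`) is the part that would need new machinery in the proposal.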
-
I like this idea. One small point: how does this interact with PEP 695?
-
Here's an alternative implementation strategy that would work well with PEP 695 (without a need for further syntax changes) and perhaps make the behavior more obvious (cross-posted from https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569/15?u=jelle):

```python
from typing import TypedDict

class MyGeneric[TD: TypedDict]: ...

d: MyGeneric[TypedDict[{"foo": int, "bar": str, "baz": bool}]] = ...
```

This would work mostly like an existing TypeVar, except that type checkers would allow TypedDict-specific operations on values of type `TD`:

```python
from typing import Literal, TypedDict, KeyType

def want_literals(arg: Literal["a", "b"]): ...

arg1: KeyType[TypedDict[{"a": int, "b": str}]]
want_literals(arg1)  # ok

arg2: KeyType[TypedDict[{"a": int, "c": str}]]
want_literals(arg2)  # rejected, Literal["a", "c"] is incompatible with Literal["a", "b"]
```

This operator would work on both concrete TypedDicts and TypeVars bound to TypedDict or a subtype. We would also add a corresponding `ValueType`.

Edit: Eric actually suggested something very similar above (#1387 (reply in thread)).
-
There may be value in extending this mechanism to enums, so that the `Literal` of member names doesn't have to be spelled out by hand:

```python
from __future__ import annotations

from enum import auto, Enum
from typing import Literal, Union

class SomeEnum(Enum):
    ALFA = auto()
    BRAVO = auto()

SomeEnumName = Literal['ALFA', 'BRAVO']  # not DRY

def some_func(some_enum: Union[SomeEnum, SomeEnumName]) -> None:
    # preprocess 'some_enum' arg:
    if isinstance(some_enum, str):
        some_enum = SomeEnum[some_enum]
    assert isinstance(some_enum, SomeEnum)
    # do useful things
    ...
```
-
prior discussion:
Table of contents

- How do `.key` and `.value` work?
- How does `Map` work?
- Why is `**` unpacking not needed?
- Bonus feature: `TD.key_union` and `TD.value_union`
- Comparison to `dataclass_transform`
## Basic idea

Nikita Sobolev proposed a nice inline syntax for TypedDict: a dict literal inside the subscript, as in `dict[{"col1": np.int64}]` (used throughout this post). The nice thing is that it doesn't require any grammar change in Python.

I think the problem of "key types" for Pandas `DataFrame`s and other TypedDict-like containers can be solved the same way, without any grammar changes! We just need a new TypeVar-like, `TypeVarDict`, which is a generalization of `TypeVarTuple` but also shares a lot of traits with `ParamSpec`. Basic usage is straightforward, but to make it really useful, we also need `TD.key`, `TD.value`, and the new special form `Map`.

The motivating example is type-annotating `pandas.DataFrame` with this (the most important part is the definition of `__getitem__`). Now I'll explain everything in more detail.
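As a hypothetical sketch of that motivating example (none of `TypeVarDict`, `TD.key`, `TD.value`, or `Map` exist in any type checker today; `Series` and the dtype names are illustrative):

```python
# Hypothetical pseudocode: TypeVarDict, TD.key, TD.value and Map are proposed, not real.
TD = TypeVarDict("TD")

class DataFrame(Generic[TD]):
    def __init__(self, dtypes: Map[type, TD]) -> None: ...
    def __getitem__(self, key: TD.key) -> Series[TD.value]: ...

df = DataFrame(dtypes={"col1": np.int64, "col2": np.utf8})
# df: DataFrame[{"col1": np.int64, "col2": np.utf8}]
df["col1"]  # Series[np.int64]
df["col3"]  # type checker error: "col3" is not a key of TD
```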
## How do `.key` and `.value` work?

If `TD` is a `TypeVarDict`, then whenever you use `TD.key` in a function signature, you also have to use `TD.value`, and vice versa (just like with `ParamSpec`'s `.args` and `.kwargs`). `TD.key` and `TD.value` are essentially expanded as overloads: a class that uses `TD.key` and `TD.value` in its `__setitem__` method is equivalent to one that declares a separate `__setitem__` overload for each key/value pair. `TD.key` and `TD.value` can also appear in the return type.
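Concretely, for a fixed `TD` such as `{"col1": int, "col2": str}`, the expansion can be written with today's `@overload` (the `Row` class and its fields are illustrative):

```python
from typing import Literal, overload

class Row:
    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    # What `def __setitem__(self, key: TD.key, value: TD.value) -> None`
    # would expand to for TD = {"col1": int, "col2": str}:
    # one overload per key/value pair.
    @overload
    def __setitem__(self, key: Literal["col1"], value: int) -> None: ...
    @overload
    def __setitem__(self, key: Literal["col2"], value: str) -> None: ...
    def __setitem__(self, key: str, value: object) -> None:
        self._data[key] = value

r = Row()
r["col1"] = 3        # ok
r["col2"] = "three"  # ok; r["col2"] = 3 would be rejected by a checker
```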
## How does `Map` work?

To really make `TypeVarDict` useful, the special form `Map` has to be introduced as well. `Map` was introduced in this proto-PEP. `Map[F, TD]` applies the type constructor `F` to every value type in `TD`.

This is needed, for example, in the definition of `read_csv`. The `dtype` object that you pass in will look something like `{"col1": np.int64}`, but that has type `dict[{"col1": type[np.int64]}]`, and not type `dict[{"col1": np.int64}]`, which is what we need in order to infer the correct type for the `DataFrame`. So, the `type[]` needs to be stripped away somehow. That is what `Map` does: the `dtype` we pass in has type `dict[{"col1": type[np.int64]}]`, which gets matched against `dict[Map[type, TD]]`, which means that `TD` is inferred as `{"col1": np.int64}`, just as we wanted.

Aside: the proto-PEP linked above defines `Map` for use on `TypeVarTuple`s in a similar way.
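A hypothetical sketch of such a `read_csv` signature (again, `Map` and `TypeVarDict` are proposed, not implemented anywhere):

```python
# Hypothetical pseudocode
def read_csv(path: str, dtype: dict[Map[type, TD]]) -> DataFrame[TD]: ...

df = read_csv("data.csv", dtype={"col1": np.int64})
# dtype has type dict[{"col1": type[np.int64]}]; matching it against
# dict[Map[type, TD]] infers TD = {"col1": np.int64},
# so df is DataFrame[{"col1": np.int64}]
```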
## Why is `**` unpacking not needed?

In PEP 646, where `TypeVarTuple`s were introduced, it is specified that a `TypeVarTuple` must always be unpacked with `*`, as in `Tuple[*Ts]`. Why is this not needed here? Because there isn't actually anything to spread. In this way, `TypeVarDict` is more akin to `ParamSpec` than to `TypeVarTuple`.

Consider a class `A` that is generic over a `TypeVarTuple` `Ts`: it can be specialized as `A[int, str]` or `A[bool, bool, bool]`. That is, the `*Ts` takes up an arbitrary number of "top-level slots" in the `A[...]` expression. But a class `B` that is generic over a `ParamSpec` only takes up one "top-level slot", as in `B[[str, str]]`, meaning that another `TypeVar` could come after it: a class `C` generic over a `ParamSpec` and a `TypeVar` can be specialized with, for example, `C[[int, str], bool]`. And indeed, it is very common to have a `TypeVar` after a `ParamSpec` (which isn't possible with `TypeVarTuple`).

Like `ParamSpec`, a `TypeVarDict` also only takes up one "top-level slot": a class `D` generic over a `TypeVarDict` and a `TypeVar` can be `D[{"foo": str}, bool]` or `D[{"bar": bool, "baz": bool}, str]`. So, as with `ParamSpec`, there should be no unpacking with `TypeVarDict`.

The unpacking is only needed for annotating `**kwargs` as specified in PEP 692, where `TD` acts like an arbitrary `TypedDict`. Though, the new grammar proposed in PEP 692 was rejected, so in practice you write `**kwargs: Unpack[TD]`.
## Bonus feature: `TD.key_union` and `TD.value_union`

In addition to `TD.key` and `TD.value`, there could also be `TD.key_union` and `TD.value_union`. `TD.key_union` would be the union of all key literals and `TD.value_union` would be the union of all value types. This would, for example, be useful for typing `.keys()` and `.values()` in `TypedDict`s.
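A hypothetical sketch of those signatures (`TD.key_union` and `TD.value_union` are proposed, not real; `TypedMapping` is an illustrative name):

```python
# Hypothetical pseudocode
class TypedMapping(Generic[TD]):
    def keys(self) -> KeysView[TD.key_union]: ...      # e.g. KeysView[Literal["col1", "col2"]]
    def values(self) -> ValuesView[TD.value_union]: ...  # e.g. ValuesView[int | str]
```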
## Who would use this?

- Any library that has DataFrame-like objects
- Dict-wrappers like `ModuleDict` in PyTorch
- Potentially, ORMs like SQLAlchemy?
## Aside on the rejected PEP 637

PEP 637 proposed to add the syntax `matrix[row=20, col=40]`, which would have been a perfect fit for `TypeVarDict`, but I think the syntax with the curly braces is also fine.
## Comparison to `dataclass_transform`

PEP 681's `dataclass_transform` allows us to create a base class such that all subclasses act like `dataclasses`. This gets you somewhat similar behavior to the proposed `TypeVarDict`, but I see several shortcomings; in particular, there is no `Map` functionality, which is what allows us to return a `Series` object for `df["col1"]` instead of just the dtype (which is all a `dataclass_transform`-based version can express).