-
Notifications
You must be signed in to change notification settings - Fork 794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type hints: Parts of folders "vegalite", "v5", and "utils" #2976
Conversation
…why mypy suddenly complains about those...
…e might not yet want users to rely on them. Gives more flexibility in the future to change them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments inline, but this looks great.
I'm looking forward to Altair being fully typed!
|
||
class _DataJsonUrlDict(TypedDict): | ||
url: str | ||
format: _JsonFormatDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should format be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far this is only used as a return type for to_json
which always returns format: _JsonFormatDict
so I think it's accurate for that usage. But maybe you mean it could be confused to mean that a data json dictionary always needs format specified? I renamed the classes to make it clearer that these are not general schemas but specific type hints for those functions.
Does that address it?
_TDataType = TypeVar("_TDataType", bound=_DataType) | ||
|
||
_VegaLiteDataDict = Dict[str, Union[str, dict, List[dict]]] | ||
_ToValuesReturnType = Dict[str, Union[dict, List[dict]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the best way to do this, but should the type encode that this dictionary only has a top-level "values" key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it would be nice and we could use a TypedDict
. Unfortunately, these lines in the function allow a dictionary to have more then just values
as a top-level key in case data
is a dictionary:
elif isinstance(data, dict):
if "values" not in data:
raise KeyError("values expected in data dict, but not present.")
return data
I think there is currently no way to use a TypedDict
with some fixed keys and then allow for arbitrary extra keys, see this discussion over in the mypy repo.
…such a redundant statement
…rn type of functions
Thank you @jonmmease for the review! I addressed your comments and left two open for you to check. In any case, I'll leave it open for a while to see if Mattijn has any inputs. There is no rush to get this merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates and clarifications. lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @binste! One in-line comment + question.
altair/utils/core.py
Outdated
|
||
|
||
def infer_vegalite_type( | ||
data: Union[np.ndarray, pd.Series] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question, here you mention: data: Union[np.ndarray, pd.Series]
. When I look to where this is used: https://github.com/altair-viz/altair/blob/master/altair/vegalite/v5/api.py#L2411, I cannot understand why we need to be restrictive on the input types from numpy and pandas here. I really like types like DataFrameLike
and SupportsGeoInterface
for input data features (so we can be flexible on input and restrictive on output).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to define a type hint named ArrayLike1D
something that can serve for one-dimensional data of various forms, but not sure if this is possible yet (eg. python/mypy#12280).
Otherwise, maybe something like they define as type hint for the values
of a polars series? There they define an ArrayLike
type as such (see here:
ArrayLike = Union[
Sequence[Any],
"Series",
"pa.Array",
"pa.ChunkedArray",
"np.ndarray",
"pd.Series",
"pd.DatetimeIndex",
]
And in the docstring for values of the series they mention:
values : ArrayLike, default None
One-dimensional data in various forms. Supported are: Sequence, Series,
pyarrow Array, and numpy ndarray.
I don't think this type-hint is strictly 1D as the docstring suggest, but it can serve as a reference.
Somehow wished it would be possible to include range
as well (ref: #2877).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that's too restrictive. I used the type hints which were already present in the docstring but seems like this function supports anything that the pandas function infer_type
can handle. According to their own type hints this is all object
so basically everything. I'd suggest that we type hint it the same to be consistent with pandas. Implemented in e974bc9
Btw, the function infer_vegalite_type
is used only here. Your link points to the usage of infer_encoding_types
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is flexible enough, thanks for the change.
… hint that is used by pandas for their infer_type function
Thank you for the reviews! Just fyi, I won't be able to do any development in the next 2 weeks as I'm travelling. I will continue with the type hints in July/August. |
I think it's easier to type hint and review if we can iteratively do it for a few files, review, merge and then continue so I'll stop here and will do the other files in these folders once this is merged.
See #2951 for the overall progress of type hinting Altair.
DataFrameLike
protocol from there instead of SupportsDataFrame. Or useSupportsDataFrame
?