Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type hints: Parts of folders "vegalite", "v5", and "utils" #2976

Merged
merged 21 commits into from
Jun 18, 2023

Conversation

binste
Copy link
Contributor

@binste binste commented Mar 18, 2023

I think it's easier to type hint and review if we can iteratively do it for a few files, review, merge and then continue so I'll stop here and will do the other files in these folders once this is merged.

See #2951 for the overall progress of type hinting Altair.

@binste binste mentioned this pull request Mar 18, 2023
14 tasks
@binste binste changed the title Type hints: Vegalite v5 folder without schema files but partially with some related files Type hints: Folders vegalite, v5 without schema files and api.py, utils Mar 18, 2023
@binste binste changed the title Type hints: Folders vegalite, v5 without schema files and api.py, utils Type hints: Folders vegalite, v5 without schema files and api.py, utils without schemapi.py Mar 18, 2023
@binste binste changed the title Type hints: Folders vegalite, v5 without schema files and api.py, utils without schemapi.py Type hints: For folders "vegalite", "v5" without schema files and api.py, "utils" without schemapi.py Mar 18, 2023
@binste binste changed the title Type hints: For folders "vegalite", "v5" without schema files and api.py, "utils" without schemapi.py Type hints: Parts of folders "vegalite", "v5", and "utils" Jun 10, 2023
@binste binste marked this pull request as ready for review June 10, 2023 13:59
@binste binste requested a review from mattijn June 10, 2023 14:01
Copy link
Contributor

@jonmmease jonmmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments inline, but this looks great.
I'm looking forward to Altair being fully typed!

altair/utils/core.py Outdated Show resolved Hide resolved
altair/utils/data.py Outdated Show resolved Hide resolved

class _DataJsonUrlDict(TypedDict):
url: str
format: _JsonFormatDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should format be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far this is only used as a return type for to_json which always returns format: _JsonFormatDict so I think it's accurate for that usage. But maybe you mean it could be confused to mean that a data json dictionary always needs format specified? I renamed the classes to make it clearer that these are not general schemas but specific type hints for those functions.

Does that address it?

_TDataType = TypeVar("_TDataType", bound=_DataType)

_VegaLiteDataDict = Dict[str, Union[str, dict, List[dict]]]
_ToValuesReturnType = Dict[str, Union[dict, List[dict]]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the best way to do this, but should the type encode that this dictionary only has a top-level "values" key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree it would be nice and we could use a TypedDict. Unfortunately, these lines in the function allow a dictionary to have more then just values as a top-level key in case data is a dictionary:

    elif isinstance(data, dict):
        if "values" not in data:
            raise KeyError("values expected in data dict, but not present.")
        return data

I think there is currently no way to use a TypedDict with some fixed keys and then allow for arbitrary extra keys, see this discussion over in the mypy repo.

@binste
Copy link
Contributor Author

binste commented Jun 16, 2023

Thank you @jonmmease for the review! I addressed your comments and left two open for you to check. In any case, I'll leave it open for a while to see if Mattijn has any inputs. There is no rush to get this merged.

@jonmmease jonmmease self-requested a review June 16, 2023 19:30
Copy link
Contributor

@jonmmease jonmmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates and clarifications. lgtm!

Copy link
Contributor

@mattijn mattijn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @binste! One in-line comment + question.

altair/utils/core.py Outdated Show resolved Hide resolved


def infer_vegalite_type(
data: Union[np.ndarray, pd.Series]
Copy link
Contributor

@mattijn mattijn Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, here you mention: data: Union[np.ndarray, pd.Series]. When I look to where this is used: https://github.com/altair-viz/altair/blob/master/altair/vegalite/v5/api.py#L2411, I cannot understand why we need to be restrictive on the input types from numpy and pandas here. I really like types like DataFrameLike and SupportsGeoInterface for input data features (so we can be flexible on input and restrictive on output).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to define a type hint named ArrayLike1D something that can serve for one-dimensional data of various forms, but not sure if this is possible yet (eg. python/mypy#12280).

Otherwise, maybe something like they define as type hint for the values of a polars series? There they define an ArrayLike type as such (see here:

ArrayLike = Union[
    Sequence[Any],
    "Series",
    "pa.Array",
    "pa.ChunkedArray",
    "np.ndarray",
    "pd.Series",
    "pd.DatetimeIndex",
]

And in the docstring for values of the series they mention:

values : ArrayLike, default None
    One-dimensional data in various forms. Supported are: Sequence, Series,
    pyarrow Array, and numpy ndarray.

I don't think this type-hint is strictly 1D as the docstring suggest, but it can serve as a reference.
Somehow wished it would be possible to include range as well (ref: #2877).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that's too restrictive. I used the type hints which were already present in the docstring but seems like this function supports anything that the pandas function infer_type can handle. According to their own type hints this is all object so basically everything. I'd suggest that we type hint it the same to be consistent with pandas. Implemented in e974bc9

Btw, the function infer_vegalite_type is used only here. Your link points to the usage of infer_encoding_types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is flexible enough, thanks for the change.

@binste
Copy link
Contributor Author

binste commented Jun 18, 2023

Thank you for the reviews!

Just fyi, I won't be able to do any development in the next 2 weeks as I'm travelling. I will continue with the type hints in July/August.

@binste binste merged commit f13f8f1 into vega:master Jun 18, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants