Skip to content

align typing with pandas source #1219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

Dr-Irv
Copy link
Collaborator

@Dr-Irv Dr-Irv commented May 18, 2025

Possibly closes #128

I went through the current pandas source in pandas/_typing.py and separated the types declared there from the "private" types only used in the stubs, which now appear at the bottom of the file, and added documentation where appropriate to indicate where things are different.

Unclear whether the "private" types in the stubs should all be preceded by underscores. They are imported from pandas/_typing, so that makes them private anyway. When I wrote #128, I suggested to make them all private. Now that I have more experience with typing, I don't think it's necessary.

By doing this, I have a list of types that should be "public" which will eventually go in pandas.api.typing.aliases . That will then allow me to work on pandas-dev/pandas#55231

Asking for opinions of @loicdiridollou and a review/approval/merge from @twoertwein

TakeIndexer,
np_ndarray_bool,
)

IntervalOrNA: TypeAlias = Interval | float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this in this file or should we move it to _typing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably _IntervalOrNA

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be consistent with pandas for now. This statement works: from pandas.core.arrays.interval import IntervalOrNA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a shift in policy (from public == what is documented on the web) to more how the python community/type checkers defines public/private? That is fine with me, just want to make sure if that is the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a shift in policy (from public == what is documented on the web) to more how the python community/type checkers defines public/private? That is fine with me, just want to make sure if that is the case.

Not really. I just want to be consistent right now with the types in pandas. I think we need to change some of types likes these that are in pandas to be private (precede with underscores), which is a separate effort.

TimeUnit,
)

DTScalarOrNaT: TypeAlias = DatetimeLikeScalar | NaTType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this stay in this file or should we aim to centralize those in _typing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be used only here so it might be nice to keep it here. I think this needs to be _DTScalarOrNaT.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be consistent with what is in pandas for right now. This statement currently works: from pandas.core.arrays.datetimelike import DTScalarOrNaT

@loicdiridollou
Copy link
Member

Great job in here, I think this is a major improvement. Hopefully that can unblock the other issue in pandas repo.
Added a few comments about some types being defined in files instead of _typing.pyi.
Not totally sure what is the exact direction that would fit the team at pandas but this is a ton of work already here and we can continue to use the stubtest to clean up some of the function signatures that have drifted from the pandas implementaion.

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented May 18, 2025

Added a few comments about some types being defined in files instead of _typing.pyi.

For now, I'd rather mirror what is in the pandas source, where both of those types are defined in the respective files. That's where I picked them up from. If someone really wants access to them, they can import them from those files, which are not in private modules.

TimeUnit,
)

DTScalarOrNaT: TypeAlias = DatetimeLikeScalar | NaTType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be used only here so it might be nice to keep it here. I think this needs to be _DTScalarOrNaT.

TakeIndexer,
np_ndarray_bool,
)

IntervalOrNA: TypeAlias = Interval | float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably _IntervalOrNA

SequenceIndexer,
)

class ellipsis(Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this class public in pandas?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works in pandas:

from pandas.core.arrays.sparse.array import ellipsis

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented May 18, 2025

@twoertwein thanks for the review. I've pushed a new version that addresses your comments (and commented on a few other comments where I didn't make a change)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLEAN: Reconcile types in pandas._typing.py and pandas-stubs._typing.pyi
3 participants