Add pyarrow stubs to mypy environment and fix associated errors #20118
Open
vyasr wants to merge 38 commits into rapidsai:branch-25.12 from vyasr:fix/pyarrow_typing
Add proper type casting for StructScalar to resolve mypy errors introduced by pyarrow type stubs. The generic pa.Scalar type doesn't include the items() method, but StructScalar does. Note: ListScalar iteration issue remains and will be fixed separately. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
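A minimal sketch of the narrowing pattern this commit describes. The `Scalar` and `StructScalar` classes below are hypothetical stand-ins (not the real pyarrow types) so the example runs without pyarrow installed; `struct_fields` is likewise an illustrative name.

```python
from __future__ import annotations

from typing import Any, Iterator, cast


class Scalar:
    """Stand-in for a generic scalar type that has no items() method."""

    def __init__(self, value: Any) -> None:
        self.value = value


class StructScalar(Scalar):
    """Stand-in for a struct scalar, which does expose items()."""

    def items(self) -> Iterator[tuple[str, Any]]:
        return iter(self.value.items())


def struct_fields(scalar: Scalar) -> dict[str, Any]:
    # A type checker rejects scalar.items() because Scalar does not define it.
    # cast() only informs the checker of the subtype we rely on; it performs
    # no conversion or validation at runtime.
    struct = cast(StructScalar, scalar)
    return dict(struct.items())


fields = struct_fields(StructScalar({"a": 1, "b": "x"}))
print(fields)
```

Because `cast` does nothing at runtime, it is only safe where the surrounding code already guarantees the value is a struct scalar.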
- Updated DecimalDtype.from_arrow to accept Decimal32Type, Decimal64Type, and Decimal128Type with TODO comment for future narrowing
- Added explicit check for unsupported Decimal256Type with clear error message
- Restructured decimal type checking to help mypy understand supported types
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Updated function signature to correctly accept pd.ArrowDtype instead of pa.DataType, which matches the actual usage where .pyarrow_dtype attribute is accessed. Updated docstring to reflect this change. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Add cast to handle type mismatch where schema.metadata is dict[bytes, bytes] but pa.schema expects dict[bytes | str, bytes | str] | None. Added explanatory comment for the type cast. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
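The mismatch arises because `dict` is invariant in its key and value types, so a `dict[bytes, bytes]` is not accepted where `dict[bytes | str, bytes | str]` is expected. A rough sketch of the workaround, where `build_schema_metadata` is a hypothetical stand-in for the stricter callee rather than a pyarrow function:

```python
from __future__ import annotations

from typing import cast


def to_bytes(value: bytes | str) -> bytes:
    return value.encode() if isinstance(value, str) else value


def build_schema_metadata(
    metadata: dict[bytes | str, bytes | str] | None,
) -> dict[bytes, bytes]:
    """Hypothetical callee whose parameter type is wider than our data."""
    if metadata is None:
        return {}
    return {to_bytes(k): to_bytes(v) for k, v in metadata.items()}


existing: dict[bytes, bytes] = {b"pandas": b"{}"}

# Passing `existing` directly is a mypy error: dict is invariant, so
# dict[bytes, bytes] is not a dict[bytes | str, bytes | str]. The callee
# only reads the mapping, so the cast is safe in this sketch.
result = build_schema_metadata(
    cast("dict[bytes | str, bytes | str]", existing)
)
print(result)
```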
The pyarrow type stubs are too strict for pa.struct() which should accept dict[str, DataType] but the stubs expect a more restrictive type signature. Added type ignore comment with explanation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…utils.py
- Added proper type annotation list[pa.DataType] for types list to resolve mypy errors about incompatible DataType append operations
- Added type ignore for pa.pandas_compat.construct_metadata since pyarrow stubs don't recognize this valid attribute
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Handle case where self.ordered can be None by providing False as default when calling DictionaryArray.from_arrays. Added TODO comment to investigate if ordered can actually be None in this context. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Add proper cast to DictionaryArray when accessing indices and dictionary attributes. The generic Array[Any] type doesn't include these attributes, but DictionaryArray does. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
The pyarrow-stubs incorrectly type ListScalar iteration - it should yield Scalar objects but stubs indicate Array objects. Added TODO to fix upstream and type ignore comment. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Changed tuple to list for from_buffers compatibility and added type ignore for None buffer with explanatory comment about strict pyarrow stubs. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added runtime check to ensure data is ExtensionArray before accessing storage attribute, providing clear error message for invalid input. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
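One way such a check might look, sketched with hypothetical stand-in classes (`Array`, `ExtensionArray`, and `get_storage` are illustrative names, not the cuDF or pyarrow definitions). Unlike a `cast`, the `isinstance` check both narrows the type for mypy and validates the input at runtime.

```python
from __future__ import annotations


class Array:
    """Stand-in for a generic array type without a storage attribute."""


class ExtensionArray(Array):
    """Stand-in for an extension array that wraps a storage array."""

    def __init__(self, storage: list[int]) -> None:
        self.storage = storage


def get_storage(data: Array) -> list[int]:
    if not isinstance(data, ExtensionArray):
        raise TypeError(
            f"Expected an ExtensionArray, got {type(data).__name__}"
        )
    # After the isinstance check the type checker treats `data` as an
    # ExtensionArray, so accessing .storage is accepted.
    return data.storage


print(get_storage(ExtensionArray([1, 2, 3])))
```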
Removed the base class _normalize_binop_operand method and inlined its simple NA-checking functionality directly into each subclass. This resolves type signature incompatibilities between the base class and decimal column implementation. Each subclass now handles NA values directly without inheritance complications. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added type ignore comments for from_buffers calls that need to accept None values for missing buffers, with explanatory comment about strict pyarrow stubs. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added cudf.DateOffset to the return type annotation for _normalize_binop_operand to properly handle datetime operations with date offsets. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Changed tuple to list for from_buffers compatibility and added type ignore for children parameter where pyarrow stubs are overly strict. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added type ignore with explanatory comment for buffers parameter where pyarrow stubs are too strict about None buffer values. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…umn/datetime.py
- Fixed type ignore comment for time_unit attribute access
- Added proper type casting for assume_timezone to handle strict pyarrow overloads
- Added cast import from typing module
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
The na_sentinel parameter was always converted to -1 internally when None or invalid, so simplified the implementation to always use -1 directly. This eliminates the typing issues with pa.Scalar assignment in algorithms.py. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Updated from_arrow method signature to accept pa.ChunkedArray in addition to pa.Array, matching the documented behavior and actual implementation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added type ignore for pa.pandas_compat.construct_metadata since pyarrow stubs don't recognize this valid attribute. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Updated interval and decimal column from_arrow signatures to accept pa.ChunkedArray in addition to pa.Array, maintaining Liskov substitution principle compatibility with the base ColumnBase class. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
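To illustrate the Liskov point: an override may accept the same or wider argument types than the base method, but not narrower ones, or mypy reports an incompatible override. The classes below are simplified stand-ins under that assumption, not the actual cuDF column classes.

```python
from __future__ import annotations

from decimal import Decimal


class Array:
    def __init__(self, values: list[int]) -> None:
        self.values = list(values)


class ChunkedArray:
    def __init__(self, chunks: list[list[int]]) -> None:
        self.chunks = [Array(chunk) for chunk in chunks]


class ColumnBase:
    @classmethod
    def from_arrow(cls, array: Array | ChunkedArray) -> list:
        if isinstance(array, ChunkedArray):
            # Flatten all chunks into one list of values.
            return [v for chunk in array.chunks for v in chunk.values]
        return list(array.values)


class DecimalColumn(ColumnBase):
    # Declaring `array: Array` here would narrow the parameter type and
    # break substitutability: callers holding a ColumnBase may pass a
    # ChunkedArray. Keeping the union matches the base signature.
    @classmethod
    def from_arrow(cls, array: Array | ChunkedArray) -> list:
        return [Decimal(v) for v in super().from_arrow(array)]


print(DecimalColumn.from_arrow(ChunkedArray([[1, 2], [3]])))
```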
Added proper DictionaryArray casting for each chunk in ChunkedArray case, matching the casting done for single Array case. This ensures consistent type handling between Array and ChunkedArray code paths. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added ChunkedArray handling by combining chunks before accessing buffers, since ChunkedArray doesn't have a buffers() method like Array does. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added proper type annotations for codes and dictionary variables to handle both Array and ChunkedArray types consistently in the from_arrow method. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added proper StringColumn cast for replace_re method call and moved StringColumn import out of TYPE_CHECKING block as required by ruff. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Added error checking for conflicting root_path and partition_cols in kwargs to prevent users from passing them both as direct arguments and in kwargs. Added type ignore for mypy complaint about potential duplicate arguments from *args with explanatory comment about API design. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
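A sketch of this kind of guard, using hypothetical function names (`write_dataset`, `to_dataset`) and parameters rather than the real cuDF writer. Without the explicit check, forwarding a duplicate keyword would fail with Python's generic "got multiple values for keyword argument" `TypeError`; the guard replaces that with a clearer message.

```python
from __future__ import annotations

from typing import Any


def write_dataset(
    df: Any, root_path: str, partition_cols: list[str] | None = None, **options: Any
) -> dict[str, Any]:
    """Hypothetical low-level writer; returns a description of the write."""
    return {
        "root_path": root_path,
        "partition_cols": partition_cols,
        "options": options,
    }


def to_dataset(df: Any, path: str, columns: list[str], **kwargs: Any) -> dict[str, Any]:
    # These two names are supplied explicitly below, so they must not also
    # arrive through **kwargs.
    conflicts = sorted({"root_path", "partition_cols"} & kwargs.keys())
    if conflicts:
        raise ValueError(
            f"{', '.join(conflicts)} cannot be passed as keyword arguments; "
            "use the positional parameters instead"
        )
    return write_dataset(df, root_path=path, partition_cols=columns, **kwargs)


plan = to_dataset([], "out", ["year"], compression="snappy")
print(plan)
```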
Cast range_index_meta["stop"] to int to resolve type mismatch where expression has type "int | str | None" but variable expects "int". Added type ignore comment for the int() cast. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
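For comparison, here is an alternative way to narrow an `int | str | None` value that avoids a type ignore by handling `None` explicitly. This is a different technique from the one the commit describes, and `range_stop` is a hypothetical helper name.

```python
from __future__ import annotations


def range_stop(range_index_meta: dict[str, int | str | None]) -> int:
    stop = range_index_meta["stop"]
    if stop is None:
        # int(None) would raise TypeError; fail with a clearer message.
        raise ValueError("range index metadata has no 'stop' value")
    # At this point mypy knows stop is int | str, both accepted by int().
    return int(stop)


print(range_stop({"start": 0, "stop": "10", "step": 1}))
```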
Removed the _normalize_binop_operand method from ListColumn and inlined its simple logic directly into _binaryop. The function only checked if the other operand was the same type and returned NotImplemented otherwise. This is now simplified to a direct isinstance check. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
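Roughly, the inlined check might look like the sketch below. The `ListColumn` here is a simplified stand-in, and the row-wise concatenation is only an illustrative operation, not what cuDF's `_binaryop` actually computes.

```python
from __future__ import annotations

from typing import Any


class ListColumn:
    """Simplified stand-in: a column whose rows are Python lists."""

    def __init__(self, rows: Any) -> None:
        self.rows = [list(row) for row in rows]

    def _binaryop(self, other: Any, op: str) -> Any:
        # Inlined operand check that previously lived in a separate
        # _normalize_binop_operand helper: only another ListColumn is valid.
        if not isinstance(other, ListColumn):
            return NotImplemented
        if op != "__add__":
            return NotImplemented
        return ListColumn(a + b for a, b in zip(self.rows, other.rows))

    def __add__(self, other: Any) -> Any:
        return self._binaryop(other, "__add__")


combined = ListColumn([[1], [2]]) + ListColumn([[3], [4]])
print(combined.rows)
```

Returning `NotImplemented` (rather than raising) lets Python try the reflected operation on the other operand before it raises `TypeError`.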
Removed the _normalize_binop_operand method from StringColumn and inlined its logic directly into _binaryop. The function handled scalar conversion to pyarrow scalars with NA handling, and type checking for StringColumns. Added type ignore for scalar conversion of non-string types. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Removed the _normalize_binop_operand method from CategoricalColumn and inlined its logic directly into _binaryop. The function handled dtype validation for categorical columns and encoding for scalar values. Simplified the dtype checking logic for better readability. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Removed the _normalize_binop_operand method from NumericalBaseColumn and inlined its complex logic directly into _binaryop. The function handled ColumnBase type checking, numpy array conversion, scalar type promotion, and dtype inference with pandas compatibility. Added type ignores for edge cases in min_signed_type conversion. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Removed the _normalize_binop_operand method from DecimalBaseColumn and inlined its logic directly into _binaryop. The function had a complex tuple return type that caused mypy issues. Inlining eliminates the tuple return and simplifies the type checking by handling each case directly in the binary operation logic. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
This reverts commit d14d6af.
mroeschke reviewed Sep 26, 2025
mroeschke reviewed Sep 26, 2025
mroeschke approved these changes Sep 26, 2025
Labels
- improvement: Improvement / enhancement to an existing function
- non-breaking: Non-breaking change
- Python: Affects Python cuDF API.
Description
A number of typing issues cannot be solved correctly without the pyarrow-stubs. Once I added them to the mypy environment, mypy reported a number of additional errors, so this PR fixes those as well. Most of the changes are to typing, but the stricter checks also revealed some real code issues, which I fixed.
Contributes to #17470
Checklist