vyasr
Contributor

@vyasr vyasr commented Sep 26, 2025

Description

Several typing issues cannot be solved correctly without pyarrow-stubs. Adding the stubs revealed a number of additional errors, which this PR also fixes. Most of the changes are to type annotations, but the stubs also exposed some real code issues, which are fixed here as well.

Contributes to #17470

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vyasr and others added 30 commits September 26, 2025 00:33
Add proper type casting for StructScalar to resolve mypy errors
introduced by pyarrow type stubs. The generic pa.Scalar type doesn't
include the items() method, but StructScalar does.

Note: ListScalar iteration issue remains and will be fixed separately.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
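
The StructScalar cast described in the commit above can be illustrated with a minimal sketch. The `Scalar` and `StructScalar` classes below are hypothetical stand-ins, not the pyarrow classes, and `make_scalar` and `struct_to_dict` are names invented for this example. The point is only that `typing.cast` tells the checker to treat a generically typed value as the subclass that actually has `items()`.

```python
from typing import Any, Iterator, cast


class Scalar:
    """Hypothetical stand-in for a generic scalar type (no items() method)."""

    def __init__(self, value: Any) -> None:
        self.value = value


class StructScalar(Scalar):
    """Hypothetical stand-in for a struct scalar, which does expose items()."""

    def items(self) -> Iterator[tuple[str, Any]]:
        return iter(self.value.items())


def make_scalar(value: Any) -> Scalar:
    # The declared return type is the generic base class, so a type checker
    # does not know that a dict input produces a StructScalar.
    return StructScalar(value) if isinstance(value, dict) else Scalar(value)


def struct_to_dict(scalar: Scalar) -> dict[str, Any]:
    # cast() does nothing at runtime; it only changes the static type.
    struct = cast(StructScalar, scalar)
    return dict(struct.items())
```

Because `cast` performs no runtime check, it is only appropriate where the caller already knows the value is a struct scalar.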
- Updated DecimalDtype.from_arrow to accept Decimal32Type, Decimal64Type,
  and Decimal128Type with TODO comment for future narrowing
- Added explicit check for unsupported Decimal256Type with clear error message
- Restructured decimal type checking to help mypy understand supported types

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
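
As a rough sketch of the restructured decimal check in the commit above: rejecting the unsupported type first lets a type checker narrow the remaining union to the supported types. The four classes are hypothetical stand-ins for the Arrow decimal types, `require_supported` is an invented name, and the choice of `NotImplementedError` is an assumption, since the commit message does not say which exception is raised.

```python
from typing import Union


# Hypothetical stand-ins for the Arrow decimal types; not the pyarrow classes.
class Decimal32Type:
    bit_width = 32


class Decimal64Type:
    bit_width = 64


class Decimal128Type:
    bit_width = 128


class Decimal256Type:
    bit_width = 256


SupportedDecimal = Union[Decimal32Type, Decimal64Type, Decimal128Type]
AnyDecimal = Union[SupportedDecimal, Decimal256Type]


def require_supported(typ: AnyDecimal) -> SupportedDecimal:
    # Handling the unsupported case explicitly gives a clear error and
    # leaves only the supported members of the union after this branch.
    if isinstance(typ, Decimal256Type):
        raise NotImplementedError("Decimal256Type is not supported")
    return typ
```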
Updated function signature to correctly accept pd.ArrowDtype instead of
pa.DataType, which matches the actual usage where .pyarrow_dtype attribute
is accessed. Updated docstring to reflect this change.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add cast to handle type mismatch where schema.metadata is dict[bytes, bytes]
but pa.schema expects dict[bytes | str, bytes | str] | None. Added explanatory
comment for the type cast.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
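
The metadata mismatch in the commit above is an instance of `dict` being invariant in its key and value types: a `dict[bytes, bytes]` is not accepted where `dict[bytes | str, bytes | str]` is expected, even though every key and value fits. The sketch below reproduces that situation with a hypothetical `build_schema` function standing in for the real schema constructor.

```python
from typing import Optional, Union, cast

Key = Union[bytes, str]


def build_schema(metadata: Optional[dict[Key, Key]] = None) -> dict[Key, Key]:
    # Hypothetical stand-in for a constructor that accepts mixed
    # bytes/str metadata or None.
    return dict(metadata or {})


existing: dict[bytes, bytes] = {b"pandas": b"{}"}

# Passing `existing` directly is rejected by a type checker because dict is
# invariant. The cast records that the wider type is safe here, since the
# callee only reads the mapping.
result = build_schema(cast(dict[Key, Key], existing))
```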
The pyarrow type stubs are too strict for pa.struct() which should accept
dict[str, DataType] but the stubs expect a more restrictive type signature.
Added type ignore comment with explanation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…utils.py

- Added proper type annotation list[pa.DataType] for types list to resolve
  mypy errors about incompatible DataType append operations
- Added type ignore for pa.pandas_compat.construct_metadata since pyarrow
  stubs don't recognize this valid attribute

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
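
The `list[pa.DataType]` annotation in the commit above addresses a common inference problem: without an annotation, a checker may infer the element type of an empty list from the first `append`, and then reject later appends of sibling types. A minimal sketch, using hypothetical stand-in classes rather than pyarrow types:

```python
class DataType:
    """Hypothetical stand-in for a common data type base class."""


class Int64Type(DataType):
    pass


class StringType(DataType):
    pass


def collect_types(names: list[str]) -> list[DataType]:
    # Annotating with the base class makes appends of any subclass valid.
    types: list[DataType] = []
    for name in names:
        if name == "int64":
            types.append(Int64Type())
        else:
            types.append(StringType())
    return types
```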
Handle case where self.ordered can be None by providing False as default
when calling DictionaryArray.from_arrays. Added TODO comment to investigate
if ordered can actually be None in this context.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
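
A minimal sketch of the default described in the commit above, assuming the callee requires a plain `bool`. The helper name `resolve_ordered` is hypothetical; the actual change may simply inline an equivalent expression at the call site.

```python
from typing import Optional


def resolve_ordered(ordered: Optional[bool]) -> bool:
    # Treat a missing value as "unordered" so the callee always gets a bool.
    return False if ordered is None else ordered
```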
Add proper cast to DictionaryArray when accessing indices and dictionary
attributes. The generic Array[Any] type doesn't include these attributes,
but DictionaryArray does.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
The pyarrow-stubs incorrectly type ListScalar iteration - it should yield
Scalar objects but stubs indicate Array objects. Added TODO to fix upstream
and type ignore comment.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Changed tuple to list for from_buffers compatibility and added type ignore
for None buffer with explanatory comment about strict pyarrow stubs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added runtime check to ensure data is ExtensionArray before accessing
storage attribute, providing clear error message for invalid input.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
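
Unlike a `cast`, the runtime check in the commit above both narrows the static type and fails loudly on bad input. The sketch below shows the pattern with hypothetical stand-in classes. The function name `get_storage` and the use of `TypeError` are assumptions, since the commit message only says the error message is clear.

```python
class Array:
    """Hypothetical stand-in for a generic array type (no storage attribute)."""


class ExtensionArray(Array):
    """Hypothetical stand-in for an extension array wrapping a storage array."""

    def __init__(self, storage: list[int]) -> None:
        self.storage = storage


def get_storage(data: Array) -> list[int]:
    # After this isinstance check a type checker knows `data` is an
    # ExtensionArray, so accessing .storage is valid.
    if not isinstance(data, ExtensionArray):
        raise TypeError(
            f"Expected an ExtensionArray, got {type(data).__name__}"
        )
    return data.storage
```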
Removed the base class _normalize_binop_operand method and inlined its
simple NA-checking functionality directly into each subclass. This
resolves type signature incompatibilities between the base class and
decimal column implementation. Each subclass now handles NA values
directly without inheritance complications.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added type ignore comments for from_buffers calls that need to accept
None values for missing buffers, with explanatory comment about strict
pyarrow stubs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added cudf.DateOffset to the return type annotation for _normalize_binop_operand
to properly handle datetime operations with date offsets.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Changed tuple to list for from_buffers compatibility and added type ignore
for children parameter where pyarrow stubs are overly strict.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added type ignore with explanatory comment for buffers parameter where
pyarrow stubs are too strict about None buffer values.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…umn/datetime.py

- Fixed type ignore comment for time_unit attribute access
- Added proper type casting for assume_timezone to handle strict pyarrow overloads
- Added cast import from typing module

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
The na_sentinel parameter was always converted to -1 internally when None
or invalid, so simplified the implementation to always use -1 directly.
This eliminates the typing issues with pa.Scalar assignment in algorithms.py.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Updated from_arrow method signature to accept pa.ChunkedArray in addition
to pa.Array, matching the documented behavior and actual implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added type ignore for pa.pandas_compat.construct_metadata since pyarrow
stubs don't recognize this valid attribute.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Updated interval and decimal column from_arrow signatures to accept
pa.ChunkedArray in addition to pa.Array, maintaining Liskov substitution
principle compatibility with the base ColumnBase class.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
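
The Liskov substitution point in the commit above can be sketched as follows: an override must accept at least every argument type the base method accepts, so if the base `from_arrow` takes either an array or a chunked array, the subclass signature must too. All classes here are simplified, hypothetical stand-ins for the cuDF and pyarrow types.

```python
from typing import Union


class Array:
    def __init__(self, values: list[int]) -> None:
        self.values = values


class ChunkedArray:
    def __init__(self, chunks: list[Array]) -> None:
        self.chunks = chunks


class ColumnBase:
    def __init__(self, values: list[int]) -> None:
        self.values = values

    @classmethod
    def from_arrow(cls, array: Union[Array, ChunkedArray]) -> "ColumnBase":
        if isinstance(array, ChunkedArray):
            values = [v for chunk in array.chunks for v in chunk.values]
        else:
            values = array.values
        return cls(values)


class DecimalColumn(ColumnBase):
    @classmethod
    def from_arrow(cls, array: Union[Array, ChunkedArray]) -> "DecimalColumn":
        # Narrowing the parameter to Array alone would make this override
        # incompatible with the base class signature.
        base = super().from_arrow(array)
        return cls(base.values)
```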
Added proper DictionaryArray casting for each chunk in ChunkedArray case,
matching the casting done for single Array case. This ensures consistent
type handling between Array and ChunkedArray code paths.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added ChunkedArray handling by combining chunks before accessing buffers,
since ChunkedArray doesn't have a buffers() method like Array does.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added proper type annotations for codes and dictionary variables to handle
both Array and ChunkedArray types consistently in the from_arrow method.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added proper StringColumn cast for replace_re method call and moved
StringColumn import out of TYPE_CHECKING block as required by ruff.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added error checking for conflicting root_path and partition_cols in kwargs
to prevent users from passing them both as direct arguments and in kwargs.
Added type ignore for mypy complaint about potential duplicate arguments
from *args with explanatory comment about API design.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
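
The conflict check in the commit above guards a wrapper that forwards its own arguments under different keyword names. A rough sketch under stated assumptions: `write_dataset`, `to_parquet`, and the `partitions` parameter are hypothetical, and `ValueError` is an assumed choice of exception.

```python
from typing import Any, Optional


def write_dataset(
    data: list[int],
    root_path: str,
    partition_cols: Optional[list[str]] = None,
    **options: Any,
) -> dict[str, Any]:
    # Hypothetical backend that just reports what it received.
    return {
        "root_path": root_path,
        "partition_cols": partition_cols,
        "options": options,
    }


def to_parquet(
    data: list[int],
    path: str,
    partitions: Optional[list[str]] = None,
    **kwargs: Any,
) -> dict[str, Any]:
    # The wrapper already supplies these keywords itself, so a duplicate in
    # kwargs would otherwise surface as an opaque "multiple values" error.
    for name in ("root_path", "partition_cols"):
        if name in kwargs:
            raise ValueError(
                f"{name!r} must not be passed as a keyword argument; "
                "use the dedicated parameter instead"
            )
    return write_dataset(
        data, root_path=path, partition_cols=partitions, **kwargs
    )
```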
Cast range_index_meta["stop"] to int to resolve type mismatch where
expression has type "int | str | None" but variable expects "int".
Added type ignore comment for the int() cast.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
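
A minimal sketch of the conversion in the commit above, assuming the metadata is a plain dictionary whose values may be `int`, `str`, or `None`. The function name `range_stop` is hypothetical. The `type: ignore` is needed because `int()` does not accept `None`, which the checker cannot rule out from the declared type.

```python
from typing import Union

RangeMeta = dict[str, Union[int, str, None]]


def range_stop(meta: RangeMeta) -> int:
    # int() handles both int and str values at runtime; a None value would
    # raise TypeError, which is the case the checker complains about.
    return int(meta["stop"])  # type: ignore
```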
Removed the _normalize_binop_operand method from ListColumn and inlined
its simple logic directly into _binaryop. The function only checked if
the other operand was the same type and returned NotImplemented otherwise.
This is now simplified to a direct isinstance check.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
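
The inlined check described in the commit above follows the usual Python protocol for binary operators: return `NotImplemented` when the other operand is not a supported type. The class below is a simplified, hypothetical stand-in. Treating `+` as element-wise list concatenation is an assumption made for the sake of a runnable example.

```python
from typing import Any


class ListColumn:
    def __init__(self, values: list[list[int]]) -> None:
        self.values = values

    def _binaryop(self, other: Any, op: str) -> Any:
        # Inlined operand check: only another ListColumn is supported.
        if not isinstance(other, ListColumn):
            return NotImplemented
        if op == "__add__":
            return ListColumn(
                [a + b for a, b in zip(self.values, other.values)]
            )
        raise NotImplementedError(op)

    def __add__(self, other: Any) -> Any:
        return self._binaryop(other, "__add__")
```

Returning `NotImplemented` rather than raising lets Python try the reflected operation on the other operand before it reports a `TypeError`.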
Removed the _normalize_binop_operand method from StringColumn and inlined
its logic directly into _binaryop. The function handled scalar conversion
to pyarrow scalars with NA handling, and type checking for StringColumns.
Added type ignore for scalar conversion of non-string types.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
vyasr and others added 6 commits September 26, 2025 01:00
Removed the _normalize_binop_operand method from CategoricalColumn and
inlined its logic directly into _binaryop. The function handled dtype
validation for categorical columns and encoding for scalar values.
Simplified the dtype checking logic for better readability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Removed the _normalize_binop_operand method from NumericalBaseColumn and
inlined its complex logic directly into _binaryop. The function handled
ColumnBase type checking, numpy array conversion, scalar type promotion,
and dtype inference with pandas compatibility. Added type ignores for
edge cases in min_signed_type conversion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Removed the _normalize_binop_operand method from DecimalBaseColumn and
inlined its logic directly into _binaryop. The function had a complex
tuple return type that caused mypy issues. Inlining eliminates the tuple
return and simplifies the type checking by handling each case directly
in the binary operation logic.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@vyasr vyasr self-assigned this Sep 26, 2025
@vyasr vyasr requested review from a team as code owners September 26, 2025 03:27
@vyasr vyasr added the improvement Improvement / enhancement to an existing function label Sep 26, 2025
@vyasr vyasr added the non-breaking Non-breaking change label Sep 26, 2025
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 26, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Sep 26, 2025

3 participants