jorisvandenbossche
changed the title
Overflow in subtract_checked(timestamp, timestamp) after casting to pandas and back.
[C++] Overflow in subtract_checked(timestamp, timestamp) after casting to pandas and back.
Jun 25, 2024
The root cause is that the subtract_checked kernel does not ignore overflow errors for values masked by the validity bitmap (i.e. nulls), and the roundtrip to/from pandas changed the value stored behind the mask. Using nanoarrow to quickly inspect the buffers of x and y:
So in the original array, pyarrow uses a default of 0 for the values behind the mask. But on the roundtrip to pandas, pandas uses NaT as the missing-value sentinel, and when converting back to pyarrow the integer representation of NaT (i.e. the smallest int64) is kept, as it is masked anyway.
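To make the mechanism concrete, here is a minimal plain-Python sketch of the behaviour described above. It does not use pyarrow or Arrow's kernel code: the function names, the list-based arrays and the example values are all hypothetical and only model the idea of a checked subtraction over value slots plus a validity mask.

```python
INT64_MIN = -2**63      # integer representation of pandas' NaT
INT64_MAX = 2**63 - 1


def checked_sub(a, b):
    """Subtract two int64-range values, raising if the result leaves int64."""
    result = a - b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("overflow")
    return result


def subtract_checked_naive(x, y, x_valid, y_valid):
    """Hypothetical model of the reported behaviour: every slot is checked,
    including slots that the validity mask marks as null."""
    values = [checked_sub(a, b) for a, b in zip(x, y)]
    valid = [xv and yv for xv, yv in zip(x_valid, y_valid)]
    return values, valid


def subtract_checked_null_aware(x, y, x_valid, y_valid):
    """Hypothetical alternative: masked slots are skipped, so whatever value
    sits behind the mask cannot trigger an overflow error."""
    valid = [xv and yv for xv, yv in zip(x_valid, y_valid)]
    values = [checked_sub(a, b) if v else 0 for a, b, v in zip(x, y, valid)]
    return values, valid


x, x_valid = [10, 5], [True, True]
y_valid = [True, False]            # second slot of y is null
y_zero = [3, 0]                    # null slot holds 0 (before the roundtrip)
y_nat = [3, INT64_MIN]             # null slot holds NaT's integer value (after)

print(subtract_checked_naive(x, y_zero, x_valid, y_valid))      # no error
print(subtract_checked_null_aware(x, y_nat, x_valid, y_valid))  # no error
try:
    subtract_checked_naive(x, y_nat, x_valid, y_valid)
except OverflowError as exc:
    print("naive version fails on the masked slot:", exc)
```

In this model the two inputs are logically identical (same valid values, same nulls); only the value hidden behind the mask differs, which is enough to make the naive version raise.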
Interesting. Could it possibly be related to performance optimizations of nullable integer arithmetic? With the convention that nulls have value 0, subtraction $z←x-y$ of nullable integers can be implemented branchless as:
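One way such a branchless subtraction could look, as a plain-Python sketch: the values are subtracted slot by slot without consulting validity, and the result's validity is the bitwise AND of the two input masks. The names `wrap_int64` and `subtract_branchless` and the 0/1 list representation of the bitmap are hypothetical and not taken from Arrow's kernels.

```python
def wrap_int64(v):
    """Reduce a Python int to the two's-complement int64 range."""
    return (v + 2**63) % 2**64 - 2**63


def subtract_branchless(x, y, x_valid, y_valid):
    """Compute z = x - y for every slot with no per-element null check.

    Validity is combined separately with a bitwise AND, so a slot is valid
    only if both inputs are valid. Values behind null slots are computed too,
    but they are never observed through the mask.
    """
    values = [wrap_int64(a - b) for a, b in zip(x, y)]
    valid = [xv & yv for xv, yv in zip(x_valid, y_valid)]
    return values, valid


# Null slots hold 0, following the convention mentioned above.
values, valid = subtract_branchless([10, 0, 7], [3, 4, 0], [1, 0, 1], [1, 1, 0])
print(values)  # [7, -4, 7]
print(valid)   # [1, 0, 0]
```

This version wraps instead of checking for overflow; a checked variant that keeps the same structure would inspect every slot, which is where the value stored behind a null starts to matter.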
Describe the bug, including details regarding any error messages, version, and platform.
Cross post from pandas-dev/pandas#59082
Component(s)
Python