BUG: Raise OutOfBoundsDatetime in DataFrame.replace when value exceeds datetime64[ns] bounds (GH#61671) #61717

Open
wants to merge 1 commit into
base: main

iabhi4 (Contributor) commented Jun 27, 2025

Fixes a bug where DataFrame.replace would raise a generic AssertionError when trying to replace np.nan in a datetime64[ns] column with an out-of-bounds datetime.datetime object (e.g., datetime(3000, 1, 1)).

This PR fixes that by explicitly raising OutOfBoundsDatetime when the replacement datetime can't safely fit into the datetime64[ns] dtype.

Let me know if you'd like to test other edge cases or if there's a more idiomatic way to handle this!
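
For context on the bound itself: datetime64[ns] stores nanoseconds since the Unix epoch in a signed 64-bit integer, which is why a year-3000 value cannot fit. Below is a rough standalone sketch of that check using only the standard library. `fits_in_datetime64_ns` is a hypothetical helper written for illustration, not a pandas function and not the code in this PR, and it assumes tz-naive inputs.

```python
import datetime

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)  # datetime64 reserves this value for NaT
EPOCH = datetime.datetime(1970, 1, 1)


def fits_in_datetime64_ns(value: datetime.datetime) -> bool:
    """Hypothetical helper: True if a tz-naive datetime fits in int64 nanoseconds."""
    delta = value - EPOCH
    # Use exact integer arithmetic; float seconds would lose precision here.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    nanos = micros * 1_000
    return INT64_MIN < nanos <= INT64_MAX


print(fits_in_datetime64_ns(datetime.datetime(2025, 6, 27)))  # True
print(fits_in_datetime64_ns(datetime.datetime(3000, 1, 1)))   # False
```

The strict lower comparison reflects the assumption that the minimum int64 value is reserved for NaT, so the smallest usable timestamp is one nanosecond above it.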

iabhi4 (Contributor Author) commented Jun 27, 2025

looking into the CI failures

iabhi4 (Contributor Author) commented Jun 28, 2025

Regarding the CI failures:

So after the changes in find_result_type, we're now catching cases like Timestamp('1677-09-21 00:12:43.145224193') early and raising OutOfBoundsDatetime during coercion itself, which makes sense and is in line with what #56410 was aiming for (no silent truncation between datetime units).

Because of that, test_clip_with_timestamps_and_oob_datetimes_non_nano is now failing since it hits the error earlier, with a different message. Just wanted to confirm: should I go ahead and update the test to reflect this message, or is the earlier failure point problematic?

Happy to revert or gate the check if needed.
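
A small arithmetic sketch of why that particular timestamp is sensitive to unit conversion, in case it helps the discussion. It uses plain integers only and is not the pandas code path; the constant names are made up for illustration. The value Timestamp('1677-09-21 00:12:43.145224193') sits one nanosecond above the int64 minimum, so flooring it to microseconds and converting back falls outside int64.

```python
INT64_MIN = -(2**63)
# Assumption: the int64 minimum is reserved for NaT, so the smallest valid
# nanosecond count is one above it.
MIN_VALID_NS = INT64_MIN + 1

# Split into whole microseconds (floored) and the leftover nanoseconds.
micros, leftover_ns = divmod(MIN_VALID_NS, 1_000)
print(leftover_ns)  # 193, the trailing digits of ...43.145224193

# Going back from whole microseconds to nanoseconds no longer fits in int64.
roundtrip_ns = micros * 1_000
print(roundtrip_ns < INT64_MIN)  # True
```

So a conversion that drops the sub-microsecond part either changes the value or overflows on the way back, which is the kind of silent truncation the stricter check is meant to surface.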

@mroeschke mroeschke requested a review from jbrockmendel June 30, 2025 17:37
raise OutOfBoundsDatetime(
f"{right!r} overflows datetime64[ns] during dtype inference"
)
except (OverflowError, ValueError) as e:
Member

Nitpick: can you call this err instead of e? Try to avoid one-letter variable names.

@@ -685,6 +685,7 @@ Datetimelike
- Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
- Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
- Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
- Bug in :meth:`DataFrame.replace` where attempting to replace a ``datetime64[ns]`` column with an out-of-bounds timestamp would raise an ``AssertionError`` or silently coerce. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61671`)
Member

It wasn't just replace? It was also happening with .iloc and __setitem__?

import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame([np.nan], dtype="datetime64[ns]")
df.iloc[0, 0] = datetime.datetime(3000, 1, 1)
# AssertionError: Something has gone wrong, please report a bug at https://github.com/pandas-dev/pandas/issues

maybe a more generic note and tests for the other cases?

@simonjayhawkins simonjayhawkins added Bug Datetime Datetime data dtype labels Jul 5, 2025
@iabhi4 iabhi4 force-pushed the datetime-issue-61671 branch from cd0dd3a to d41405c Compare July 14, 2025 02:07
iabhi4 (Contributor Author) commented Jul 14, 2025

Thanks for the review and suggestions @simonjayhawkins @jbrockmendel!
I did some testing to better understand how different assignment and replacement operations behave with out-of-bounds datetimes, both tz-naive and tz-aware. Here's what I found:


Observed Behavior (Confirmed via Logs)

| Operation | Value Type | Outcome | Notes |
| --- | --- | --- | --- |
| replace(np.nan, ts) | Timestamp("3000-01-01") | Raises OutOfBoundsDatetime | Expected behavior, works as intended |
| replace(np.nan, ts) | Timestamp("3000-01-01", tz) | Succeeds silently | Column upcasts to object dtype silently |
| df.iloc[0, 0] = ts | Timestamp("3000-01-01") | Raises OutOfBoundsDatetime | Same as above, correct behavior |
| df.iloc[0, 0] = ts | Timestamp("3000-01-01", tz) | Raises TypeError | Due to tz-naive column (datetime64[ns]) being incompatible with tz-aware value |

Additional Context

  • For tz-naive out-of-bounds values:
    • Both replace() and iloc correctly raise OutOfBoundsDatetime.
  • For tz-aware values:
    • replace() allows insertion silently by upcasting the column to object dtype (confirmed via debug logs).
    • iloc correctly raises a TypeError because of tz-awareness mismatch (datetime64[ns] vs tz-aware).
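
A rough way to reproduce the matrix above on any branch, as a sketch rather than a proposed test. The helper names (`outcome`, `do_replace`, `do_iloc`) are made up for illustration, and it uses datetime.datetime values instead of Timestamp so the script itself can construct them regardless of pandas version. It only reports what happens and does not assert which outcome is correct, since that depends on the pandas build.

```python
import datetime

import numpy as np
import pandas as pd


def outcome(operation):
    """Run `operation` on a fresh one-cell datetime64[ns] frame and describe the result."""
    df = pd.DataFrame([np.nan], dtype="datetime64[ns]")
    try:
        result = operation(df)
    except Exception as err:  # broad on purpose: we want the exception type
        return type(err).__name__
    return f"ok, dtype={result.dtypes.iloc[0]}"


def do_replace(value):
    return lambda df: df.replace(np.nan, value)


def do_iloc(value):
    def assign(df):
        df.iloc[0, 0] = value
        return df

    return assign


naive = datetime.datetime(3000, 1, 1)
aware = datetime.datetime(3000, 1, 1, tzinfo=datetime.timezone.utc)

report = {
    "replace, tz-naive": outcome(do_replace(naive)),
    "replace, tz-aware": outcome(do_replace(aware)),
    "iloc, tz-naive": outcome(do_iloc(naive)),
    "iloc, tz-aware": outcome(do_iloc(aware)),
}

for label, result in report.items():
    print(f"{label}: {result}")
```

Each case starts from a fresh frame so that an upcast in one operation cannot affect the next.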

Next Steps from My Side

Before I add tests for these cases, just wanted to check:

  • Should we treat tz-aware out-of-bounds timestamps as valid (fallback to object)?
  • Or do we want to enforce stricter checks across the board?

Happy to add the tests once I get a bit of guidance on how we want to handle these edge cases consistently.

Labels
Bug Datetime Datetime data dtype
Development

Successfully merging this pull request may close these issues.

BUG: np.nan to datetime assertionerror when too large datetime given
3 participants