
Commit

Cap max_date in the meta_date table to avoid Pandas bug
It can't represent dates past 2262, which we sometimes have due to
typos etc.

Plus, it doesn't really make sense to say that a run of the data
metrics covers data in the future.

So instead, we cap to CURRENT_DATE. No fix yet for typos deep in the
past.
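
For context, the 2262 limit follows from pandas storing timestamps by default as signed 64-bit counts of nanoseconds since the Unix epoch. A minimal standard-library sketch of that arithmetic, assuming only that representation (the variable names are illustrative and not from this repository):

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold, read as nanoseconds since the epoch.
INT64_MAX_NS = 2**63 - 1

# datetime only has microsecond precision, so integer-divide away the nanosecond digits.
EPOCH = datetime(1970, 1, 1)
max_ns_timestamp = EPOCH + timedelta(microseconds=INT64_MAX_NS // 1000)

print(max_ns_timestamp)  # 2262-04-11 23:47:16.854775
```

Any date after that point, such as a typo like the year 2990, cannot be represented at nanosecond resolution.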
mikix committed Oct 11, 2024
1 parent d20e5eb commit 7958e43
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 1 addition & 1 deletion cumulus_library_data_metrics/__init__.py
@@ -1,3 +1,3 @@
 """Data Metrics study for Cumulus Library"""

-__version__ = "5.0.1"
+__version__ = "5.1.0"
6 changes: 5 additions & 1 deletion cumulus_library_data_metrics/meta/dates.jinja
@@ -43,7 +43,11 @@ unified AS (

 SELECT
 MIN(min_date) AS min_date,
-MAX(max_date) AS max_date
+-- Cap max_date to NOW because (A) it's not entirely accurate to say this study goes to the
+-- year 2990 just because someone makes a typo and (B) Pandas has issues with parsing dates
+-- past 2262 by default (search "timestamp limitations" in pandas docs) and gets confused
+-- when exporting this table.
+LEAST(MAX(max_date), CURRENT_DATE) AS max_date
 FROM unified

 );
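
The effect of `LEAST(MAX(max_date), CURRENT_DATE)` can be sketched in plain Python. This is only an illustration of the capping logic, not code from the repository: `cap_max_date` is a hypothetical helper, the sample dates are invented, and SQL NULL handling is not modeled.

```python
from datetime import date


def cap_max_date(dates: list[date], today: date) -> date:
    """Return the latest date in `dates`, but never later than `today`."""
    # max() plays the role of MAX(max_date); min() plays the role of LEAST(..., CURRENT_DATE).
    return min(max(dates), today)


# Invented sample data: the last entry stands in for a typo far in the future.
today = date(2024, 10, 11)
observed = [date(2019, 3, 1), date(2023, 7, 15), date(2990, 1, 1)]

print(cap_max_date(observed, today))  # 2024-10-11
```

When every observed date is in the past, the cap has no effect and the true maximum is returned.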
