bug: allow floats as timestamp column #777

jordanrfrazier · 2023-09-26T19:04:00Z

Allows timestamp column to be float for parquet sources in the new prepare path.

Also fixes the time_unit, which was previously not being taken into account.
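
To illustrate the intent, here is a minimal sketch of converting a float timestamp while honoring its time unit. It is not the sparrow implementation: the helper name `float_to_timestamp_ns`, the unit strings, and the choice to truncate toward zero before scaling are all assumptions made for this example.

```python
import math

# Hypothetical lookup table: nanoseconds per supported time unit.
_NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def float_to_timestamp_ns(value: float, time_unit: str = "ns") -> int:
    """Convert a float timestamp in `time_unit` to integer nanoseconds.

    Assumption: the float is truncated toward zero to an integer count of
    `time_unit` first, and only then scaled to nanoseconds.
    """
    if time_unit not in _NANOS_PER_UNIT:
        raise ValueError(f"unsupported time unit: {time_unit!r}")
    if not math.isfinite(value):
        raise ValueError("timestamp must be a finite number")
    # Ignoring the unit here would treat every value as nanoseconds,
    # which is the kind of mistake the description refers to.
    return int(value) * _NANOS_PER_UNIT[time_unit]
```

For example, `float_to_timestamp_ns(1.0, "s")` scales by the seconds factor, whereas treating the same value as nanoseconds would leave it at `1`.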

Closes #776

jordanrfrazier · 2023-09-26T19:05:25Z

crates/sparrow-runtime/src/prepare/column_behavior.rs

+    } else {
+        Ok(time.clone())
+    }
+}


A lot of this is partially duplicated from the prepare_batch code. It is not a complete duplicate, and there is definitely room to simplify, but doing so would require more refactoring of the column_behavior code. Not sure how much we want to do now.


bjchambers · 2023-09-26T19:45:35Z

crates/sparrow-runtime/src/prepare/column_behavior.rs

@@ -24,8 +25,15 @@ pub enum ColumnBehavior {
    /// Cast the given column to the given data type.


Do we need to use or change the column behavior? I think we could just use the prepare_batch method and avoid that. Alternatively, we could add a case here that calls into numeric_to_timestamp so that at least that logic is shared, rather than adding more complexity to column behaviors.

bjchambers · 2023-09-26T19:46:46Z

crates/sparrow-runtime/src/prepare/column_behavior.rs

+    } else {
+        Ok(time.clone())
+    }
+}


The best options seem to be either:

1. Drop the column behavior stuff and have both paths call prepare_batch (the newer code).
2. Add a column behavior corresponding to numeric_to_timestamp, which prepare_batch uses for the timestamp case.

Either would let us avoid adding as much code to column_behavior.
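
A rough sketch of the second option, written in Python for brevity rather than in the crate's Rust: both paths delegate to one shared conversion function. Everything here is hypothetical and only mirrors the names used in this thread; the enum variants, the function signatures, and the list-based "columns" are assumptions, not the actual sparrow code.

```python
from enum import Enum, auto
from typing import Dict, List, Union

Number = Union[int, float]


def numeric_to_timestamp(values: List[Number], nanos_per_unit: int = 1) -> List[int]:
    """Shared conversion: truncate each value and scale it to nanoseconds."""
    return [int(v) * nanos_per_unit for v in values]


class ColumnBehavior(Enum):
    """Hypothetical stand-in for the column behaviors discussed above."""
    PASS_THROUGH = auto()
    NUMERIC_TO_TIMESTAMP = auto()


def apply_behavior(behavior: ColumnBehavior, values: list, nanos_per_unit: int = 1) -> list:
    """Older path: the new behavior simply calls the shared helper."""
    if behavior is ColumnBehavior.NUMERIC_TO_TIMESTAMP:
        return numeric_to_timestamp(values, nanos_per_unit)
    return list(values)


def prepare_batch(batch: Dict[str, list], time_column: str, nanos_per_unit: int = 1) -> Dict[str, list]:
    """Newer path: converts the time column with the same shared helper."""
    prepared = dict(batch)
    prepared[time_column] = numeric_to_timestamp(batch[time_column], nanos_per_unit)
    return prepared
```

With this shape, the conversion logic lives in one place, and a change to how floats or time units are handled applies to both paths.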

bjchambers · 2023-09-26T19:47:13Z

python/pytests/parquet_source_test.py

+    )
+    golden.jsonl(source)
+
+async def test_time_column_as_float_can_cast_ns(golden) -> None:


Allow floats as timestamp column

23299cf

jordanrfrazier requested a review from bjchambers September 26, 2023 19:04

cla-bot bot added the cla-signed label Sep 26, 2023

github-actions bot added the bug and sparrow labels Sep 26, 2023

jordanrfrazier commented Sep 26, 2023

bjchambers reviewed Sep 26, 2023

use order preserving cast

640d283

jordanrfrazier closed this Sep 26, 2023
