Double type argument for to_timestamp function #8159

Merged 10 commits on Nov 30, 2023
1 change: 1 addition & 0 deletions datafusion/expr/src/built_in_function.rs
@@ -1012,6 +1012,7 @@ impl BuiltinScalarFunction {
                 1,
                 vec![
                     Int64,
+                    Float64,
                     Timestamp(Nanosecond, None),
                     Timestamp(Microsecond, None),
                     Timestamp(Millisecond, None),
5 changes: 5 additions & 0 deletions datafusion/physical-expr/src/datetime_expressions.rs
@@ -971,6 +971,11 @@ pub fn to_timestamp_invoke(args: &[ColumnarValue]) -> Result<ColumnarValue> {
                 &DataType::Timestamp(TimeUnit::Nanosecond, None),
                 None,
             ),
+            DataType::Float64 => cast_column(
+                &args[0],
+                &DataType::Timestamp(TimeUnit::Nanosecond, None),
+                None,
+            ),
             DataType::Timestamp(_, None) => cast_column(
                 &args[0],
                 &DataType::Timestamp(TimeUnit::Nanosecond, None),
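
The new `DataType::Float64` arm sends doubles through the same nanosecond cast as the other numeric inputs, and the documentation change in this PR states that integers and doubles are both read as seconds since the Unix epoch. Below is a minimal std-only sketch of that interpretation. The `SecondsInput` enum and `to_timestamp_nanos` helper are hypothetical names that do not exist in DataFusion, and returning `None` on integer overflow is an illustrative choice rather than the engine's documented behavior.

```rust
/// Hypothetical stand-in for two of the argument types `to_timestamp` accepts.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SecondsInput {
    Int64(i64),
    Float64(f64),
}

const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Interprets the input as seconds since the Unix epoch and returns
/// nanoseconds since the epoch.
pub fn to_timestamp_nanos(input: SecondsInput) -> Option<i64> {
    match input {
        // Whole seconds: exact integer arithmetic, `None` if the product
        // does not fit in an i64 (assumption made for this sketch).
        SecondsInput::Int64(secs) => secs.checked_mul(NANOS_PER_SECOND),
        // Fractional seconds: scale, truncate toward zero, convert.
        // This repeats the expression used in the cast.rs change.
        SecondsInput::Float64(secs) => {
            Some((secs * NANOS_PER_SECOND as f64).trunc() as i64)
        }
    }
}
```

With this sketch, `SecondsInput::Float64(1.1)` yields 1,100,000,000 nanoseconds, which matches the `1970-01-01T00:00:01.100` result in the PR's tests.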
20 changes: 18 additions & 2 deletions datafusion/physical-expr/src/expressions/cast.rs
@@ -176,7 +176,20 @@ pub fn cast_column(
             kernels::cast::cast_with_options(array, cast_type, &cast_options)?,
         )),
         ColumnarValue::Scalar(scalar) => {
-            let scalar_array = scalar.to_array()?;
+            let scalar_array = if cast_type
+                == &DataType::Timestamp(arrow_schema::TimeUnit::Nanosecond, None)
+            {
+                if let ScalarValue::Float64(Some(float_ts)) = scalar {
+                    ScalarValue::Int64(
+                        Some((float_ts * 1_000_000_000_f64).trunc() as i64),
Contributor:
I think the complexity here can be reduced; there are too many conditions.

Contributor Author:
If we are allowing this type cast to be pushed down to the arrow crate, then I can add this logic to kernel::compute::cast_with_options or the appropriate fn

+                    )
+                    .to_array()?
+                } else {
+                    scalar.to_array()?
+                }
+            } else {
+                scalar.to_array()?
+            };
             let cast_array = kernels::cast::cast_with_options(
                 &scalar_array,
                 cast_type,
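
The scalar branch in this hunk scales the float by 10^9, truncates toward zero, and converts with `as`. In Rust, a float-to-integer `as` cast saturates at the integer bounds and maps NaN to 0, so out-of-range inputs produce a clamped value instead of a panic. The sketch below shows that behavior with a hypothetical helper named `float_seconds_to_nanos`, followed by a stricter variant that is an illustration only and is not what the PR does.

```rust
/// Hypothetical helper that repeats the expression added in the diff.
pub fn float_seconds_to_nanos(float_ts: f64) -> i64 {
    // Scale seconds to nanoseconds, drop any sub-nanosecond fraction,
    // then convert. The `as` cast saturates and maps NaN to 0.
    (float_ts * 1_000_000_000_f64).trunc() as i64
}

/// Stricter variant (illustration only): reject inputs that cannot be
/// represented instead of silently saturating.
pub fn checked_float_seconds_to_nanos(float_ts: f64) -> Option<i64> {
    let scaled = (float_ts * 1_000_000_000_f64).trunc();
    // 2^63 as an f64; valid i64 values lie in [-2^63, 2^63).
    const LIMIT: f64 = 9_223_372_036_854_775_808.0;
    if scaled.is_finite() && scaled >= -LIMIT && scaled < LIMIT {
        Some(scaled as i64)
    } else {
        None
    }
}
```

Whether saturation or an error is preferable for non-finite input is a design question the PR does not discuss; the checked variant only makes the trade-off visible.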
@@ -201,7 +214,10 @@ pub fn cast_with_options(
     let expr_type = expr.data_type(input_schema)?;
     if expr_type == cast_type {
         Ok(expr.clone())
-    } else if can_cast_types(&expr_type, &cast_type) {
+    } else if can_cast_types(&expr_type, &cast_type)
Contributor:
I think we need a follow-up PR in arrow-rs; I'll do it.

Contributor Author:
Yes, if this can be pushed down to the arrow crate, the complexity in datafusion would be reduced. I wasn't sure if doing so was appropriate

Contributor:
I just checked arrow-rs, and this cast is supported: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast.rs#L224

Contributor Author:
can_cast_types was returning false in my testing yesterday for Float64 -> Timestamp(Nanosecond, None), seemingly because the line you linked has not been released yet. The type check was changed from is_integer to is_numeric only a few days ago, whereas the last arrow-rs release was 3 weeks ago.

Should we wait until the next arrow-rs release so I can leverage this change?

Contributor:
I'm okay with letting this go in, because this is an important piece. I'll create a follow-up issue to move to the arrow-rs cast and do some other small refactoring. Thanks @spaydar for your work.

+        || (expr_type == DataType::Float64
+            && cast_type == DataType::Timestamp(arrow_schema::TimeUnit::Nanosecond, None))
+    {
         Ok(Arc::new(CastExpr::new(expr, cast_type, cast_options)))
     } else {
         not_impl_err!("Unsupported CAST from {expr_type:?} to {cast_type:?}")
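
The review thread explains why the extra `||` clause exists: with the arrow-rs release available at the time, `can_cast_types` returned false for Float64 to Timestamp(Nanosecond, None), so the planner needed an explicit exception. The sketch below models only that one situation. The `Ty` enum and `released_can_cast` function are hypothetical and deliberately tiny; they are an assumption made for illustration and do not reproduce arrow's real casting rules.

```rust
/// Hypothetical, heavily reduced set of types for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int64,
    Float64,
    TimestampNanos,
}

/// Stand-in for the released `can_cast_types` as described in the thread:
/// integers could be cast to a timestamp, floats could not yet.
pub fn released_can_cast(from: Ty, to: Ty) -> bool {
    matches!((from, to), (Ty::Int64, Ty::TimestampNanos))
}

/// Mirrors the shape of the new condition: accept whatever the library
/// accepts, plus the single Float64 -> Timestamp(Nanosecond) exception.
pub fn cast_is_allowed(from: Ty, to: Ty) -> bool {
    released_can_cast(from, to)
        || (from == Ty::Float64 && to == Ty::TimestampNanos)
}
```

Once the library check itself accepts the float case, the second clause becomes redundant, which is the follow-up refactoring the reviewer mentions.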
29 changes: 29 additions & 0 deletions datafusion/sqllogictest/test_files/timestamps.slt
@@ -291,6 +291,35 @@ SELECT COUNT(*) FROM ts_data_secs where ts > to_timestamp_seconds('2020-09-08T12
 ----
 2
 
+
+# to_timestamp float inputs
+
+query PPP
+SELECT to_timestamp(1.1) as c1, cast(1.1 as timestamp) as c2, 1.1::timestamp as c3;
+----
+1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 1970-01-01T00:00:01.100
+
+query PPP
+SELECT to_timestamp(-1.1) as c1, cast(-1.1 as timestamp) as c2, (-1.1)::timestamp as c3;
+----
+1969-12-31T23:59:58.900 1969-12-31T23:59:58.900 1969-12-31T23:59:58.900
+
+query PPP
+SELECT to_timestamp(0.0) as c1, cast(0.0 as timestamp) as c2, 0.0::timestamp as c3;
+----
+1970-01-01T00:00:00 1970-01-01T00:00:00 1970-01-01T00:00:00
+
+query PPP
+SELECT to_timestamp(1.23456789) as c1, cast(1.23456789 as timestamp) as c2, 1.23456789::timestamp as c3;
+----
+1970-01-01T00:00:01.234567890 1970-01-01T00:00:01.234567890 1970-01-01T00:00:01.234567890
+
+query PPP
+SELECT to_timestamp(123456789.123456789) as c1, cast(123456789.123456789 as timestamp) as c2, 123456789.123456789::timestamp as c3;
+----
+1973-11-29T21:33:09.123456784 1973-11-29T21:33:09.123456784 1973-11-29T21:33:09.123456784
+
+
 # from_unixtime
 
 # 1599566400 is '2020-09-08T12:00:00+00:00'
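
The last float test in this hunk expects a fractional part of `.123456784`, not `.123456789`. That is consistent with `f64` precision: a double carries 53 significant bits, so near 1.2 × 10^17 nanoseconds adjacent representable values are 16 apart. The sketch below checks the arithmetic with two hypothetical helpers; `float_seconds_to_nanos` repeats the scale-and-truncate expression from the `cast.rs` change, and `spacing_above` is an illustrative name for measuring the gap to the next representable double.

```rust
/// Hypothetical helper repeating the expression from the cast.rs change.
pub fn float_seconds_to_nanos(float_ts: f64) -> i64 {
    (float_ts * 1_000_000_000_f64).trunc() as i64
}

/// Gap between a positive, finite `x` and the next larger f64.
/// Relies on the fact that consecutive positive doubles have
/// consecutive bit patterns.
pub fn spacing_above(x: f64) -> f64 {
    f64::from_bits(x.to_bits() + 1) - x
}
```

Calling `float_seconds_to_nanos(123456789.123456789)` gives 123456789123456784, and `spacing_above` reports 16 at that magnitude, so the trailing digits of the input cannot survive the conversion.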
4 changes: 2 additions & 2 deletions docs/source/user-guide/sql/scalar_functions.md
@@ -1442,9 +1442,9 @@ extract(field FROM source)
 ### `to_timestamp`
 
 Converts a value to a timestamp (`YYYY-MM-DDT00:00:00Z`).
-Supports strings, integer, and unsigned integer types as input.
+Supports strings, integer, unsigned integer, and double types as input.
 Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00')
-Integers and unsigned integers are interpreted as seconds since the unix epoch (`1970-01-01T00:00:00Z`)
+Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (`1970-01-01T00:00:00Z`)
 return the corresponding timestamp.

```