Currently, calling the TO_DATE or TO_TIMESTAMP* UDFs with Utf8View arguments fails. Once this issue is fixed, such calls should succeed.
```sql
> create table ts_utf8_data(ts varchar(100), format varchar(100)) as values
('2020-09-08 12/00/00+00:00', '%Y-%m-%d %H/%M/%S%#z'),
('2031-01-19T23:33:25+05:00', '%+'),
('08-09-2020 12:00:00+00:00', '%d-%m-%Y %H:%M:%S%#z'),
('1926632005', '%s'),
('2000-01-01T01:01:01+07:00', '%+');
0 row(s) fetched.
Elapsed 0.062 seconds.

> create table ts_utf8view_data as
select arrow_cast(ts, 'Utf8View') as ts, arrow_cast(format, 'Utf8View') as format from ts_utf8_data;
0 row(s) fetched.
Elapsed 0.010 seconds.

> SELECT to_timestamp(t.ts, t.format),
to_timestamp_seconds(t.ts, t.format),
to_timestamp_millis(t.ts, t.format),
to_timestamp_micros(t.ts, t.format),
to_timestamp_nanos(t.ts, t.format)
from ts_utf8view_data as t;
Execution error: to_timestamp function unsupported data type at index 1: Utf8View
```
We are working to add complete StringView support in DataFusion, which can enable much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support `DataType::Utf8` and `DataType::LargeUtf8`. When one of them is called with a StringView argument, DataFusion casts the argument back to `DataType::Utf8`, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support `DataType::Utf8View` directly.
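To illustrate why the cast back to `Utf8` is costly, here is a simplified, std-only model of the Arrow `Utf8View` layout: each row is a 16-byte view holding the length, and then either the string itself (if it is at most 12 bytes) or a 4-byte prefix plus a buffer index and offset. The names `Utf8ViewColumn`, `Utf8Column`, and `cast_view_to_utf8` are hypothetical and are not arrow-rs or DataFusion APIs; null handling and multi-buffer management are omitted.

```rust
/// Read a little-endian u32 starting at byte `at` of `b`.
fn read_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Hypothetical model of the Arrow `Utf8View` layout: one 16-byte view per
/// row, plus data buffers for strings longer than 12 bytes.
#[derive(Default)]
struct Utf8ViewColumn {
    views: Vec<[u8; 16]>,
    buffers: Vec<Vec<u8>>,
}

impl Utf8ViewColumn {
    fn push(&mut self, s: &str) {
        let bytes = s.as_bytes();
        let mut view = [0u8; 16];
        view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
        if bytes.len() <= 12 {
            // Short string: bytes are stored inline in the view, zero-padded.
            view[4..4 + bytes.len()].copy_from_slice(bytes);
        } else {
            // Long string: 4-byte prefix, buffer index, offset into that buffer.
            if self.buffers.is_empty() {
                self.buffers.push(Vec::new());
            }
            let buf_index = self.buffers.len() - 1;
            let buf = &mut self.buffers[buf_index];
            let offset = buf.len() as u32;
            buf.extend_from_slice(bytes);
            view[4..8].copy_from_slice(&bytes[0..4]);
            view[8..12].copy_from_slice(&(buf_index as u32).to_le_bytes());
            view[12..16].copy_from_slice(&offset.to_le_bytes());
        }
        self.views.push(view);
    }

    /// Read row `i` without copying: inline bytes or a slice of a buffer.
    fn value(&self, i: usize) -> &str {
        let view = &self.views[i];
        let len = read_u32(view, 0) as usize;
        let bytes = if len <= 12 {
            &view[4..4 + len]
        } else {
            let buf_index = read_u32(view, 8) as usize;
            let offset = read_u32(view, 12) as usize;
            &self.buffers[buf_index][offset..offset + len]
        };
        std::str::from_utf8(bytes).expect("column holds valid UTF-8")
    }
}

/// Hypothetical model of the `Utf8` layout: i32 offsets into one contiguous buffer.
struct Utf8Column {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

/// Casting views back to `Utf8` copies every row's bytes into a new contiguous
/// buffer, so the work grows with the total string length, not just the row count.
fn cast_view_to_utf8(col: &Utf8ViewColumn) -> Utf8Column {
    let mut offsets = Vec::with_capacity(col.views.len() + 1);
    let mut values = Vec::new();
    offsets.push(0);
    for i in 0..col.views.len() {
        values.extend_from_slice(col.value(i).as_bytes());
        offsets.push(values.len() as i32);
    }
    Utf8Column { offsets, values }
}
```

A function that reads the views directly can skip this extra allocation and copy entirely.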
Describe the solution you'd like
Update the functions to support `DataType::LargeUtf8` and `DataType::Utf8View` directly.
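One way to support every string layout without casting is to write the kernel once against an abstraction over "a column of strings". The sketch below assumes a hypothetical `StringColumn` trait (not a DataFusion or arrow-rs trait) with an offset-based implementation, loosely like `Utf8`, and a view-based one, loosely like `Utf8View` but without inline short strings. To stay std-only, the hypothetical `to_timestamp_seconds_sketch` handles only the `%s` (epoch seconds) format; the real functions accept chrono-style formats such as `%Y-%m-%d %H:%M:%S%#z`.

```rust
/// Hypothetical abstraction: a kernel only needs the row count and each row's value.
trait StringColumn {
    fn len(&self) -> usize;
    fn value(&self, i: usize) -> &str;
}

/// Offset-based layout, loosely like `Utf8`: i32 offsets into one string buffer.
struct OffsetColumn {
    offsets: Vec<i32>,
    data: String,
}

impl OffsetColumn {
    fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut data = String::new();
        for s in items {
            data.push_str(s);
            offsets.push(data.len() as i32);
        }
        OffsetColumn { offsets, data }
    }
}

impl StringColumn for OffsetColumn {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
    fn value(&self, i: usize) -> &str {
        &self.data[self.offsets[i] as usize..self.offsets[i + 1] as usize]
    }
}

/// View-based layout, loosely like `Utf8View` (inline short strings omitted):
/// each row is (buffer index, offset, length) into shared buffers.
struct ViewColumn {
    views: Vec<(usize, usize, usize)>,
    buffers: Vec<String>,
}

impl ViewColumn {
    fn from_strs(items: &[&str]) -> Self {
        let mut buf = String::new();
        let mut views = Vec::new();
        for s in items {
            views.push((0, buf.len(), s.len()));
            buf.push_str(s);
        }
        ViewColumn { views, buffers: vec![buf] }
    }
}

impl StringColumn for ViewColumn {
    fn len(&self) -> usize {
        self.views.len()
    }
    fn value(&self, i: usize) -> &str {
        let (b, off, len) = self.views[i];
        &self.buffers[b][off..off + len]
    }
}

/// Generic over both the value and the format column, so any layout
/// combination works without a cast. Only `%s` is parsed; other formats yield `None`.
fn to_timestamp_seconds_sketch<V: StringColumn, F: StringColumn>(
    values: &V,
    formats: &F,
) -> Vec<Option<i64>> {
    (0..values.len())
        .map(|i| match formats.value(i) {
            "%s" => values.value(i).parse::<i64>().ok(),
            _ => None,
        })
        .collect()
}
```

Because the kernel is generic, the compiler produces a specialized version for each layout combination, so view input is read in place rather than first converted to `Utf8`.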
Describe alternatives you've considered
No response
Additional context
The typical steps are:
1. Write some tests showing that the function doesn't support Utf8View (see the tests in [string_view.slt](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt) to ensure the arguments are not being cast).
2. Change the Signature of the function to accept Utf8View in addition to Utf8/LargeUtf8.
3. Update the implementation of the function to operate on Utf8View.
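Steps 1 and 2 can be modeled as a test-first change: a toy signature lists the types each argument may take, a check first demonstrates that `Utf8View` is rejected, and widening the list makes the same check pass. `ArgType`, `ToySignature`, `original_signature`, and `widened_signature` are hypothetical stand-ins, not DataFusion's `Signature` API.

```rust
/// Hypothetical stand-in for the string types a UDF argument can have.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Toy signature: every argument must be one of `accepted`.
struct ToySignature {
    accepted: Vec<ArgType>,
}

impl ToySignature {
    /// Returns an error naming the first argument whose type is not accepted.
    fn check(&self, args: &[ArgType]) -> Result<(), String> {
        for (i, t) in args.iter().enumerate() {
            if !self.accepted.contains(t) {
                return Err(format!("unsupported data type at index {i}: {t:?}"));
            }
        }
        Ok(())
    }
}

/// Before the change: only Utf8 and LargeUtf8 are accepted.
fn original_signature() -> ToySignature {
    ToySignature { accepted: vec![ArgType::Utf8, ArgType::LargeUtf8] }
}

/// Step 2: the widened signature also accepts Utf8View.
fn widened_signature() -> ToySignature {
    ToySignature {
        accepted: vec![ArgType::Utf8, ArgType::LargeUtf8, ArgType::Utf8View],
    }
}
```

The sketch leaves out step 3: accepting `Utf8View` in the signature only helps if the implementation can then read `Utf8View` input without casting it.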
Example PRs
- Update to use an arrow kernel that already supports StringView: #11787
- Change the implementation to support StringView directly: #11676
- Change implementation (option 2): #11556
Omega359 changed the title from "Update TO_DATE, TO_TIMESTAMP scalar functions to support Utf8View" to "Update TO_DATE, TO_TIMESTAMP scalar functions to support LargeUtf8, Utf8View" on Oct 15, 2024.
This issue is part of #11752 and #11790.