-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the CONCAT scalar function to support Utf8View #12224
Merged
alamb
merged 18 commits into
apache:main
from
devanbenz:feat/11836-update-concat-utf8view
Sep 3, 2024
Merged
Changes from 7 commits
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
723ceb7
wip
devanbenz f7abdd5
feat: Update the CONCAT scalar function to support Utf8View
devanbenz 503d5b9
fmt
devanbenz b30330d
fmt and add default return type for concat
devanbenz 76d6b5f
fix clippy lint
devanbenz 9798ed3
fmt
devanbenz 91d04ff
add more tests for sqllogic
devanbenz 6d28927
make sure no casting with LargeUtf8
devanbenz 769d99d
fixing utf8large
devanbenz ac30a83
fix large utf8
devanbenz 7ea6e0a
fix large utf8
devanbenz dd3ad39
add test
devanbenz 504459c
fmt
devanbenz e081934
make it so Utf8View just returns Utf8
devanbenz 0069c1a
wip -- trying to build a stringview with columnar refs
devanbenz 0929a4b
built stringview builder but it does allocate a new String each iter :(
devanbenz f16de44
add some testing
devanbenz d0bf3ba
clippy
devanbenz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -776,7 +776,7 @@ EXPLAIN SELECT | |
FROM test; | ||
---- | ||
logical_plan | ||
01)Projection: concat(CAST(test.column1_utf8view AS Utf8), CAST(test.column2_utf8view AS Utf8)) AS c | ||
01)Projection: concat(test.column1_utf8view, test.column2_utf8view) AS c | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💪 |
||
02)--TableScan: test projection=[column1_utf8view, column2_utf8view] | ||
|
||
## Ensure no casts for CONCAT_WS | ||
|
@@ -863,6 +863,39 @@ XIANGPENG | |
RAPHAEL | ||
NULL | ||
|
||
## Should run CONCAT successfully | ||
query T | ||
SELECT | ||
concat(column1_utf8view, column2_utf8view) as c | ||
FROM test; | ||
---- | ||
AndrewX | ||
XiangpengXiangpeng | ||
RaphaelR | ||
R | ||
|
||
## Should run CONCAT successfully with utf8 and utf8view | ||
query T | ||
SELECT | ||
concat(column1_utf8view, column2_utf8) as c | ||
FROM test; | ||
---- | ||
AndrewX | ||
XiangpengXiangpeng | ||
RaphaelR | ||
R | ||
|
||
## Should run CONCAT successfully with utf8 utf8view and largeutf8 | ||
query T | ||
SELECT | ||
concat(column1_utf8view, column2_utf8, column2_large_utf8) as c | ||
FROM test; | ||
---- | ||
AndrewXX | ||
XiangpengXiangpengXiangpeng | ||
RaphaelRR | ||
RR | ||
|
||
## Ensure no casts for LPAD | ||
query TT | ||
EXPLAIN SELECT | ||
|
@@ -1307,3 +1340,4 @@ select column2|| ' ' ||column3 from temp; | |
---- | ||
rust fast | ||
datafusion cool | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps as a follow on PR we can expand this to also support
LargeUtf8
s... though it looks like maybe they are from the tests? Does this just need to be updated and it'll work?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's casting to Utf8 from LargeUtf8? Not sure. I can follow up and modify this code to include that though it should be relatively quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/datafusion/pull/12224/files#diff-71970189679c6dd5b3b677bb21603234b488e68d1601be9c4d400d40e430a909R70 pretty sure this line captures the LargeUtf8 usage at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, should've been clearer, to your point, I think it works but causes a cast to utf8.
I think if you were to concat two largeutf8s together, you should get that same type as the return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - I will need to modify the input/return values for this. I'll add an additional test for LargeUtf8 and fix in this PR tonight. It shouldn't take long, I think I have a path forward to change that and it should be relatively painless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction** I fixed it now it was really quick resolution lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's erroring on cast to generic string array. I need to modify 😆 whoops still not fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's resolved 👍