feat(prefix sort): Support string type key in prefix sort #11527
cc @jinchengchenghh, @skadilover, @xiaoxmeng, could you help review this PR? Thanks.
We can do further optimization after we add the RowContainer stats, which may record the maxSize of each row. If the encode size is larger than maxSize, we can say the column can be fully encoded.
If maxSize < prefixsort_string_prefix_length, the encoded size should be maxSize + 1, indicating the column is fully encoded. This allows the following keys to be included in the prefix and reduces the memory allocated for the prefix.
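The rule suggested in the two comments above can be sketched roughly as follows. This is only an illustration of the reviewers' proposal, not the code in the PR: `StringKeyEncoding` and `planStringKey` are hypothetical names, and the optional `maxSize` stands in for whatever the RowContainer stats would report.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type (not a Velox type): how many prefix bytes a
// string key gets, and whether every value fits entirely in the prefix.
struct StringKeyEncoding {
  uint32_t encodedSize;
  bool fullyEncoded;
};

// Hypothetical helper following the rule from the review comments:
// if the longest value is shorter than the configured prefix length,
// reserve maxSize + 1 bytes and treat the column as fully encoded;
// otherwise fall back to the configured prefix length.
inline StringKeyEncoding planStringKey(
    std::optional<uint32_t> maxSize, // assumed to come from RowContainer stats
    uint32_t configuredPrefixLength) {
  if (maxSize.has_value() && *maxSize < configuredPrefixLength) {
    return {*maxSize + 1, true};
  }
  // No stats, or values may exceed the prefix: only a partial encoding.
  return {configuredPrefixLength, false};
}
```

With a configured length of 16, a column whose longest value is 10 bytes would get 11 prefix bytes and count as fully encoded, while a column with 40-byte values would get 16 bytes and remain partially encoded.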
Looks good, some minors.
Much appreciated, thanks for your contribution.
Can you rerun the benchmark to measure the effect of avoiding the string copy?
Cases where strings are stored separately in different blocks should be rare, so the results won’t differ much.
Hi @zhli1142015, I assume the extra prefix will introduce a higher memory footprint. Do you happen to have any metrics comparing the memory usage? Thanks.
Please check: #11272 (comment)
Hello @xiaoxmeng, I’ve updated the PR to leverage the
@zhli1142015 thanks for adding string support % minors.
Commits: address comments (×5), fix build, address comments, get string column max length from row container, fix ut, collect stats for orderBy op.
@zhli1142015 LGTM % minors. Thanks!
@zhli1142015 LGTM. Thanks for the iterations!
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Please also update here: https://github.com/facebookincubator/velox/blob/main/velox/exec/PrefixSort.cpp#L190 Move the partially normalized string key to the end.
Now VARCHAR columns' encode size is assigned a large value UINT_MAX in
Yes, thanks for your clarification. No problem. Good change.
@xiaoxmeng merged this pull request in 960c2af.
…cubator#11527)

Summary:
Support string type key in PrefixSort. We introduce the configuration parameter `prefixsort_max_string_length`, which sets the maximum prefix length for strings. The implementation dynamically determines the prefix length by comparing the configured maximum with the actual maximum column length from RowContainer, using the smaller of the two. This ensures efficient and flexible prefix sorting for variable-length string types. Default value of `prefixsort_max_string_length` is 16.

Perf result:
```
benchmark                           relative  time/iter  iters/s
StdSort_no-payloads_1_varchar_1k               158.23ns    6.32M
PrefixSort                           415.89%    38.05ns   26.28M
StdSort_no-payloads_2_varchar_1k               186.19ns    5.37M
PrefixSort                           197.21%    94.41ns   10.59M
StdSort_no-payloads_3_varchar_1k               197.90ns    5.05M
PrefixSort                           148.93%   132.89ns    7.53M
StdSort_no-payloads_4_varchar_1k               211.35ns    4.73M
PrefixSort                           126.75%   166.75ns    6.00M
StdSort_no-payloads_1_varchar_10k              257.23ns    3.89M
PrefixSort                           358.08%    71.84ns   13.92M
StdSort_no-payloads_2_varchar_10k              272.61ns    3.67M
PrefixSort                           227.46%   119.85ns    8.34M
StdSort_no-payloads_3_varchar_10k              295.37ns    3.39M
PrefixSort                           170.20%   173.55ns    5.76M
StdSort_no-payloads_4_varchar_10k              319.42ns    3.13M
PrefixSort                           152.60%   209.31ns    4.78M
StdSort_no-payloads_1_varchar_100k             348.19ns    2.87M
PrefixSort                           403.18%    86.36ns   11.58M
StdSort_no-payloads_2_varchar_100k             409.94ns    2.44M
PrefixSort                           261.56%   156.73ns    6.38M
StdSort_no-payloads_3_varchar_100k             469.93ns    2.13M
PrefixSort                           206.32%   227.76ns    4.39M
StdSort_no-payloads_4_varchar_100k             526.94ns    1.90M
PrefixSort                           186.66%   282.29ns    3.54M
StdSort_no-payloads_1_varchar_1000k            780.28ns    1.28M
PrefixSort                           627.88%   124.27ns    8.05M
StdSort_no-payloads_2_varchar_1000k            976.32ns    1.02M
PrefixSort                           491.56%   198.62ns    5.03M
StdSort_no-payloads_3_varchar_1000k              1.08us  928.51K
PrefixSort                           376.44%   286.10ns    3.50M
StdSort_no-payloads_4_varchar_1000k              1.12us  889.85K
PrefixSort                           321.50%   349.54ns    2.86M
```

Pull Request resolved: facebookincubator#11527

Reviewed By: Yuhta

Differential Revision: D67149095

Pulled By: xiaoxmeng
fbshipit-source-id: 79f02c81165a873aa8068260b5580850f30a4fc5
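To make the idea in the summary concrete, here is a minimal, self-contained sketch of prefix sorting on strings. It is not Velox's PrefixSort implementation: `choosePrefixLength`, `encodePrefix` and `prefixSort` are hypothetical names, the rows are plain `std::string` values rather than RowContainer rows, and the full-string fallback on ties is an assumption about how partially encoded keys have to be handled.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: the prefix length is the smaller of the configured
// maximum and the longest value actually present in the column.
inline uint32_t choosePrefixLength(uint32_t configuredMax, uint32_t maxColumnLength) {
  return std::min(configuredMax, maxColumnLength);
}

// Hypothetical helper: copy up to prefixLength bytes of the value and pad
// the rest with zero bytes so all prefixes have the same fixed width.
inline std::string encodePrefix(std::string_view value, uint32_t prefixLength) {
  std::string prefix(prefixLength, '\0');
  std::memcpy(prefix.data(), value.data(),
              std::min<size_t>(value.size(), prefixLength));
  return prefix;
}

// Sorts strings by comparing fixed-width prefixes with memcmp first and
// falling back to a full comparison only when the prefixes are equal.
inline std::vector<std::string> prefixSort(
    const std::vector<std::string>& rows, uint32_t configuredMax) {
  uint32_t maxLength = 0;
  for (const auto& row : rows) {
    maxLength = std::max(maxLength, static_cast<uint32_t>(row.size()));
  }
  const uint32_t prefixLength = choosePrefixLength(configuredMax, maxLength);

  struct Entry {
    std::string prefix;
    const std::string* row;
  };
  std::vector<Entry> entries;
  entries.reserve(rows.size());
  for (const auto& row : rows) {
    entries.push_back({encodePrefix(row, prefixLength), &row});
  }

  std::sort(entries.begin(), entries.end(),
            [prefixLength](const Entry& a, const Entry& b) {
              const int cmp =
                  std::memcmp(a.prefix.data(), b.prefix.data(), prefixLength);
              if (cmp != 0) {
                return cmp < 0;
              }
              // Equal prefixes do not prove equal strings; compare in full.
              return *a.row < *b.row;
            });

  std::vector<std::string> sorted;
  sorted.reserve(entries.size());
  for (const auto& entry : entries) {
    sorted.push_back(*entry.row);
  }
  return sorted;
}
```

Calling `prefixSort(rows, 16)` mirrors the default of 16 mentioned in the summary. When every value is shorter than the chosen prefix length, most comparisons are decided by the fixed-width `memcmp` alone, which is the case the benchmark numbers above are most favorable to.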
Support string type key in PrefixSort.
We introduce the configuration parameter `prefixsort_max_string_length`, which sets the maximum prefix length for strings. The implementation dynamically determines the prefix length by comparing the configured maximum with the actual maximum column length from RowContainer, using the smaller of the two. This ensures efficient and flexible prefix sorting for variable-length string types.
Default value of `prefixsort_max_string_length` is 16.
Perf result: