GH-43495: [C++][Compute] Widen the row offset of the row table to 64-bit (#43389)

### Rationale for this change

The row table uses `uint32_t` as the row offset within the row data buffer, which prevents the row data from growing beyond 4GB. This is quite restrictive, and the impact is described in more detail in #43495. This PR proposes to widen the row offset from 32-bit to 64-bit to address this limitation.

#### Benefits

Currently, the row table has three major limitations:

1. The overall data size cannot exceed 4GB.
2. The size of a single row cannot exceed 4GB.
3. The number of rows cannot exceed 2^32.

This enhancement will eliminate the first limitation. Meanwhile, the second and third limitations are less likely to occur. Thus, this change will enable a significant range of use cases that are currently unsupported.

#### Overhead

Of course, this will introduce some overhead:

1. An extra 4 bytes of memory consumption for each row, because the offset grows from 32-bit to 64-bit.
2. A wider offset type requires a few more SIMD instructions in each 8-row processing iteration.

In my opinion, this overhead is justified by the benefits listed above.

### What changes are included in this PR?

Change the row offset of the row table from 32-bit to 64-bit. Related code in row comparison/encoding and swiss join has been updated accordingly.

### Are these changes tested?

Test included.

### Are there any user-facing changes?

Users could potentially see higher memory consumption when using Acero's hash join and hash aggregation. On the other hand, certain use cases that used to fail are now able to complete.

* GitHub Issue: #43495

Authored-by: Ruoxi Sun <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
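
To make the first limitation and the per-row memory overhead concrete, here is a minimal, self-contained sketch. It is not Arrow's row table code: `BuildRowOffsets`, `OffsetsAreMonotonic`, and `ExtraOffsetBytes` are hypothetical helpers written for illustration, and the layout (one cumulative byte offset per row plus a trailing end offset) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): builds cumulative byte offsets for
// variable-length rows. offsets[i] is where row i starts; offsets.back() is
// the total size of the row data.
template <typename OffsetT>
std::vector<OffsetT> BuildRowOffsets(const std::vector<uint64_t>& row_lengths) {
  std::vector<OffsetT> offsets;
  offsets.reserve(row_lengths.size() + 1);
  offsets.push_back(0);
  uint64_t running = 0;  // accumulate in 64 bits so only the store can narrow
  for (uint64_t length : row_lengths) {
    running += length;
    // For OffsetT = uint32_t this conversion wraps modulo 2^32, which is the
    // failure mode once the total row data exceeds 4GB.
    offsets.push_back(static_cast<OffsetT>(running));
  }
  return offsets;
}

// Returns true if the offsets never decrease, i.e. no wraparound happened.
template <typename OffsetT>
bool OffsetsAreMonotonic(const std::vector<OffsetT>& offsets) {
  assert(!offsets.empty());
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) return false;
  }
  return true;
}

// Extra memory from widening each row's offset from 32-bit to 64-bit:
// 4 bytes per row, as stated in the Overhead section.
constexpr uint64_t ExtraOffsetBytes(uint64_t num_rows) {
  return num_rows * (sizeof(uint64_t) - sizeof(uint32_t));
}
```

With three rows of 2^31 bytes each, the `uint32_t` instantiation wraps the third offset back to 0, while the `uint64_t` instantiation records a total of 6442450944 bytes. For one million rows, `ExtraOffsetBytes` reports 4000000 additional bytes.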
1 parent 1ae38d0, commit 5e68513
Showing 15 changed files with 802 additions and 343 deletions.