Is your feature request related to a problem? Please describe.
Currently, hash joins build a single monolithic in-memory hash table, which can cause OOM when off-heap memory is small.
Describe the solution you'd like
Add a row/memory limit for building the hash table. When the limit is exceeded, switch to a spill-merge method:
Shuffle the build-side data into N buckets (say N=1024).
Build a separate hash table for each bucket; small buckets can be coalesced.
Shuffle the probe side into the same N partitions.
Read each probe partition and join it with the corresponding hash table.
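The steps above can be sketched roughly as follows. This is only an illustration of the idea, not a proposed implementation: the function names (`shuffle`, `coalesce`, `partitioned_hash_join`), the use of a pure row-count limit, and the in-memory lists standing in for spilled bucket files are all assumptions made for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Plain in-memory inner hash join: one hash table over the build side."""
    table = defaultdict(list)
    for b in build_rows:
        table[build_key(b)].append(b)
    return [(b, p) for p in probe_rows for b in table.get(probe_key(p), [])]


def shuffle(rows, key_fn, num_buckets):
    """Split rows into num_buckets lists by key hash.

    In a real engine each bucket would be spilled to disk; lists stand in here.
    """
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(key_fn(row)) % num_buckets].append(row)
    return buckets


def coalesce(build_buckets, row_limit):
    """Group adjacent bucket indices so each group holds at most row_limit rows.

    A single bucket larger than row_limit still forms its own group; that is
    the skewed case, which this sketch does not handle specially.
    """
    groups, current, current_rows = [], [], 0
    for idx, bucket in enumerate(build_buckets):
        if current and current_rows + len(bucket) > row_limit:
            groups.append(current)
            current, current_rows = [], 0
        current.append(idx)
        current_rows += len(bucket)
    if current:
        groups.append(current)
    return groups


def partitioned_hash_join(build_rows, probe_rows, build_key, probe_key,
                          num_buckets=1024, row_limit=100_000):
    build_rows = list(build_rows)
    if len(build_rows) <= row_limit:
        # Under the limit: keep the existing single-hash-table path.
        return hash_join(build_rows, probe_rows, build_key, probe_key)

    # Over the limit: shuffle both sides with the same hash and bucket count,
    # so matching keys always land in the same bucket index.
    build_buckets = shuffle(build_rows, build_key, num_buckets)
    probe_buckets = shuffle(probe_rows, probe_key, num_buckets)

    out = []
    for group in coalesce(build_buckets, row_limit):
        # One hash table per (coalesced) group of build buckets.
        group_build = [r for i in group for r in build_buckets[i]]
        group_probe = [r for i in group for r in probe_buckets[i]]
        out.extend(hash_join(group_build, group_probe, build_key, probe_key))
    return out
```

Only one group's hash table is alive at a time, so peak memory is bounded by the largest group rather than by the whole build side.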
Describe alternatives you've considered
This solves the OOM problem in most cases. However, when the data is skewed, the shuffle does not help, because all rows with the same hot key land in one bucket. In that situation we may fall back to a sort-based join.
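For reference, a sort-based fallback could look something like the sketch below. It is a minimal in-memory inner sort-merge join; the name `sort_merge_join` is hypothetical, and a real engine would presumably sort spilled runs externally instead of calling `sorted` on full lists.

```python
def sort_merge_join(build_rows, probe_rows, build_key, probe_key):
    """Inner join by sorting both sides on the key and merging them."""
    build = sorted(build_rows, key=build_key)
    probe = sorted(probe_rows, key=probe_key)
    out, i, j = [], 0, 0
    while i < len(build) and j < len(probe):
        bk, pk = build_key(build[i]), probe_key(probe[j])
        if bk < pk:
            i += 1
        elif bk > pk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(build) and build_key(build[i_end]) == bk:
                i_end += 1
            j_end = j
            while j_end < len(probe) and probe_key(probe[j_end]) == bk:
                j_end += 1
            # Emit the cross product of the two runs, then skip past them.
            for b in build[i:i_end]:
                for p in probe[j:j_end]:
                    out.append((b, p))
            i, j = i_end, j_end
    return out
```

Unlike the bucketed hash join, this needs no hash table for the hot key; memory use depends on how the sorted runs are buffered rather than on how the keys are distributed.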