
exec: hash joiner #1

Open · changangela wants to merge 7 commits into master

Conversation

@changangela commented Oct 15, 2018

A toy hash joiner for joins on int-int columns: we build the hash table from the left relation (which requires a unique key on the join column) and probe with the right relation.
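For orientation, here is a minimal sketch of the build/probe scheme described above, using a plain Go map instead of the PR's chained hash table; all names here are illustrative:

// hashJoinIntInt joins two relations on a single int column.
// buildKeys must be unique (the left relation); probeKeys may repeat.
// It returns matching (left row, right row) index pairs.
func hashJoinIntInt(buildKeys, probeKeys []int) (out [][2]int) {
	// Build phase: map each unique left key to its row index.
	ht := make(map[int]int, len(buildKeys))
	for i, k := range buildKeys {
		ht[k] = i
	}
	// Probe phase: emit one output pair per matching right row.
	for j, k := range probeKeys {
		if i, ok := ht[k]; ok {
			out = append(out, [2]int{i, j})
		}
	}
	return out
}

For example, hashJoinIntInt([]int{1, 2, 3}, []int{2, 2, 3, 4}) yields [[1 0] [1 1] [2 2]].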

Results of go test -bench=BenchmarkHashJoin:

Using the hashJoinBuilder:

BenchmarkHashJoin/name=random_source/rows=0-8  	   30000	     53737 ns/op
BenchmarkHashJoin/name=random_source/rows=4096-8         	    5000	    325867 ns/op	 804.45 MB/s
BenchmarkHashJoin/name=random_source/rows=16384-8        	    1000	   1405501 ns/op	 746.05 MB/s
BenchmarkHashJoin/name=random_source/rows=262144-8       	     100	  22095146 ns/op	 759.32 MB/s
BenchmarkHashJoin/name=random_source/rows=4194304-8      	       3	 360555462 ns/op	 744.51 MB/s
BenchmarkHashJoin/name=random_source/rows=67108864-8     	       1	10443118639 ns/op	 411.27 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=0-8         	   30000	     54243 ns/op
BenchmarkHashJoin/name=uniformly_distinct_source/rows=4096-8      	    5000	    287274 ns/op	 912.52 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=16384-8     	    1000	   1286287 ns/op	 815.20 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=262144-8    	     100	  21394291 ns/op	 784.19 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=4194304-8   	       3	 351375238 ns/op	 763.96 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=67108864-8  	       1	5903473387 ns/op	 727.53 MB/s

Using the hashJoinGroupBuilder:

BenchmarkHashJoin/name=random_source/rows=0-8  	   30000	     53264 ns/op
BenchmarkHashJoin/name=random_source/rows=4096-8         	    5000	    309597 ns/op	 846.73 MB/s
BenchmarkHashJoin/name=random_source/rows=16384-8        	    1000	   1246564 ns/op	 841.17 MB/s
BenchmarkHashJoin/name=random_source/rows=262144-8       	     100	  19342936 ns/op	 867.36 MB/s
BenchmarkHashJoin/name=random_source/rows=4194304-8      	       5	 317785006 ns/op	 844.71 MB/s
BenchmarkHashJoin/name=random_source/rows=67108864-8     	       1	8037002408 ns/op	 534.40 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=0-8         	   30000	     56612 ns/op
BenchmarkHashJoin/name=uniformly_distinct_source/rows=4096-8      	    5000	    257718 ns/op	1017.17 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=16384-8     	    2000	   1098873 ns/op	 954.23 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=262144-8    	     100	  18678830 ns/op	 898.19 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=4194304-8   	       5	 310974801 ns/op	 863.21 MB/s
BenchmarkHashJoin/name=uniformly_distinct_source/rows=67108864-8  	       1	5158706255 ns/op	 832.57 MB/s

@changangela changed the title from "Hash joiner" to "[wip] exec: hash joiner" on Oct 16, 2018
@changangela changed the title from "[wip] exec: hash joiner" to "exec: hash joiner" on Oct 16, 2018
@jordanlewis (Owner) left a comment:

Very very cool stuff. I think this is looking good for a first cut, but I don't fully understand everything in the implementation yet. Many of my comments are about improving the documentation.

I would recommend going through and adding documentation to each of the main components, cleaning things up as you go. Then I think you should switch the implementation to use the exec.Operator version, so we don't keep diverging from what's in the main repo, and PR it. We can continue the review there.

const hashTableBucketSize = 1 << 16

type hashTableInt struct {
	first []int
@jordanlewis (Owner):

I know the other code in this repo isn't well commented, but we'll need to add comments to all of these fields once we productionize it - otherwise, it'll be impossible for people to understand what's going on.

@jordanlewis (Owner):

You should describe the contract of hashTableInt as well. How is it used? What does it do exactly? At least a few sentences would be helpful.

@jordanlewis (Owner):

Specifically, what are first and next? How do they work? What's the overall structure of the hash table, and what guarantees does it provide?

@changangela (Author):

Thanks for the review! I've started migrating this code out to cockroachdb and I promise there is better documentation in there 😆


func (hashTable *hashTableInt) grow(amount int) {
	hashTable.next = append(hashTable.next, make([]int, amount)...)
	hashTable.keys = append(hashTable.keys, make(intColumn, amount)...)
@jordanlewis (Owner):

This may cause multiple allocations. Depending on whether you want fine-grained control over how much the slice grows, I think the right way to do this is to allocate a new slice if the old slice's capacity is too small to fit the new amount, and copy the old slice into the new one.

I hear Go 1.11 handles this case better (https://go-review.googlesource.com/c/go/+/109517), but unfortunately we're still on 1.10 for other reasons. If this is your bottleneck I'd consider changing it; otherwise I guess we can leave it for now.
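A minimal sketch of the allocate-and-copy approach suggested above; the method name and shape are illustrative, not the PR's code:

// growNextWithCopy grows hashTable.next by amount with at most one
// allocation: it reslices in place if the capacity suffices, and
// otherwise allocates a single larger array and copies. Since the
// table only ever grows, spare capacity still holds its zero values.
func (hashTable *hashTableInt) growNextWithCopy(amount int) {
	newLen := len(hashTable.next) + amount
	if newLen <= cap(hashTable.next) {
		hashTable.next = hashTable.next[:newLen]
		return
	}
	grown := make([]int, newLen)
	copy(grown, hashTable.next)
	hashTable.next = grown
}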

@changangela (Author):

Yeah, that makes sense. The implementation's a little different with the ColVec stuff and different types, which we can discuss later.


// hashJoinerIntInt performs a hash join on two integer columns, where the
// left table is the build relation. It does not support N-N joins.
type hashJoinerIntInt struct {
@jordanlewis (Owner):

nit: I would stick this at the top of the file, since it's the first thing somebody will want to read. The rest of the stuff might even belong in a separate file.


hashJoiner.hashTable = makeHashTableInt(hashTableBucketSize, len(hashJoiner.leftCols))

hashJoiner.build()
@jordanlewis (Owner):

This doesn't belong in Init, which is designed to run before execution starts at all - more of a setup phase than a do-work phase. You should put this into Next behind a conditional that only runs once. In distsql we do this with a little state machine infrastructure.
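A minimal sketch of that run-once guard, assuming a built flag and a probe method on the joiner, and that Next returns a ColBatch per the exec.Operator interface; all three are illustrative:

// Next runs the build phase lazily, exactly once, on the first call,
// which leaves Init as a pure setup phase.
func (hashJoiner *hashJoinerIntInt) Next() ColBatch {
	if !hashJoiner.built {
		hashJoiner.build()
		hashJoiner.built = true
	}
	return hashJoiner.probe()
}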


// build performs the build phase of the hash join using the left relation.
// Different builders use different heuristics for the build phase, allowing us
// to evaluate CPU-memory trade-offs.
@jordanlewis (Owner):

Great idea to have multiple builders! Perhaps we will be able to select a builder at plan time depending on the characteristics of the tables. Do you see any opportunities like that?

@changangela (Author):

Yes, definitely :D I found some cool literature on how to optimize the build phase for various trade-offs... and we will also need different builders/probers when we expand to N-N joins.

valCol := builder.hashTable.values[valIdx]

for i := 0; i < batchSize; i++ {
	valCol[i+builder.totalSize+1] = outCol[i]
@jordanlewis (Owner):

Another +1 - why? It seems like you could get rid of these everywhere, maybe, unless I'm missing something.

@changangela (Author) commented Oct 18, 2018:

Sorry for the lack of documentation :P The whole point of the +1 is that in our hash table, index 0 is reserved to represent the end of a chain. So every row index is offset by 1, such that next[i + 1] holds the next row in the corresponding bucket chain. Then, for consistency, keys and values carry the same offset of 1. This is how the paper implemented it, but now that I think about it, we could just use -1 as the end-of-chain marker.
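To make that concrete, here is a minimal self-contained sketch of the first/next layout with the 1-based offset described above; this is toy code, not the PR's implementation, and it assumes non-negative keys:

// toyHashTable is a chained hash table over int keys. first[bucket]
// holds the 1-based id of the head row of that bucket (0 = empty);
// next[id] holds the 1-based id of the following row in the chain
// (0 = end of chain). Slot 0 of next and keys is reserved, hence the
// +1 offset on row indices.
type toyHashTable struct {
	first []int // indexed by bucket
	next  []int // indexed by 1-based row id
	keys  []int // keys[id] is the key of row id-1
}

func newToyHashTable(numBuckets, numRows int) *toyHashTable {
	return &toyHashTable{
		first: make([]int, numBuckets),
		next:  make([]int, numRows+1),
		keys:  make([]int, numRows+1),
	}
}

// insert prepends 0-based row i with the given key to its bucket chain.
func (ht *toyHashTable) insert(i, key, numBuckets int) {
	id := i + 1 // shift by 1 so that 0 can mean end-of-chain
	bucket := key % numBuckets
	ht.keys[id] = key
	ht.next[id] = ht.first[bucket]
	ht.first[bucket] = id
}

// lookup returns the 0-based row index holding key, or -1 if absent.
func (ht *toyHashTable) lookup(key, numBuckets int) int {
	for id := ht.first[key%numBuckets]; id != 0; id = ht.next[id] {
		if ht.keys[id] == key {
			return id - 1
		}
	}
	return -1
}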

	break
}

builder.insertBatch(flow, eqColIdx, outCols, batchSize)
@jordanlewis (Owner):

There's a slight complication here: you need to examine the selection vector if it's set. Since we don't have a standard way to do that yet, I suggest leaving it out for now, but add a TODO so we make sure this is fixed later.
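For reference, a minimal sketch of the usual columnar pattern for honoring a selection vector when consuming a batch; the helper and its signature are hypothetical, and the repo's batch API may differ:

// consumeBatch visits only the live rows of a column: if sel is
// non-nil it lists the selected row indices, otherwise rows 0..n-1
// are all live.
func consumeBatch(col []int, sel []int, n int, visit func(v int)) {
	if sel != nil {
		for _, i := range sel[:n] {
			visit(col[i])
		}
		return
	}
	for i := 0; i < n; i++ {
		visit(col[i])
	}
}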

builder.hashTable.growNext(builder.totalSize)

for i := 0; i < builder.totalSize; i++ {
	builder.hashTable.insertKey(builder.bucket[i], i + 1)
@jordanlewis (Owner):

It seems like this loop is over the same bounds as the one above. Is there a reason you can't/shouldn't do both these steps in one loop?

@changangela (Author):

I don't think so. I was trying to figure out why the paper's implementation split the build process into hash -> bucket -> insert loops when it could easily have been combined into a single loop. Any idea why they might've done that?
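For illustration, the split build under discussion could look like the following, reusing the toyHashTable sketch above (with ht created via newToyHashTable(numBuckets, len(keys))). One plausible reason papers split the phases, though this is speculation here: the hash-only loop has no cross-iteration dependences, so it is friendlier to vectorization and prefetching than a fused loop.

// buildSplit builds the table in two passes: pass 1 computes every
// bucket, pass 2 links rows into chains. A fused build would do both
// per row; the resulting chains are identical either way.
func buildSplit(ht *toyHashTable, keys []int, numBuckets int) {
	buckets := make([]int, len(keys))
	for i, k := range keys { // pass 1: hash only
		buckets[i] = k % numBuckets
	}
	for i := range keys { // pass 2: insert only
		id := i + 1 // 1-based ids; 0 means end-of-chain
		ht.keys[id] = keys[i]
		ht.next[id] = ht.first[buckets[i]]
		ht.first[buckets[i]] = id
	}
}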

	hashTable.next = append(hashTable.next, make([]int, amount)...)
}

func (hashTable *hashTableInt) insertKey(hashKey int, id int) {
@jordanlewis (Owner):

comment on this - what does insertKey do exactly?
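The body isn't shown in this hunk, but given the first/next chain layout discussed above, a plausible implementation would be the following; this is an inference, not the PR's actual code:

// insertKey links the row with 1-based id into the chain for hashKey's
// bucket by prepending it: the old chain head becomes its successor.
func (hashTable *hashTableInt) insertKey(hashKey int, id int) {
	hashTable.next[id] = hashTable.first[hashKey]
	hashTable.first[hashKey] = id
}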

	hashTable.allocated += amount
}

func (hashTable *hashTableInt) insert(hashKey int, key int) (id int) {
@jordanlewis (Owner):

This seems to be unused by the group implementation - why? Also, please add a comment on what it does.

@changangela (Author):

I wrote this function before implementing the second builder, and it does a bit too much: it also inserts into keys, which we don't want to do if we are preloading everything.

@changangela (Author) left a comment:

Thanks for the review! I'll continue this in the cockroach repo.
