Multi column indexes are only used if all columns share operation #261
Comments
Closing; as discussed via Slack with @ajnavarro, this is expected behaviour. |
Maybe we should have a look at how this works in MySQL or PostgreSQL: https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html |
In MySQL, it uses the index as long as the value of the leftmost column is specified. e.g., for an index on |
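The leftmost-prefix behaviour described above can be sketched as follows. This is only an illustration of the rule, not MySQL's implementation; `leftmostPrefixLen` and the column names are hypothetical.

```go
package main

import "fmt"

// leftmostPrefixLen returns how many leading columns of a multi-column
// index are constrained by a query. Under the leftmost-prefix rule the
// index is usable only when the result is at least 1.
// (Hypothetical helper for illustration; not part of any real API.)
func leftmostPrefixLen(indexCols []string, constrained map[string]bool) int {
	n := 0
	for _, col := range indexCols {
		if !constrained[col] {
			break
		}
		n++
	}
	return n
}

func main() {
	idx := []string{"A", "B"}
	fmt.Println(leftmostPrefixLen(idx, map[string]bool{"A": true}))            // 1: usable
	fmt.Println(leftmostPrefixLen(idx, map[string]bool{"B": true}))            // 0: not usable
	fmt.Println(leftmostPrefixLen(idx, map[string]bool{"A": true, "B": true})) // 2: usable
}
```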
What are we going to do with this in the end @ajnavarro? |
Let's keep this issue open until we find a good solution for it. Right now it is not a high priority. |
What if we do not have indexes on multiple columns (internally), so that the index on (A, B) would be the same as an index on A plus an index on B? |
I think that might work. I can't think of any reason this may be bad right now. In fact, it should actually help with updates (fewer indexes to update once repos change). The only downside I can see is that you will have to check all remaining indexes to see whether you can delete a pilosa index or not, because it may be in use by another gitbase index. WDYT @src-d/data-retrieval? |
If it is internal, at the pilosa implementation level, I'm totally in on that. We should save in the pilosa metadata which pilosa indexes the go-mysql-server indexes are using, to know if we can delete the pilosa index on a |
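One way to track which shared per-column structures are still in use is plain reference counting. The sketch below is an assumption about how that bookkeeping could look; `refCounter`, `Acquire` and `Release` are hypothetical names, not gitbase, go-mysql-server or pilosa APIs.

```go
package main

import "fmt"

// refCounter tracks how many logical indexes use each per-column
// structure. All names here are hypothetical, for illustration only.
type refCounter struct {
	refs map[string]int
}

func newRefCounter() *refCounter {
	return &refCounter{refs: map[string]int{}}
}

// Acquire registers one logical index over the given columns.
func (r *refCounter) Acquire(cols ...string) {
	for _, c := range cols {
		r.refs[c]++
	}
}

// Release drops one logical index and returns the columns whose
// underlying structure is no longer referenced and can be deleted.
func (r *refCounter) Release(cols ...string) []string {
	var deletable []string
	for _, c := range cols {
		if r.refs[c] == 0 {
			continue // unknown column, nothing to release
		}
		r.refs[c]--
		if r.refs[c] == 0 {
			delete(r.refs, c)
			deletable = append(deletable, c)
		}
	}
	return deletable
}

func main() {
	rc := newRefCounter()
	rc.Acquire("A", "B") // logical index on (A, B)
	rc.Acquire("A")      // logical index on (A)

	fmt.Println(rc.Release("A", "B")) // [B]: A is still used by the index on (A)
	fmt.Println(rc.Release("A"))      // [A]
}
```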
OK, so I'll start prototyping the idea of deduplicating indexes (#261 (comment)) |
I noticed a small thing to improve. Having the following index:
Following
but
but actually, if we have independent indexes (so in this case 2) then for logic operations we may always merge 2 indexes. |
It's quite easy to fix the problem with index on (A, B) which can be used only with one condition, e.g.:
It may require some convention between the driver and the analyzer (for instance, we may always pass to the index lookup as many keys as there are index expressions, but we have to keep the order, and the lookup will skip nil keys), e.g.:
// index (A, B), WHERE A=5
index.Get({5, nil}) |
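A minimal sketch of that nil-key convention follows, assuming a toy in-memory index with one value-to-rows map per expression. The `multiIndex` type and its `Get` method are hypothetical and only mirror the `index.Get({5, nil})` idea; they are not the actual driver interface.

```go
package main

import "fmt"

// multiIndex is a toy index with one value -> row IDs map per indexed
// expression, in index order. Hypothetical type for illustration only.
type multiIndex struct {
	fields []map[interface{}][]int
}

// Get expects exactly one key per index expression, in order.
// A nil key means "no condition on this expression" and is skipped.
// If every key is nil, no rows are returned.
func (m *multiIndex) Get(keys ...interface{}) ([]int, error) {
	if len(keys) != len(m.fields) {
		return nil, fmt.Errorf("expected %d keys, got %d", len(m.fields), len(keys))
	}
	var result []int
	first := true
	for i, k := range keys {
		if k == nil {
			continue
		}
		rows := m.fields[i][k]
		if first {
			result = append([]int(nil), rows...)
			first = false
			continue
		}
		result = intersect(result, rows)
	}
	return result, nil
}

// intersect keeps the rows of a that also appear in b.
func intersect(a, b []int) []int {
	seen := make(map[int]bool, len(b))
	for _, r := range b {
		seen[r] = true
	}
	var out []int
	for _, r := range a {
		if seen[r] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// index (A, B): A=5 matches rows 1,2,3; B="x" matches rows 2,3,4
	idx := &multiIndex{fields: []map[interface{}][]int{
		{5: {1, 2, 3}},
		{"x": {2, 3, 4}},
	}}
	rows, _ := idx.Get(5, nil) // WHERE A=5
	fmt.Println(rows)          // [1 2 3]
	rows, _ = idx.Get(5, "x")  // WHERE A=5 AND B="x"
	fmt.Println(rows)          // [2 3]
}
```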
But if you create one index with two columns, we shouldn't try to use the index with only one of the columns. This can break the intended way of communicating between the Analyzer and the different kinds of indexes. Not all indexes can do what we are doing with the pilosa index, so we should keep the common interface with some constraints. |
@ajnavarro - right, it was just an experiment, because with bitmaps it's doable. |
Generally with bitmaps I don't see benefits of having multi-column indexes. It works better if we have index per expression and merge them because merging bitmaps is super fast and it's more flexible from composition point of view. |
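To illustrate why merging per-expression bitmaps is cheap, here is a toy sketch that uses a single `uint64` as a bitmap over row IDs 0..63. Real bitmap indexes use compressed bitmaps; the `bitmap` type and its helpers are hypothetical and not the pilosa API.

```go
package main

import "fmt"

// bitmap is a toy fixed-size bitmap over row IDs 0..63.
// Hypothetical type for illustration only.
type bitmap uint64

// fromRows sets one bit per row ID (each ID must be below 64).
func fromRows(rows ...uint) bitmap {
	var b bitmap
	for _, r := range rows {
		b |= 1 << r
	}
	return b
}

// rows lists the row IDs whose bit is set, in ascending order.
func (b bitmap) rows() []uint {
	var out []uint
	for i := uint(0); i < 64; i++ {
		if b&(1<<i) != 0 {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	a := fromRows(1, 2, 3) // rows matching the condition on A
	b := fromRows(2, 3, 4) // rows matching the condition on B

	// Each merge is a single machine instruction on this toy bitmap.
	fmt.Println((a & b).rows()) // AND: [2 3]
	fmt.Println((a | b).rows()) // OR:  [1 2 3 4]
}
```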
Notes from Slack: if you create one index on expressions (A, B), we don't actually index tuples; we index A values and B values independently as pilosa fields, so under the hood they are already independent structures.
In this case for requests |
@erizocosmico - where is the main problem of using indexes with 2 different operations? E.g.:
and if we have an index per column, (A) and (B), instead of one (A, B), will it work? |
There is no problem, we just only did it when they share operations. |
Because our methods for `Index` accept `...interface{}`, where the len of that slice is the number of columns in the index, all those methods require all the values, one for each column. This creates a problem: all columns must use the exact same operation.
For example, consider we have an index on `A` and `B`:
- `A = 1 AND B = 1` will use the index.
- `A > 1 AND B > 5` will use the index.
- `A = 1 AND B < 5` will not, because `=` and `<` are not the same operation.
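The constraint can be sketched as follows. This is a simplified model, not the go-mysql-server code: `condition` and `keysForIndex` are hypothetical names, and the only assumption carried over from the description is that one variadic call needs one value per index column and a single operation for all of them.

```go
package main

import "fmt"

// condition is a hypothetical representation of "column op value".
type condition struct {
	Column string
	Op     string
	Value  interface{}
}

// keysForIndex collects one value per index column, in index order.
// It returns ok=false when a column has no condition or when the
// conditions do not all share the same operation, since a single
// variadic lookup call cannot express either case.
func keysForIndex(indexCols []string, conds []condition) (op string, keys []interface{}, ok bool) {
	byCol := make(map[string]condition, len(conds))
	for _, c := range conds {
		byCol[c.Column] = c
	}
	for i, col := range indexCols {
		c, found := byCol[col]
		if !found {
			return "", nil, false
		}
		if i == 0 {
			op = c.Op
		} else if c.Op != op {
			return "", nil, false
		}
		keys = append(keys, c.Value)
	}
	return op, keys, true
}

func main() {
	idx := []string{"A", "B"}

	_, _, ok := keysForIndex(idx, []condition{{"A", "=", 1}, {"B", "=", 1}})
	fmt.Println(ok) // true: A = 1 AND B = 1

	_, _, ok = keysForIndex(idx, []condition{{"A", ">", 1}, {"B", ">", 5}})
	fmt.Println(ok) // true: A > 1 AND B > 5

	_, _, ok = keysForIndex(idx, []condition{{"A", "=", 1}, {"B", "<", 5}})
	fmt.Println(ok) // false: = and < differ
}
```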