Add 64 bit implementation of roaring.Bitmap #245
Fixes #136

- many tests fail
- tests have not been augmented with 64 bit values
- serialization is explicitly not part of this PR
- no special attention has been given to copy-on-write concerns
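The C++ and Java counterparts use, as far as I know, a two-level layout: CRoaring's `Roaring64Map` and Java's `Roaring64NavigableMap` both key 32-bit Roaring bitmaps by the high 32 bits of each value. The following is a minimal sketch of that idea in Go. It assumes nothing about the actual `roaring64` internals. `Bitmap64`, `bitmap32`, and `split` are hypothetical names, and a plain Go map stands in for the compressed 32-bit containers. The method names mirror the 32-bit `roaring.Bitmap` API (`Add`, `Remove`, `Contains`, `GetCardinality`, `ToArray`), which follows the PR's goal of keeping the original signatures where possible.

```go
package main

import (
	"fmt"
	"sort"
)

// bitmap32 is a hypothetical stand-in for a 32-bit Roaring bitmap; a plain Go
// set keeps the sketch dependency-free.
type bitmap32 map[uint32]struct{}

// Bitmap64 is a hypothetical two-level layout: the high 32 bits of a value
// select a bucket, and the low 32 bits are stored in that bucket's bitmap32.
type Bitmap64 struct {
	buckets map[uint32]bitmap32
}

func NewBitmap64() *Bitmap64 {
	return &Bitmap64{buckets: make(map[uint32]bitmap32)}
}

// split returns the high and low 32-bit halves of x.
func split(x uint64) (hi, lo uint32) {
	return uint32(x >> 32), uint32(x)
}

// Add mirrors the 32-bit Add(x uint32) signature, widened to uint64.
func (b *Bitmap64) Add(x uint64) {
	hi, lo := split(x)
	bm, ok := b.buckets[hi]
	if !ok {
		bm = make(bitmap32)
		b.buckets[hi] = bm
	}
	bm[lo] = struct{}{}
}

// Remove deletes x and drops its bucket once the bucket is empty.
func (b *Bitmap64) Remove(x uint64) {
	hi, lo := split(x)
	if bm, ok := b.buckets[hi]; ok {
		delete(bm, lo)
		if len(bm) == 0 {
			delete(b.buckets, hi)
		}
	}
}

// Contains reports whether x is present (a missing bucket reads as empty).
func (b *Bitmap64) Contains(x uint64) bool {
	hi, lo := split(x)
	_, ok := b.buckets[hi][lo]
	return ok
}

// GetCardinality sums the sizes of all buckets.
func (b *Bitmap64) GetCardinality() uint64 {
	var n uint64
	for _, bm := range b.buckets {
		n += uint64(len(bm))
	}
	return n
}

// ToArray returns all values in ascending order: buckets by high key, then low bits.
func (b *Bitmap64) ToArray() []uint64 {
	his := make([]uint32, 0, len(b.buckets))
	for hi := range b.buckets {
		his = append(his, hi)
	}
	sort.Slice(his, func(i, j int) bool { return his[i] < his[j] })
	out := make([]uint64, 0, b.GetCardinality())
	for _, hi := range his {
		los := make([]uint32, 0, len(b.buckets[hi]))
		for lo := range b.buckets[hi] {
			los = append(los, lo)
		}
		sort.Slice(los, func(i, j int) bool { return los[i] < los[j] })
		for _, lo := range los {
			out = append(out, uint64(hi)<<32|uint64(lo))
		}
	}
	return out
}

func main() {
	b := NewBitmap64()
	b.Add(1)
	b.Add(1<<32 + 5)
	b.Add(1 << 40)
	fmt.Println(b.Contains(1<<32+5), b.Contains(5), b.GetCardinality())
	fmt.Println(b.ToArray())
}
```

Running it prints `true false 3` followed by `[1 4294967301 1099511627776]`. The value 5 is absent because it maps to bucket 0, which only holds 1.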
@lemire do you mind giving this an early review? I want to make sure I'm on the right track.
@jacobmarble This looks like solid code to me. It resembles the C++ and Java counterparts of the same design. Obviously, I can only encourage the addition of more testing, and if you are able, I'd throw in fuzz testing as well.
Any update here?
@kevinconaway There are some tests and it looks good, but the tests are not as thorough as one would like. I am nervous about merging this and having people's databases blow up. We could also use performance and memory usage measurements. Would you be willing to add new tests and take it for a spin?
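One way to act on the fuzz-testing suggestion is a differential test. The idea is to apply the same random Add/Remove sequence to the 64-bit structure and to a plain `map[uint64]bool`, then compare the two after every step. The sketch below uses a minimal hypothetical `set64` in place of the real `roaring64.Bitmap`; against the library, the same loop would call its methods instead. Values are deliberately clustered near the 32-bit boundaries, where high/low splitting bugs are most likely: bucket keys 0, 1, 2, and 0xFFFFFFFF, with low bits near 0 and near 2^32-1. Go 1.18 and later also offer native fuzzing (`testing.F`, `go test -fuzz`), which could drive the same check with fuzzer-chosen seeds.

```go
package main

import (
	"fmt"
	"math/rand"
)

// set64 is a hypothetical two-level set (high 32 bits -> set of low 32 bits)
// standing in for the implementation under test.
type set64 struct{ buckets map[uint32]map[uint32]bool }

func newSet64() *set64 { return &set64{buckets: map[uint32]map[uint32]bool{}} }

func (s *set64) Add(x uint64) {
	hi, lo := uint32(x>>32), uint32(x)
	if s.buckets[hi] == nil {
		s.buckets[hi] = map[uint32]bool{}
	}
	s.buckets[hi][lo] = true
}

func (s *set64) Remove(x uint64) {
	hi, lo := uint32(x>>32), uint32(x)
	delete(s.buckets[hi], lo) // delete on a nil map is a no-op
	if len(s.buckets[hi]) == 0 {
		delete(s.buckets, hi)
	}
}

func (s *set64) Contains(x uint64) bool { return s.buckets[uint32(x>>32)][uint32(x)] }

func (s *set64) Cardinality() int {
	n := 0
	for _, b := range s.buckets {
		n += len(b)
	}
	return n
}

// randomValue clusters values around bucket boundaries, including the extremes
// of the uint64 range.
func randomValue(r *rand.Rand) uint64 {
	highs := []uint64{0, 1, 2, 0xFFFFFFFF}
	hi := highs[r.Intn(len(highs))]
	lo := uint64(r.Intn(64))
	if r.Intn(2) == 0 {
		lo = 0xFFFFFFFF - lo
	}
	return hi<<32 | lo
}

// differentialCheck applies the same random operations to the set under test
// and to a plain map, returning an error on the first disagreement.
func differentialCheck(seed int64, ops int) error {
	r := rand.New(rand.NewSource(seed))
	got, want := newSet64(), map[uint64]bool{}
	for i := 0; i < ops; i++ {
		x := randomValue(r)
		if r.Intn(3) == 0 {
			got.Remove(x)
			delete(want, x)
		} else {
			got.Add(x)
			want[x] = true
		}
		if got.Contains(x) != want[x] {
			return fmt.Errorf("op %d: Contains(%#x) = %v, want %v", i, x, got.Contains(x), want[x])
		}
	}
	if got.Cardinality() != len(want) {
		return fmt.Errorf("cardinality %d, want %d", got.Cardinality(), len(want))
	}
	for x := range want {
		if !got.Contains(x) {
			return fmt.Errorf("missing %#x", x)
		}
	}
	return nil
}

func main() {
	for seed := int64(0); seed < 5; seed++ {
		if err := differentialCheck(seed, 10000); err != nil {
			fmt.Println("seed", seed, "failed:", err)
			return
		}
	}
	fmt.Println("all seeds passed")
}
```

With a correct implementation it prints `all seeds passed`; a failing seed can be replayed deterministically.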
This issue #252 might reference this PR.
This work is really needed. Can I help in some way? |
@guymolinari Yes. Can you review and test this PR? There are tests, but I am concerned that they may not be thorough enough. We don't want this code to blow up when it goes into production.
Will do @lemire. I have a large project with hundreds of terabytes of data that will use this.
Thanks for the chatter here. I wrote what is here so far and then got busy with other work.
How do you want to handle this, @jacobmarble? I will definitely need the serialization stuff and am willing to dig in and do it. Also, more tests as needed. I have another outstanding PR with @lemire, so I'm not sure working on my fork would make sense. I've never forked another owner's fork. Should I go this route?
You could pull down my branch and layer new commits on top, or cherry-pick them into your branch. It would be nice to see my name on my commits, but that's all.
@jacobmarble Could you grant me write permissions for this repo?
@guymolinari I've added you as a collaborator, see if that works.
Getting a 403 response
Actually, I had to accept the invitation first. I can now push a new branch.
Thanks,
Guy
Added serialization code and test.
@guymolinari Keep us posted.
Will do @lemire @jacobmarble. I should have an update tomorrow. The next step is to increase test coverage; after that it should be ready to integrate with my project.
Added unit tests for 64 bit checked add and remove. Fix bug in Check…
Expand 64 bit value tests, try to improve coverage.
I'm checking in test cases and coverage percentage is dropping. I'm going to call it good for now and move on to integrating the 64 bit version into my project and see how it works. We should have functional parity between the legacy and 64 bit APIs.
Cheers,
@lemire @jacobmarble The roaring64 implementation is missing the ParOr function, so I will have to add that. It is required for my project and also for the new PR for the bit slice indexing library. @lemire I guess we have a separate 64 bit version of that as well?
Great work.
Is that blocking for you, or something you can add later?
Can you rephrase?
I am not forgetting that PR, but let us keep things separate for now.
@lemire It is a blocker, but not for long. I'll implement it in the next day or two. I have some large data sets to process, so I will get a good idea of processing times and can verify correctness against another system of record.
Added basic implementations of fast aggregates and ParOr.
Added some temporary hacked up aggregations tests.
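Values with different high-32-bit keys never interact. A parallel union can therefore group the input containers by bucket key and OR each group on its own goroutine. The sketch below illustrates that idea only; it is not the `ParOr` added in this PR. `bitmap64`, `parOr`, `fromValues`, and `toSorted` are hypothetical names, and maps stand in for Roaring containers. A `parallelism` of 0 or less falls back to `runtime.NumCPU()`, in the spirit of the NumCPU-based ParOr test.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// bitmap64 is a hypothetical two-level set: high 32 bits -> set of low 32 bits.
type bitmap64 map[uint32]map[uint32]struct{}

// parOr sketches a parallel union. Each key's containers are merged on a
// worker goroutine; parallelism <= 0 falls back to runtime.NumCPU().
func parOr(parallelism int, inputs ...bitmap64) bitmap64 {
	if parallelism <= 0 {
		parallelism = runtime.NumCPU()
	}
	// Group every input container by its high key.
	groups := map[uint32][]map[uint32]struct{}{}
	for _, bm := range inputs {
		for hi, c := range bm {
			groups[hi] = append(groups[hi], c)
		}
	}
	keys := make(chan uint32)
	result := bitmap64{}
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < parallelism; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for hi := range keys {
				// groups and the inputs are only read here, which is safe concurrently.
				merged := map[uint32]struct{}{}
				for _, c := range groups[hi] {
					for lo := range c {
						merged[lo] = struct{}{}
					}
				}
				mu.Lock() // result is written by several workers
				result[hi] = merged
				mu.Unlock()
			}
		}()
	}
	for hi := range groups {
		keys <- hi
	}
	close(keys)
	wg.Wait()
	return result
}

// fromValues builds a bitmap64 from 64-bit values.
func fromValues(vals ...uint64) bitmap64 {
	bm := bitmap64{}
	for _, v := range vals {
		hi, lo := uint32(v>>32), uint32(v)
		if bm[hi] == nil {
			bm[hi] = map[uint32]struct{}{}
		}
		bm[hi][lo] = struct{}{}
	}
	return bm
}

// toSorted flattens a bitmap64 back into sorted 64-bit values.
func toSorted(bm bitmap64) []uint64 {
	var out []uint64
	for hi, c := range bm {
		for lo := range c {
			out = append(out, uint64(hi)<<32|uint64(lo))
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	a := fromValues(1, 2, 1<<32)
	b := fromValues(2, 3, 1<<33)
	c := fromValues(1<<32 + 7)
	fmt.Println(toSorted(parOr(0, a, b, c)))
}
```

It prints `[1 2 3 4294967296 4294967303 8589934592]`. Serializing writes to `result` with a mutex keeps the sketch short. A real implementation might instead have workers return their merged containers and assemble the result afterwards.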
@lemire @jacobmarble I am happy to say that application-level testing went well. I upgraded my system to use the new 64 bit library and processed about 1TB of data. I tested under 2 scenarios, and both results were compared against the system of record without issue. One more thing of note: I had to create a 64 bit version of the new BSI library as well, since it is also heavily used (I will submit a separate PR when you are ready, @lemire).
Either way, I think this is a +1 from my perspective. Additional test coverage would be nice. Also, a few features still need to be ported to 64 bit.
@guymolinari Note that I merged the BSI code and then applied maintenance on it.
@guymolinari Did we test the COW functionality at some point?
Awesome.
I do believe I added some related tests. I'll double check this evening.
@lemire I moved the COW tests from roaring into roaring64. One of the tests is failing, so I'm looking at fixing the bug. So far this only tests the 64 bit API, not 64 bit values. After I fix the issue, I will look at expanding this to 64 bit values.
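Copy-on-write (COW) here means that `Clone` shares containers instead of copying them. Whichever bitmap writes to a shared container first takes a private copy. The thread does not describe what the `AddMany()` COW bug actually was. The sketch below only illustrates one plausible failure mode: a batch method that writes into a container without going through the copy step. `cowBitmap`, `writable`, and the `shared` flags are hypothetical, and maps stand in for Roaring containers.

```go
package main

import "fmt"

// container is a hypothetical stand-in for a 32-bit Roaring container.
type container map[uint32]struct{}

// cowBitmap sketches copy-on-write: Clone shares containers, and a container
// flagged as shared is copied before the first write through this bitmap.
type cowBitmap struct {
	buckets map[uint32]container
	shared  map[uint32]bool // true if another bitmap may reference the container
}

func newCOW() *cowBitmap {
	return &cowBitmap{buckets: map[uint32]container{}, shared: map[uint32]bool{}}
}

// Clone shares every container and flags it as shared on both sides.
func (b *cowBitmap) Clone() *cowBitmap {
	c := newCOW()
	for hi, ct := range b.buckets {
		c.buckets[hi] = ct
		c.shared[hi] = true
		b.shared[hi] = true
	}
	return c
}

// writable returns the container for hi, copying it first if it is shared.
func (b *cowBitmap) writable(hi uint32) container {
	ct, ok := b.buckets[hi]
	if !ok {
		ct = container{}
		b.buckets[hi] = ct
		return ct
	}
	if b.shared[hi] {
		cp := make(container, len(ct))
		for lo := range ct {
			cp[lo] = struct{}{}
		}
		b.buckets[hi] = cp
		delete(b.shared, hi)
		ct = cp
	}
	return ct
}

func (b *cowBitmap) Add(x uint64) {
	b.writable(uint32(x >> 32))[uint32(x)] = struct{}{}
}

// AddMany routes every write through writable. A batch path that wrote into
// b.buckets[hi] directly would also modify any clone sharing that container.
func (b *cowBitmap) AddMany(xs []uint64) {
	for _, x := range xs {
		b.Add(x)
	}
}

func (b *cowBitmap) Contains(x uint64) bool {
	_, ok := b.buckets[uint32(x>>32)][uint32(x)]
	return ok
}

func main() {
	a := newCOW()
	a.AddMany([]uint64{1, 2, 1 << 32})
	c := a.Clone()
	c.AddMany([]uint64{3, 1<<32 + 1}) // c copies buckets 0 and 1 before writing
	a.Add(4)                          // a copies bucket 0; c is unaffected
	fmt.Println(a.Contains(3), c.Contains(3), a.Contains(4), c.Contains(4))
}
```

It prints `false true true false`: after the clone, writes on either side stay private to that side. The `shared` flags are conservative, so a container may occasionally be copied when no other bitmap still references it.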
Added COW tests. Fix COW bug with AddMany()
Added some 64 bit values to COW tests.
Add ParOr test with NumCPU thread count.
Try to improve coverage of ParOr.
Add 64 bit value test for ParOr.
@guymolinari Any update?
Hello @lemire,
Things are looking pretty good (I think). There are a few higher-level features (FastAnd, HeapOr, etc.) that haven't been migrated to 64 bit yet. I'm not sure whether we need these to get the PR over the finish line. I do have a 64 bit version of BSI that I haven't teed up a PR for yet.
Let me know what I can do to help.
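For a multi-way intersection such as `FastAnd`, the two-level layout allows an early cutoff. A bucket key missing from any input cannot contribute, so only keys present in every bitmap need their low bits intersected. Starting from the input with the fewest buckets keeps the outer loop short. This is a hypothetical sketch, not the library's `FastAnd`: `bitmap64`, `fastAnd`, and the helpers are illustrative names, with maps in place of Roaring containers.

```go
package main

import (
	"fmt"
	"sort"
)

// bitmap64 is a hypothetical two-level set: high 32 bits -> set of low 32 bits.
type bitmap64 map[uint32]map[uint32]struct{}

// fastAnd sketches a multi-way intersection over the two-level layout.
func fastAnd(inputs ...bitmap64) bitmap64 {
	result := bitmap64{}
	if len(inputs) == 0 {
		return result
	}
	// Visit inputs from fewest to most buckets so the outer loop is short.
	ordered := append([]bitmap64(nil), inputs...)
	sort.Slice(ordered, func(i, j int) bool { return len(ordered[i]) < len(ordered[j]) })
	for hi, first := range ordered[0] {
		lows := make(map[uint32]struct{}, len(first))
		for lo := range first {
			lows[lo] = struct{}{}
		}
		for _, other := range ordered[1:] {
			c, ok := other[hi]
			if !ok {
				// The key is absent from one input, so the whole bucket drops out.
				lows = nil
				break
			}
			for lo := range lows {
				if _, keep := c[lo]; !keep {
					delete(lows, lo) // deleting during range is allowed in Go
				}
			}
			if len(lows) == 0 {
				break
			}
		}
		if len(lows) > 0 {
			result[hi] = lows
		}
	}
	return result
}

// fromValues builds a bitmap64 from 64-bit values.
func fromValues(vals ...uint64) bitmap64 {
	bm := bitmap64{}
	for _, v := range vals {
		hi, lo := uint32(v>>32), uint32(v)
		if bm[hi] == nil {
			bm[hi] = map[uint32]struct{}{}
		}
		bm[hi][lo] = struct{}{}
	}
	return bm
}

// toSorted flattens a bitmap64 back into sorted 64-bit values.
func toSorted(bm bitmap64) []uint64 {
	var out []uint64
	for hi, c := range bm {
		for lo := range c {
			out = append(out, uint64(hi)<<32|uint64(lo))
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	a := fromValues(1, 2, 3, 1<<32, 1<<32+1)
	b := fromValues(2, 3, 1<<32+1, 1<<33)
	c := fromValues(3, 1<<32+1)
	fmt.Println(toSorted(fastAnd(a, b, c)))
}
```

It prints `[3 4294967297]`, the values common to all three inputs.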
@guymolinari If you could just let us know whether you recommend that the PR be merged, that would be great. If you do, I will then review it and hopefully merge it soon after. Missing features that you do not consider essential are ok. They can come later.
This is all additive. It works. Test coverage passes.
+1 to review and merge
@lemire for review
It looks good to me. I will merge and possibly do some minor edits.
Fixes #136
This PR is very much a work-in-progress. I've tried to stay with the original 32 bit method signatures as much as possible. Serialization is explicitly not part of this PR.