Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support signed differential coding #9

Closed
AnthonyLloyd opened this issue May 15, 2015 · 9 comments
Closed

Support signed differential coding #9

AnthonyLloyd opened this issue May 15, 2015 · 9 comments

Comments

@AnthonyLloyd
Copy link

Would you have a recommendation for data like a price tick stream where the values are uint32 and there are small deltas but are unsorted? The next price should be a small delta from the previous but it would be signed.

@lemire lemire changed the title Question on unsorted but continuous data Supposed signed differential coding May 15, 2015
@lemire lemire changed the title Supposed signed differential coding Support signed differential coding May 15, 2015
@lemire
Copy link
Member

lemire commented May 15, 2015

For now, the lame answer is that the library only supports unsigned deltas.

The more interesting answer is that I believe I know how to do this properly and I am interested in doing it. So I am turning your issue into a feature request.

So if you are willing to wait a bit (or better: help a bit) then this could be resolved in the relatively-near future.

@AnthonyLloyd
Copy link
Author

That sounds great. I'm happy to help if I can. I'm not an expert in this area but will give it a go.

@lemire
Copy link
Member

lemire commented May 15, 2015

@AnthonyLloyd Ok. Once I have something, I will get back to you. (There might be some delay however... as I have many other things to finish first. You know how it is.)

@AnthonyLloyd
Copy link
Author

Sure thanks

@cruppstahl
Copy link
Contributor

How about this: store the absolute value of the (signed) delta, shift it 1 bit to the left and store the signed bit in the lsb? This should result in small unsigned values.

int delta = ...;
uint32_t udelta = (abs(delta) << 1) | signed_bit;

@lemire
Copy link
Member

lemire commented May 15, 2015

@cruppstahl @AnthonyLloyd

You can use zigzag encoding which is what I think you are alluding to:

https://github.com/lemire/JavaFastPFOR/blob/master/src/main/java/me/lemire/integercompression/DeltaZigzagEncoding.java#L38

That is the basic approach. Now, you could take your data, run zigzag encoding through its deltas and then pack it all up with this library (simdcomp). It would work.

Well, if someone wanted to help, he could implement a zigzag encoding approach, that would be useful as it could serve as a reference for a better approach that could come later.

@AnthonyLloyd
Copy link
Author

That looks like what needs to be done but it would be very good is there was a SIMD way of doing it.

@AnthonyLloyd
Copy link
Author

I'll have a go at doing this. I'll see if I can do it with SIMD instructions. Do you think there could be another approach apart from zigzag?

@lemire
Copy link
Member

lemire commented May 18, 2015

@AnthonyLloyd

Zigzag encoding is the standard way. I think it is the reference.

Yes, there are other ways, of course, but I think one must first start with zigzag encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants