Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using prefix code and zigzag encoding for variable-length integers #8

Open
vshymanskyy opened this issue Jul 27, 2022 · 2 comments
Labels

Comments

@vshymanskyy
Copy link
Owner

No description provided.

@JobLeonard
Copy link

Proposal: can we broaden this issue to discuss what variable-length integer encoding in general would be most appropriate?

I was unfamiliar with zigzag encoding and what advantages it would have over the current method, so I searched for "leb128 vs zigzag". I found these two related discussions about LEB128 being used in WASM, both of which raised a number of points that seemed relevant to the topic:

https://news.ycombinator.com/item?id=11263378

WebAssembly/design#601

A quick summary of what stood out to me:

  1. zigzag encoding/decoding is allegedly faster for signed integers than plain signed LEB128
  2. implementations for prefix-based varints are generally 2x-3x faster than LEB128
  3. LEB128 allows for locating the start and end with random access, prefix-based varints don't
  4. In the context of UTF8, having non-canonical representations has lead to security issues.
    • Technically LEB128 also lacks canonical encodings for numbers: zero for example would typically be encoded as 00, but technically 80 00, 80 80 00, ... 80 80 ... 80 00 also work. Should this be forbidden? (in practice this means disallowing LEB128 to have trailing 80s followed by 00 for positive values, and similarly FFs + 7F for negative values. Can't think of the top of my head if this applies to zigzag encoding)

I have no strong opinions on any of this, and I do not know what the most relevant criteria are to decide on a varint encoding for muon anyway. But these points they seemed relevant enough to bring up (even if muon sticks to plain LEB128 you can at least say that you looked at alternatives, right?)

PS: for anyone else unfamiliar with zigzag encoding, this blog and this SO question helped me out

@vshymanskyy vshymanskyy changed the title Consider using Zigzag encoding for negative variable-length integers Consider using Zigzag encoding for variable-length integers Aug 17, 2022
@vshymanskyy vshymanskyy closed this as not planned Won't fix, can't repro, duplicate, stale Dec 29, 2022
@vshymanskyy vshymanskyy reopened this Jan 26, 2024
@vshymanskyy vshymanskyy changed the title Consider using Zigzag encoding for variable-length integers Consider using prefix code and zigzag encoding for variable-length integers Jan 26, 2024
@vshymanskyy
Copy link
Owner Author

I think i'll give it another try

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants