Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better compression of data structure #6

Open
tokee opened this issue Jul 4, 2019 · 1 comment
Open

Better compression of data structure #6

tokee opened this issue Jul 4, 2019 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@tokee
Copy link
Contributor

tokee commented Jul 4, 2019

The current data structure with nodes-information is list of JSON-blocks, e.g.

{d:"example.com", x:-123.456, y:789.101, d:14, in:[7,103], out:[567, 2009]}

Switching to a single string/entry would minimize the number of objects on the head and still be unambiguous:

"example.com -123.456 789.101 14 [7,103] [567,2009]"

Switching to integer-coordinates by multiplying with 1000

"example.com -123456 789101 14 [7,103] [567,2009]"

and switching to base64 encoding for all integers should pack this further.

TODO: Provide a sample of base64-encoded integers below.

@tokee tokee added the enhancement New feature or request label Jul 4, 2019
@tokee tokee self-assigned this Jul 4, 2019
@tokee
Copy link
Contributor Author

tokee commented Jul 17, 2019

As of 2019-07-17, the data structure for a single node is

{d:"bbc.co.uk", fs:12, x:260.00223, y:-341.51917, r:5.0, in:"7(167.972244,-325.335693~230.078674,-352.999908#c0c0c0);143(178.626465,-311.485352~230.768097,-347.046112#c0c0c0)", out:"48910(-231.007614,-3011.336426~-19.427139,-3377.106689#ff5584)"}

A compact binary representation has 2 gains:

  1. Faster loading
  2. Less memory overhead in the browser

It could be introduced with the following methods:

  • Multiply all decimal numbers with 1,000,000 and represent them as a signed 4-byte integer
  • Represent l (a single link) as 4 (node index) + 24+24 (infix coordinates) = 20 bytes
  • Represent fs and r as an unsigned 2-byte integer (multiplying with 10 first)
  • Represent nc node-color as 3 bytes and drop the colors from links
  • Introduce dc (domain chars) as 1 byte unsigned integer
  • Introduce ic and oc (in- and out-count) as 2 byte unsigned integer
  • Align links to 4 bytes in the structure

which turns the given sample into

[x 4] [y 4] [r 2] [fs 2] [ic 2] [oc 2] [nc 3] [dl 1] bbc.co.uk [l 20][l 20] [l20]

Note how this aligns reasonably well to 4 bytes and how coordinated & domains starts at a fixed point in the structure, making scanning deterministic and hopefully fast. The sample structure uses 92 bytes, down from 239 bytes. If represented as base64 when loading it becomes 122 bytes or about half.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant