Skip to content

Compression format

Konstantin Mochalov edited this page Aug 1, 2019 · 3 revisions

Compression format is based on LZ77 algorithm. Compressed file is bit-based (byte boundaries are meaningless). It's a sequence of nodes:

  • Node type, 1 bit
  • Depending on node type:
    • 1: output next 8 bits
    • 0: output length + 2 bytes from window, starting from offset (measured from the start of window)
      • 12 bits: offset
      • 4 bits: length

So, "simple" node is 9 bits long, "window reference" node is 17 bits long.

Bits are in "big-endian" order (don't confuse with big-endian byte order).

Window is 0x1000 bytes long. As bytes are outputted, they're also written to the window at current write pointer, which is initially at position 1 (the second byte of window, 0 is the first). All writes (and possibly reads, should be checked) of window use wraparound, i.e. 3 bytes starting at 0x0ffe would be at 0x0ffe, 0x0fff, 0x0000.

You should know when to end decoding, it's encoded in header of parent [obfuscated format]. You shouldn't rely on "less than 9 bits left" as EOF.

Clone this wiki locally