General-purpose compression algorithms in memory.
Test Files
There are several test files in the 'test' directory. Each file is present in uncompressed form as well as in compressed form for each of the implemented algorithms. The smallest files are good for manual debugging during the initial stages of implementation (a simple round-trip check is sketched after the list below). The medium-sized files are for gaining some confidence that corner cases are handled correctly. The largest files are good for benchmarking performance under various scenarios and for testing memory bounds in WRAM and MRAM.
- alice (312 bytes) - some text from 'Alice In Wonderland'
- coding (9,423 bytes) - the Linux coding standard
- terror2 (105,438 bytes) - some text from the 'Terrorists Handbook'
- plarbn12 (481,861 bytes) - some poetry
- world192 (1,150,480 bytes) - some text from the CIA World Fact Book
- xml (5,345,280 bytes) - collected XML files from Silesia Corpus
- sao (7,251,945 bytes) - the SAO star catalog
- dickens (10,192,446 bytes) - collected works of Charles Dickens
- nci (33,553,445 bytes) - chemical database of structures
- mozilla (51,220,480 bytes) - tarred executables of Mozilla 1.0
- spamfile (84,217,482 bytes) - snapshot of collected spam emails
- [missing] (~64 MB) - to stress the maximum MRAM capacity of the DPU
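For the manual-debugging use mentioned above, a small host-side round-trip check can be handy. The sketch below simply compares the decompressor's output file against the uncompressed reference; the paths and the '.out' suffix are assumptions, so adapt them to however the host program actually writes its result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical round-trip check: compare the decompressor's output file
 * against the uncompressed reference. The paths and the ".out" suffix are
 * assumptions; adapt them to the real host program and naming scheme. */

static unsigned char *slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    fseek(f, 0, SEEK_END);
    *len = (size_t)ftell(f);
    rewind(f);
    unsigned char *buf = malloc(*len ? *len : 1);
    if (buf && fread(buf, 1, *len, f) != *len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    return buf;
}

int main(void)
{
    size_t ref_len, out_len;
    unsigned char *ref = slurp("test/alice", &ref_len);     /* assumed path   */
    unsigned char *out = slurp("test/alice.out", &out_len); /* assumed suffix */

    if (!ref || !out) {
        fprintf(stderr, "could not read input files\n");
        return 1;
    }
    if (ref_len == out_len && memcmp(ref, out, ref_len) == 0)
        puts("round trip OK");
    else
        printf("MISMATCH (reference %zu bytes, output %zu bytes)\n",
               ref_len, out_len);

    free(ref);
    free(out);
    return 0;
}
```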
Snappy
Snappy is a compression format written by Google. It is meant for simple and fast decoding rather than maximum compression, and works well on highly repetitive text files. The compression ratio can still be quite good considering the simplicity of the algorithm ('terror2' compresses at roughly 2:1). To encode files in the original Snappy format, the "scmd" tool found here can be used. The format used in our compressor/decompressor has been slightly modified to allow for a multi-threaded implementation; a description of this format can be found here, alongside the compressor that was used to generate the test files.
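To illustrate why decoding is simple, the sketch below decodes the element stream of the original (unmodified) Snappy format: a varint-encoded uncompressed length followed by literal and copy elements tagged by the low two bits of a tag byte. It is a plain single-threaded host-side reference, not the DPU implementation, it does not handle the framing added by the modified multi-threaded format, and input bounds checking is kept minimal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of decoding the original Snappy element stream on the host.
 * Returns the decoded length, or -1 on error. */
ptrdiff_t snappy_decode_sketch(const uint8_t *in, size_t in_len,
                               uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;

    /* Preamble: uncompressed length as a little-endian base-128 varint. */
    uint32_t expected = 0;
    for (int shift = 0; ip < in_len; shift += 7) {
        uint8_t b = in[ip++];
        expected |= (uint32_t)(b & 0x7f) << shift;
        if (!(b & 0x80))
            break;
    }
    if (expected > out_cap)
        return -1;

    while (ip < in_len) {
        uint8_t tag = in[ip++];
        uint32_t len, offset;

        switch (tag & 0x03) {
        case 0x00: /* literal: length-1 in upper 6 bits, or in 1-4 extra bytes */
            len = (tag >> 2) + 1;
            if (len > 60) {
                uint32_t extra = len - 60; /* number of length bytes */
                if (ip + extra > in_len)
                    return -1;
                len = 0;
                for (uint32_t i = 0; i < extra; i++)
                    len |= (uint32_t)in[ip++] << (8 * i);
                len += 1;
            }
            if (op + len > out_cap || ip + len > in_len)
                return -1;
            memcpy(out + op, in + ip, len);
            ip += len;
            op += len;
            break;

        case 0x01: /* copy, 1-byte offset: length 4-11, offset up to 2047 */
            len = ((tag >> 2) & 0x07) + 4;
            offset = ((uint32_t)(tag >> 5) << 8) | in[ip++];
            goto copy;

        case 0x02: /* copy, 2-byte little-endian offset: length 1-64 */
            len = (tag >> 2) + 1;
            offset = (uint32_t)in[ip] | ((uint32_t)in[ip + 1] << 8);
            ip += 2;
            goto copy;

        default:   /* copy, 4-byte little-endian offset: length 1-64 */
            len = (tag >> 2) + 1;
            offset = (uint32_t)in[ip] | ((uint32_t)in[ip + 1] << 8) |
                     ((uint32_t)in[ip + 2] << 16) | ((uint32_t)in[ip + 3] << 24);
            ip += 4;
        copy:
            if (offset == 0 || offset > op || op + len > out_cap)
                return -1;
            /* Copy byte by byte so overlapping matches (offset < len) work. */
            for (uint32_t i = 0; i < len; i++, op++)
                out[op] = out[op - offset];
            break;
        }
    }
    return (op == expected) ? (ptrdiff_t)op : -1;
}
```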