bsc-m03 is an experimental block-sorting compressor based on the M03 context-aware compression algorithm invented by Michael Maniscalco:
- Michael Maniscalco, "M03: A solution for context based blocksort (BWT) compression", 2004
- Jurgen Abel, "Post BWT stages of the Burrows-Wheeler compression algorithm", 2010
Moreover, the bsc-m03 compressor is a practical implementation of Compression via Substring Enumeration for byte-oriented sources:
- Danny Dube, Vincent Beaudoin, "Lossless Data Compression via Substring Enumeration", 2010
- Takahiro Ota, Hiroyoshi Morita, Akiko Manada, "Compression by Substring Enumeration with a Finite Alphabet Using Sorting", 2018
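
As background on the block-sorting step that these papers build on, here is a minimal Python sketch of a naive Burrows-Wheeler transform and its inverse. It is not taken from bsc-m03, whose implementation is far more elaborate. The function names `bwt` and `inverse_bwt` are hypothetical, and the rotation sort is quadratic in memory and time, so it only suits tiny inputs.

```python
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive BWT: sort all rotations, return (last column, primary index)."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    # Sort rotation start positions by the rotation each one produces.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    # The last column holds the byte that precedes each rotation start.
    last = bytes(data[(i - 1) % n] for i in order)
    # The primary index is the row where the unrotated input ended up.
    primary = order.index(0)
    return last, primary


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the naive BWT using a stable sort of the last column."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte maps each first-column row to the
    # last-column row that holds the same byte occurrence.
    mapping = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = mapping[row]
        out.append(last[row])
    return bytes(out)


transformed, index = bwt(b"banana")
print(transformed, index)  # b'nnbaaa' 3
print(inverse_bwt(transformed, index))  # b'banana'
```

The transform groups bytes that share a following context, which is the property that context-aware post-BWT coders such as M03 exploit.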
Copyright (c) 2021-2023 Ilya Grebnov
bsc-m03 is released under the GNU General Public License.
- 2023-05-08 : Version 0.5.5
  - Fixed segmentation fault on Unix-based systems.
- 2022-11-27 : Version 0.5.0
  - Compression ratio improvements.
- 2022-11-20 : Version 0.4.0
  - Compression ratio improvements.
- 2022-11-10 : Version 0.3.0
  - Compression ratio improvements.
- 2022-01-08 : Version 0.2.1
  - Performance improvements.
- 2022-01-05 : Version 0.2
  - Memory usage improvements.
  - Compression ratio improvements.
- 2021-12-07 : Version 0.1.1 - 0.1.2
  - Minor compression ratio improvements.
- 2021-12-03 : Version 0.1.0
  - Initial public release of bsc-m03.

Calgary Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
bib | 111261 | 24479 | 1.760 |
book1 | 768771 | 203745 | 2.120 |
book2 | 610856 | 138870 | 1.819 |
geo | 102400 | 52465 | 4.099 |
news | 377109 | 105621 | 2.241 |
obj1 | 21504 | 9775 | 3.637 |
obj2 | 246814 | 68003 | 2.204 |
paper1 | 53161 | 14957 | 2.251 |
paper2 | 82199 | 22594 | 2.199 |
pic | 513216 | 44424 | 0.692 |
progc | 39611 | 11257 | 2.274 |
progl | 71646 | 13512 | 1.509 |
progp | 49379 | 9248 | 1.498 |
trans | 93695 | 15310 | 1.307 |

Canterbury Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
alice29.txt | 152089 | 38562 | 2.028 |
asyoulik.txt | 125179 | 35889 | 2.294 |
cp.html | 24603 | 6872 | 2.235 |
fields.c | 11150 | 2685 | 1.926 |
grammar.lsp | 3721 | 1120 | 2.408 |
kennedy.xls | 1029744 | 57440 | 0.446 |
lcet10.txt | 426754 | 94823 | 1.778 |
plrabn12.txt | 481861 | 129770 | 2.154 |
ptt5 | 513216 | 44424 | 0.692 |
sum | 38240 | 11426 | 2.390 |
xargs.1 | 4227 | 1585 | 3.000 |

Large Canterbury Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
bible.txt | 4047392 | 698395 | 1.380 |
E.coli | 4638690 | 1126125 | 1.942 |
world192.txt | 2473400 | 376173 | 1.217 |

Silesia Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
dickens | 10192446 | 2199344 | 1.726 |
mozilla | 51220480 | 15589159 | 2.435 |
mr | 9970564 | 2156826 | 1.731 |
nci | 33553445 | 1126386 | 0.269 |
ooffice | 6152192 | 2503991 | 3.256 |
osdb | 10085684 | 2223002 | 1.763 |
reymont | 6627202 | 958772 | 1.157 |
samba | 21606400 | 3794300 | 1.405 |
sao | 7251944 | 4649723 | 5.129 |
webster | 41458703 | 6253627 | 1.207 |
xml | 5345280 | 357958 | 0.536 |
x-ray | 8474240 | 3681388 | 3.475 |

Manzini Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
chr22.dna | 34553758 | 7206269 | 1.668 |
etext99 | 105277340 | 21422251 | 1.628 |
gcc-3.0.tar | 86630400 | 10046880 | 0.928 |
howto | 39422105 | 7504315 | 1.523 |
jdk13c | 69728899 | 2612434 | 0.300 |
linux-2.4.5.tar | 116254720 | 16351863 | 1.125 |
rctail96 | 114711151 | 9707347 | 0.677 |
rfc | 116421901 | 14871775 | 1.022 |
sprot34.dat | 109617186 | 17157222 | 1.252 |
w3c2 | 104201579 | 5598687 | 0.430 |

Maximum Compression Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
A10.jpg | 842468 | 823533 | 7.820 |
AcroRd32.exe | 3870784 | 1555832 | 3.216 |
english.dic | 465211 | 145096 | 2.495 |
FlashMX.pdf | 4526946 | 3712716 | 6.561 |
FP.LOG | 20617071 | 502648 | 0.195 |
MSO97.DLL | 3782416 | 1878076 | 3.972 |
ohs.doc | 4168192 | 803171 | 1.542 |
rafale.bmp | 4149414 | 745470 | 1.437 |
vcfiu.hlp | 4121418 | 604165 | 1.173 |
world95.txt | 2988578 | 442271 | 1.184 |

Large Text Compression Benchmark:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
enwik8 | 100000000 | 20263925 | 1.621 |
enwik9 | 1000000000 | 160018905 | 1.280 |

Pizza & Chili Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
dblp.xml | 296135874 | 21926695 | 0.592 |
dna | 403927746 | 86414423 | 1.711 |
english.1024MB | 1073741824 | 193810792 | 1.444 |
pitches | 55832855 | 16984071 | 2.434 |
proteins | 1184051855 | 304486803 | 2.057 |
sources | 210866607 | 29749020 | 1.129 |

Pizza & Chili Repetitive Corpus:

File name | Input size (bytes) | Output size (bytes) | Bits per symbol |
---|---|---|---|
cere | 461286644 | 8576879 | 0.149 |
coreutils | 205281778 | 4293243 | 0.167 |
einstein.de.txt | 92758441 | 132286 | 0.011 |
einstein.en.txt | 467626544 | 336029 | 0.006 |
Escherichia_Coli | 112689515 | 7928044 | 0.563 |
influenza | 154808555 | 1760692 | 0.091 |
kernel | 257961616 | 2955825 | 0.092 |
para | 429265758 | 10730998 | 0.200 |
world_leaders | 46968181 | 518220 | 0.088 |
fib41 | 267914296 | 83 | 0.000 |
rs.13 | 216747218 | 86 | 0.000 |
tm29 | 268435456 | 158 | 0.000 |
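
The "Bits per symbol" column in the tables above is consistent with the output size in bits divided by the input size in bytes, rounded to three decimals. The short Python sketch below reproduces two rows under that assumption; the helper name `bits_per_symbol` is hypothetical and is not part of bsc-m03.

```python
def bits_per_symbol(input_bytes: int, output_bytes: int) -> float:
    """Average number of output bits spent per input byte."""
    if input_bytes <= 0:
        raise ValueError("input size must be positive")
    return 8 * output_bytes / input_bytes


# (file name, input size in bytes, output size in bytes) copied from the tables.
rows = [
    ("bib", 111261, 24479),
    ("geo", 102400, 52465),
]

for name, input_size, output_size in rows:
    print(f"{name}: {bits_per_symbol(input_size, output_size):.3f}")
# bib: 1.760
# geo: 4.099
```

A value of 8.000 would mean no compression at all for byte-oriented input, so lower is better.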