Clone repository:
git clone --recurse-submodules [email protected]:parama/mlsysfinalproject.git
Download SOSD data:
cd SOSD
./scripts/download.sh
Example data layout
project/
├── src/
└── data/
├── wiki_ts_200M_uint64
└── workloads/
└── wiki_ts_200M_uint64_workload100k_alpha1.1
To compile the code, run
sh build.shto compile the executables to the build directory.
Run ./benchmark to benchmark a lookup workloads.
To run SOSD on macOS, we can use the version released under the tag mlforsys19.
After running it locally, we got the following benchmark table
| RMI | RS | ART | FAST | RBS | B-tree | BS | TIP | IS | |
|---|---|---|---|---|---|---|---|---|---|
| amzn32 | 307 | 1222 | n/a | 602 | 376 | 674 | 1100 | 824 | 5021 |
| face32 | 338 | 1018 | 1478 | 515 | 339 | 566 | 954 | 1480 | 1235 |
| logn32 | 116 | 194 | n/a | 561 | 553 | 509 | 880 | 770 | n/a |
| norm32 | 92.2 | 604 | 1596 | 337 | 382 | 562 | 944 | 1472 | 11482 |
| uden32 | 69.7 | 568 | 138 | 486 | 373 | 526 | 815 | 191 | 55.5 |
| uspr32 | 202 | 456 | n/a | 469 | 319 | 602 | 847 | 499 | 499 |
| amzn64 | 979 | 1689 | n/a | n/a | 866 | 727 | 1013 | 897 | 5866 |
| face64 | 981 | 1816 | 10702 | n/a | 424 | 700 | 1758 | 1317 | 2000 |
| logn64 | 662 | 1103 | 386 | n/a | 879 | 647 | 1133 | 1234 | n/a |
| norm64 | 105 | 813 | 1856 | n/a | 434 | 587 | 1719 | 1219 | 13017 |
| osmc64 | 1238 | 1352 | n/a | n/a | 577 | 732 | 956 | 7245 | 106801 |
| uden64 | 72.1 | 664 | 147 | n/a | 403 | 575 | 923 | 223 | 77.8 |
| uspr64 | 243 | 1374 | 345 | n/a | 334 | 622 | 1323 | 506 | 628 |
| wiki64 | 774 | 1189 | n/a | n/a | 482 | 723 | 990 | 2089 | 9819 |
| avg | 441 | 1004 | 2081 | 495 | 481 | 625 | 1096 | 1426 | 13041 |
1. Setup
git clone --depth 1 --branch mlforsys19 https://github.com/learnedsystems/SOSD.git
cd SOSD
. ./scripts/setup_anywhere.sh2. Runs
scripts/download.shdownloads and stores required data from the Internetscripts/build_rmis.shcompiles and builds the RMIs for each datasetscripts/prepare.shconstructs query workloads and compiles the benchmarkscripts/execute.shexecutes the benchmark on each workload, storing the results inresultsscripts/create_leaderboard.shgathers all results inresultsinto an easy to read table
or to run all the steps use reproduce.sh