A grammar self-index of a text
- C++ >= 17
- CMake >= 3.7
- SDSL-lite
The xxHash library is already included in the source files. We also include a CMake module that searches for the local installation of SDSL-lite, so there is no need to indicate its path during compilation.
Clone the repository, enter the project folder, and execute the following commands:
mkdir build
cd build
cmake ..
make
./lpg index ../tests/sample_file.txt
The command above will produce the file sample_file.txt.lpg_idx.
The current implementation expects a string ending with the null '\0' character. If you have a collection rather than a single string, you need to concatenate the input into one sequence and then append '\0'. Note that the program will crash if '\0' appears anywhere other than at the end of the file.
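The input requirement above can be checked before indexing. The following is a minimal Python sketch, not part of this repository: the helper names build_lpg_input and write_lpg_input are hypothetical, and the parts are joined directly with no separator because the text above does not prescribe one.

```python
from pathlib import Path
from typing import Iterable

TERMINATOR = b"\0"


def build_lpg_input(parts: Iterable[bytes]) -> bytes:
    """Concatenate the parts and append a single trailing null byte.

    Hypothetical helper: it rejects any null byte inside the parts,
    because the indexer expects '\\0' only at the very end.
    """
    body = b"".join(parts)
    if TERMINATOR in body:
        offset = body.index(TERMINATOR)
        raise ValueError(f"null byte at offset {offset}; it may only end the input")
    return body + TERMINATOR


def write_lpg_input(sources: Iterable[Path], dest: Path) -> None:
    """Read every source file as raw bytes and write the combined input."""
    dest.write_bytes(build_lpg_input(p.read_bytes() for p in sources))
```

For example, write_lpg_input([Path("a.txt"), Path("b.txt")], Path("collection.txt")) would produce a single file that can then be passed to ./lpg index. The file names here are placeholders.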
Assuming you are in the build folder inside the repository, you can run a search example as:
./lpg search sample_file.txt.lpg_idx -F ../tests/sample_file.rand_pat_100_10
The -F flag expects a file with the pattern list (one pattern per line). Alternatively, you can use --p to pass patterns directly on the command line. The in-place option accepts multiple inputs, either as --p pat1 pat2 pat3 .. or as --p pat1 --p pat2 --p pat3. The options -F and --p are complementary: when both are given, the program searches for the combined pattern collection.
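A pattern file for -F can be generated with a few lines of Python. This is a sketch under assumptions: write_pattern_file is a hypothetical helper, and both the UTF-8 encoding and the trailing newline after the last pattern are choices made here, not requirements stated by the program.

```python
from pathlib import Path
from typing import Iterable


def write_pattern_file(patterns: Iterable[str], dest: Path) -> int:
    """Write one pattern per line and return how many were written.

    Hypothetical helper: a pattern that contains a line break cannot be
    stored in a one-per-line file, so it is rejected up front.
    """
    lines = []
    for pat in patterns:
        if "\n" in pat or "\r" in pat:
            raise ValueError(f"pattern {pat!r} contains a line break")
        lines.append(pat)
    # Write bytes directly so the line ending is '\n' on every platform.
    dest.write_bytes("".join(line + "\n" for line in lines).encode("utf-8"))
    return len(lines)
```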
The -r flag reports the number of occurrences and the elapsed time individually for each input pattern. Without this flag, the program prints the total number of occurrences over all patterns and the total elapsed time to find them.
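When the search is driven from a script, the flags described above can be assembled programmatically. The sketch below is illustrative only: build_search_command is a hypothetical helper that uses just the -F, --p and -r options named in this README, and placing -r before --p is an assumption made so that the flag is not read as one of the in-place patterns.

```python
from typing import List, Optional, Sequence


def build_search_command(index_file: str,
                         pattern_file: Optional[str] = None,
                         patterns: Sequence[str] = (),
                         per_pattern_report: bool = False,
                         binary: str = "./lpg") -> List[str]:
    """Return the argument list for one search invocation (hypothetical helper)."""
    if pattern_file is None and not patterns:
        raise ValueError("give a pattern file, in-place patterns, or both")
    cmd = [binary, "search", index_file]
    if per_pattern_report:
        cmd.append("-r")               # per-pattern counts and timings
    if pattern_file is not None:
        cmd += ["-F", pattern_file]    # file with one pattern per line
    if patterns:
        cmd += ["--p", *patterns]      # in-place patterns, combined with -F
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the build folder.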
This repository is a legacy implementation that has yet to be tested on massive inputs. If you find bugs, please report them here.
If you use this code, please cite the following paper:
Díaz-Domínguez, D., Navarro, G., & Pacheco, A.
An LMS-based grammar self-index with local consistency properties.
In Proc. 28th International Symposium on String Processing and Information Retrieval (SPIRE 2021).