BuildIt RegEx

This repo contains an implementation of a regular expression library using BuildIt.

We currently support the following types of matches:

  • full match, which checks whether the regex matches the entire text (./samples/sample1.cpp)
  • partial match, which returns a boolean and can optionally extract the first match (./samples/sample2.cpp)
  • all partial matches, returned as a list of strings (./samples/sample3.cpp); the result is the same as repeatedly applying the PCRE or RE2 FindAndConsume function, which yields non-overlapping leftmost longest matches

We support the following operators and expressions.

Expression          Description
.                   any character
[xyz], [^xyz]       character class
[a-z], [^a-z]       character range
x?                  zero or one x
x+                  one or more x
x*                  zero or more x
(x|y)               x or y
x{n}                x repeated n times
x{n,m}              x repeated between n and m times, inclusive
\d, \w, \s, \D, \W  escaped character classes

We provide several options that affect how the code is generated:

  • the number of interleaving parts for partial matches
  • splitting the code generation on | characters
  • grouping multiple consecutive states into one
  • ignore_case: match both upper- and lowercase characters
  • greedy: set to true to prefer longer partial matches

These options can be set using the RegexOptions struct as shown in ./samples/sample2.cpp.

To compile the code, run make from the root directory. For example, to run the sample1 code, run ./build/sample1.

Code structure

  • The main code is in ./src and ./include.
  • Testing code is in ./test.
  • Code for measuring performance is in ./benchmarks.

Setting up the benchmarks

Hyperscan

  • To build Hyperscan, follow steps 2 and 3 from here.
  • Use one of the scripts in ./benchmarks/hyperscan/tools/hsbench/scripts to create a corpus SQLite database.
  • Add the regex patterns to a file following this format.
  • From the Hyperscan build directory, run build/bin/hsbench -e <pattern_file> -c <corpus.db>. More directions are available here.

RE2

  • To build RE2 run make in the ./benchmarks/re2/ directory.

To run the timing experiments on the Twain dataset, run ./build/preformance from the ./benchmarks directory.

Datasets

Twain