mvzr: The Minimum Viable Zig Regex Library

Finding myself in need of a regular expressions library for a Zig project, and needing it to build regex at runtime, not just comptime, I ended up speedrunning a little library for just that purpose.

This is that library. It's a simple bytecode-based Commander Pike-style VM. Just under 1500 lines of load-bearing code, no dependencies other than std.

The provided Regex type allows 64 'operations' and 8 unique ASCII character sets. If you would like more, or less, you can call SizedRegex(num_ops, num_sets) to customize the type.

Installation

Drop the file into your project, or use the Zig build system:

zig fetch --save "https://github.com/mnemnion/mvzr/archive/refs/tags/v0.2.5.tar.gz"

I'll do my best to keep that URL fresh, but it pays to check over here: ➔

For the latest release version.

Features

Zero allocation, comptime and runtime compiling and matching
X operations per regex
Y character sets per regex
Greedy qualifiers: *, +, ?
Lazy qualifiers: *?, +?, ??
Possessive/eager qualifiers: *+, ++, ?+
Alternation: foo|bar|baz
Grouping foo|(bar|baz)+|quux
Sets: [abc], [^abc], [a-z], [^a-z], [\w+-], [\x04-\x1b]
Built-in character groups (ASCII): \w, \W, \s, \S, \d, \D
Escape sequences: \t, \n, \r, \xXX hex format
- Same set as Zig: if you need the weird C ones, use \x format
Begin and end ^ and $
Word boundaries \b, \B
{M}, {M,}, {M,N}, {,N}

Limitations and Quirks

No Unicode support to speak of
No fancy modifiers (you want case-insensitive, great, lowercase your string)
. matches any one byte. [^\n\r] works fine if that's not what you want
- Or split into lines first, divide and conquer
- Note: $ permits a final newline, but ^ must be the beginning of a string, and $ only matches a final newline.
Backtracks (sorry. For this to work without backtracking, we need async back)
- Preliminary tests indicate that this backtracking is non-catastrophic
Compiler does some best-effort validation but I haven't really pounded on it
No capture groups. Divide and conquer

As long as you color within the lines, it should be fine.

This library is not intended for use where an attacker could conceivably control the regex pattern.

Much like managing your own memory, if you know your tools and are smart about it, you can get a lot done with mvzr.

Interface

mvzr.Regex is available at comptime or runtime, and returns an mvzr.Match, consisting of a .slice field containing the match, as well as the .start and .end locations in the haystack. This is a borrowed slice, to own it, call match.toOwnedMatch(allocator), and deallocate later with match.deinit(allocator), or just free the .slice.

Similarly, if you need to store a Regex or SizedRegex for later, call regex.toOwnedRegex(allocator), freeing later with allocator.destroy(heap_regex).

// aka SizedRegex(64, 8)
const regex: mvzr.Regex = mvzr.compile(patt_str).?;
// or mvzr.Regex.compile(patt_str)
const match: mvzr.Match = regex.match(haystack).?;
const match2: mvzr.Match = match(haystack, patt_str).?;
const did_match: bool = regex.isMatch(haystack);
const iter: mvzr.RegexIterator = regex.iterator(haystack);

while (iter.next()) |m| {
    // ...
}

// Comptime-only
const ops, const sets = mvzr.resourcesNeeded("abc?d*[^efgh]++2");

// I suggest adding the values directly here once they're established
const SlimmedDownRegex = mvzr.SizedRegex(ops, sets);

Compile Errors

If a regex string is unable to compile, mvzr will return null. It will also log an informative error message. While this is useful, it may not be desirable, so mvzr uses a scoped logger with the scope .mvzr, to make it easy for a custom logging function to filter those messages out.

Bugs

Fewer over time, I hope. The test suite never shrinks.

Bug Reports

Always welcome. Ideally, presented as a failing test block, with a note on expected behavior.

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.zig		build.zig
build.zig.zon		build.zig.zon

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mvzr: The Minimum Viable Zig Regex Library

Installation

Features

Limitations and Quirks

Interface

Compile Errors

Bugs

Bug Reports

About

Releases 19

Packages

Languages

License

mnemnion/mvzr

Folders and files

Latest commit

History

Repository files navigation

mvzr: The Minimum Viable Zig Regex Library

Installation

Features

Limitations and Quirks

Interface

Compile Errors

Bugs

Bug Reports

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 19

Packages 0

Languages

Packages