Consider adding another regexp engine #167

Open
vmarkovtsev opened this issue Oct 21, 2018 · 7 comments

@vmarkovtsev
Collaborator

We have already seen that Oniguruma improves enry's performance by a wide margin. There are Go bindings to Rust's regular expression engine: https://github.com/BurntSushi/rure-go
People say that Rust has one of the fastest implementations, so it makes sense to check how enry performs on rure-go.

@bzz
Contributor

bzz commented Oct 23, 2018

This looks interesting! Thank you for bringing it up.

Before venturing into integration, I think it's better to find or produce a head-to-head comparison between the Rust-based rure-go and C-based Oniguruma regexp impls:

  • features (do they support the same set of features? esp. those outside of RE2 that enry might rely on)
  • performance (how do they both perform on the same synthetic benchmark)

Knowing that would help us to make an informed decision on integration.
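
A minimal sketch of what such a comparison harness could look like, assuming each engine is wrapped behind a small common interface. The `Matcher` interface, `compileStd`, `countMatches` and `benchEngine` are hypothetical names made up for illustration, and only the standard library `regexp` package is wired in here; Oniguruma or rure-go adapters would have to be written separately against their own APIs. The lookahead probe relies on the fact that Go's RE2-style `regexp` rejects `(?=...)`, which makes it a simple feature check.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// Matcher is a hypothetical minimal interface that every candidate engine
// would be wrapped in so the same benchmark can drive all of them.
type Matcher interface {
	MatchString(s string) bool
}

// compileStd adapts the standard library engine to Matcher.
func compileStd(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// countMatches returns how many inputs the matcher accepts.
func countMatches(m Matcher, inputs []string) int {
	n := 0
	for _, s := range inputs {
		if m.MatchString(s) {
			n++
		}
	}
	return n
}

// benchEngine compiles the pattern with the given engine and times
// repeated matching over the same synthetic inputs.
func benchEngine(compile func(string) (Matcher, error), pattern string, inputs []string) (testing.BenchmarkResult, error) {
	m, err := compile(pattern)
	if err != nil {
		return testing.BenchmarkResult{}, err
	}
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			countMatches(m, inputs)
		}
	})
	return res, nil
}

func main() {
	inputs := []string{"main.go", "lib.rs", "README.md", "util_test.go"}

	// Feature probe: does the engine accept a construct outside RE2?
	if _, err := compileStd(`foo(?=bar)`); err != nil {
		fmt.Println("stdlib regexp: lookahead unsupported:", err)
	}

	// Performance probe on a pattern every engine should support.
	res, err := benchEngine(compileStd, `\.go$`, inputs)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	fmt.Println("stdlib regexp:", res)
}
```

Adding another engine would then only mean passing a different `compile` function to `benchEngine`, so all engines are measured on identical patterns and inputs.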

The same process could be applied to other ways of speeding regexps up, e.g. using pre-compiled state machines like ragel.

@ajnavarro
Contributor

Note that we had some concurrency problems using oniguruma on gitbase: src-d/gitbase#544

We stopped generating binaries using oniguruma because it is not stable.

A comparison of other solutions: https://rust-leipzig.github.io/regex/2017/03/28/comparison-of-regex-engines/

There are also other solutions available based on Onigmo, an Oniguruma fork: https://github.com/ungerik/gonigmo

@vmarkovtsev
Collaborator Author

Another possibility: https://github.com/logrusorgru/grokky based on re2

@kuba--

kuba-- commented Nov 3, 2018

The Rust implementation is just inspired by https://github.com/google/re2.

btw. these libraries are not huge and can improve performance for many projects/teams, so why not start porting one of these solutions to native Go (maybe tailored to our needs)?
At least we can get rid of cgo.

or just read :)
https://medium.com/@dgryski/speeding-up-regexp-matching-with-ragel-4727f1c16027

@bzz changed the title from "Consider adding another regexp engine - this time from Rust" to "Consider adding another regexp engine" on Feb 14, 2019
@bzz
Contributor

bzz commented Feb 14, 2019

To keep the party going: if you are interested in complicating the build and CI envs, there is also https://github.com/intel/hyperscan

@smola
Contributor

smola commented Mar 15, 2019

Note that gitbase will continue to use oniguruma. We forked the Go bindings for maintenance:
https://github.com/src-d/go-oniguruma
But AFAIK this introduces race conditions that the user currently has to avoid.

@kuba-- or @ajnavarro know more about this.

@kuba--

kuba-- commented Mar 15, 2019

go-mysql-server already switched to go-oniguruma as its default regex engine, so gitbase will auto-switch after upgrading mysql.
Regarding the race condition: it was already solved in gitbase by introducing a pool of regex parsers.
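
For readers unfamiliar with the pooling approach, here is a rough sketch of the general idea, not gitbase's actual code. It assumes the underlying matcher must not be shared between goroutines, so each caller borrows its own compiled instance from a `sync.Pool`. The `rePool` type and `newRePool` are hypothetical names, and the standard library `regexp` is used only as a stand-in (it is already safe for concurrent use, unlike the bindings discussed above).

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// rePool hands out separately compiled instances of one pattern so that
// no instance is used by two goroutines at the same time.
type rePool struct {
	pool sync.Pool
}

// newRePool validates the pattern once, then lets the pool compile
// additional instances lazily whenever it runs empty.
func newRePool(pattern string) (*rePool, error) {
	if _, err := regexp.Compile(pattern); err != nil {
		return nil, err
	}
	p := &rePool{}
	p.pool.New = func() interface{} {
		// Safe to use MustCompile: the pattern was validated above.
		return regexp.MustCompile(pattern)
	}
	return p, nil
}

// MatchString borrows an instance, matches, and returns it to the pool.
func (p *rePool) MatchString(s string) bool {
	re := p.pool.Get().(*regexp.Regexp)
	defer p.pool.Put(re)
	return re.MatchString(s)
}

func main() {
	p, err := newRePool(`^gi+t$`)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}

	// Several goroutines match concurrently, each on a borrowed instance.
	results := make([]bool, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = p.MatchString("git")
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

The trade-off is extra memory and compile time per pooled instance in exchange for not needing a global lock around every match.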


5 participants