Add ASCII-only option, to mimic default RE2 behavior #1
CI is great and it's ready to be merged. As I have rights neither to push to this repo nor to add reviews, cc @kuba-- @creachadair for a quick review.
This is a workaround, motivated by the difference in handling non-valid UTF8 bytes that Oniguruma has, compared to Go's default RE2.

See src-d/enry#225 (comment)

Summary of changes:
- c: prevent `NewOnigRegex()` from hard-coding UTF8
- c: `NewOnigRegex()` now properly calls `onig_initialize()` [1]
- go: expose new `MustCompileASCII()`, with the \w default character class matching only ASCII
- go: `MustCompile()` refactored, `initRegexp()` extracted for common UTF8/ASCII logic

Encoding was intentionally not exposed at the Go API level for simplicity, in order to avoid introducing a complex struct type [2] to the API surface.

1. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/doc/API#L6
2. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/src/oniguruma.h#L121

Signed-off-by: Alexander Bezzubov <[email protected]>
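For reference, the "default RE2 behavior" that the ASCII-only option aims to mimic can be observed with Go's standard `regexp` package alone. The sketch below does not use go-oniguruma at all; the helper name `IsASCIIWord` is made up for illustration. It relies on two documented properties of Go's `regexp`: `\w` is the ASCII class `[0-9A-Za-z_]`, and each byte of an invalid UTF-8 sequence is treated as U+FFFD.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWord is compiled with Go's standard regexp package (RE2 syntax),
// where \w matches only the ASCII class [0-9A-Za-z_].
var asciiWord = regexp.MustCompile(`^\w+$`)

// IsASCIIWord is a hypothetical helper (not part of go-oniguruma) that
// reports whether s consists solely of ASCII word characters.
func IsASCIIWord(s string) bool {
	return asciiWord.MatchString(s)
}

func main() {
	fmt.Println(IsASCIIWord("hello_42"))   // true: ASCII letters, digits, underscore
	fmt.Println(IsASCIIWord("héllo"))      // false: é is outside the ASCII \w class
	fmt.Println(IsASCIIWord("abc\xffdef")) // false: the invalid byte is read as U+FFFD
}
```

A regex engine that defaults to a Unicode-aware `\w`, or that handles invalid bytes differently, may give different answers for the second and third inputs, which is presumably the kind of mismatch described in src-d/enry#225.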
Update deb to get fix https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/1730627 Signed-off-by: Alexander Bezzubov <[email protected]>
Thank you for the prompt reviews! Feedback was addressed in the last 2 commits and it's ready for another round.
One more minor comment fixing a typo, otherwise looks good to me!
.travis.yml
- sudo apt-get install -y dpkg # dpkg >= 1.17.5ubuntu5.8, which fixes https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/1730627
- sudo dpkg -i libonig-dev_6.9.1-1_amd64.deb
- wget http://archive.ubuntu.com/ubuntu/pool/universe/libo/libonig/libonig5_6.9.1-1_amd64.deb
- sudo dpkg -i libonig5_6.9.1-1_amd64.deb
Why do we need the non-dev version?
because it's a direct dependency of the dev version
@kuba-- @creachadair thank you for the kind feedback! Everything has been addressed and CI is green. Waiting for this to be merged & released to go forward with src-d/enry#227, which is the last thing blocking the enry/v2 release with Go modules. @smola do you know who has the right GH permissions to merge/cut a release of go-oniguruma?
@kuba-- thank you for merging! Would you be so kind to ping me, when the next (I guess
@bzz - pushed v1.1.0