Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ASCII-only option, to mimic default RE2 behavior #1

Merged
merged 9 commits into from
May 8, 2019

Conversation

bzz
Copy link

@bzz bzz commented May 7, 2019

This is a workaround, motivated by the difference in handling non-valid UTF8
bytes that Oniriguma has, compared to Go's default RE2.

See src-d/enry#225 (comment)

Summary of changes:

  • use modern version of Oniguruma as it's well-maintained and is constantly improved
  • ci: test on latest 2 Go versions
  • c: NewOnigRegex() calls to onig_initialize() [1]
  • c: prevent NewOnigRegex() from hard-coding UTF8
  • go: expose new MustCompileASCII() \w default character class matching only ASCII
  • go: MustCompile() refactored, initRegexp() extracted for common UTF8/ASCII logic

Encoding was not exposed on Go API level intentionally for simplicity,
in order to avoid introducing complex struct type [2] to API surface.

  1. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/doc/API#L6
  2. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/src/oniguruma.h#L121

Signed-off-by: Alexander Bezzubov [email protected]

@bzz bzz mentioned this pull request May 7, 2019
2 tasks
@bzz bzz force-pushed the add-ascii-option branch 3 times, most recently from b3092ca to 4cabc4f Compare May 7, 2019 14:24
@bzz
Copy link
Author

bzz commented May 7, 2019

CI is great and it's ready to be merged.

As I do not have rights neither to push to this repo, nor add reviews - \cc @kuba-- @creachadair for a quick review.

bzz added 3 commits May 7, 2019 16:33
This is a workaround, motivated by the difference in handling non-valid UTF8
bytes that Oniriguma has, compared to Go's default RE2.

See src-d/enry#225 (comment)

Summary of changes:
 - c: prevent `NewOnigRegex()` from hard-coding UTF8
 - c: `NewOnigRegex()` now propely calls to `onig_initialize()` [1]
 - go: expose new `MustCompileASCII()` \w default charecter class matching only ASCII
 - go: `MustCompile()` refactored, `initRegexp()` extracted for common UTF8/ASCII logic

Encoding was not exposed on Go API level intentionaly for simplisity,
in order to avoid introducing complex struct type [2] to API surface.

 1. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/doc/API#L6
 2. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/src/oniguruma.h#L121

Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
@bzz bzz force-pushed the add-ascii-option branch from 4cabc4f to fcdcc4e Compare May 7, 2019 14:33
.travis.yml Outdated Show resolved Hide resolved
regex.go Outdated Show resolved Hide resolved
regex.go Outdated Show resolved Hide resolved
regex.go Outdated Show resolved Hide resolved
regex.go Outdated Show resolved Hide resolved
@bzz
Copy link
Author

bzz commented May 7, 2019

Thank you for the prompt reviews! Feedback was addressed in last 2 commits and it's ready for another round.

Copy link

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more minor comment fixing a typo, otherwise looks good to me!

regex.go Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
.travis.yml Outdated
- sudo apt-get install -y dpkg # dpkg >= 1.17.5ubuntu5.8, which fixes https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/1730627- sudo dpkg -i libonig-dev_6.9.1-1_amd64.deb
- wget http://archive.ubuntu.com/ubuntu/pool/universe/libo/libonig/libonig5_6.9.1-1_amd64.deb
- sudo dpkg -i libonig5_6.9.1-1_amd64.deb
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need non dev version?

Copy link
Author

@bzz bzz May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it's a direct dependency of the dev version

chelper.c Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
bzz added 3 commits May 8, 2019 14:22
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
Signed-off-by: Alexander Bezzubov <[email protected]>
@bzz bzz force-pushed the add-ascii-option branch from 7e4854c to af90abe Compare May 8, 2019 12:22
@bzz
Copy link
Author

bzz commented May 8, 2019

@kuba-- @creachadair thank you for kind feedback! All has been addressed and CI is green :shipit:

Waiting for this to be merged&released to go forward with src-d/enry#227 that is the last thing blocking enry/v2 release with go modules.

@smola do you know who has right GH permissions to merge/cut a release of go-oniguruma? MAINTAINER files seems to be missing here.

Signed-off-by: Alexander Bezzubov <[email protected]>
@kuba-- kuba-- merged commit 304256c into src-d:master May 8, 2019
@bzz bzz deleted the add-ascii-option branch May 8, 2019 13:16
@bzz
Copy link
Author

bzz commented May 8, 2019

@kuba-- thank you for merging! Would you be so kind to ping me, when the next (I guess v1.1.0) version that includes this changes is released? Thanks!

@kuba--
Copy link

kuba-- commented May 8, 2019

@bzz - pushed v1.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants