failure to match (incorrect encoding issues). #174

Open
mijoharas opened this issue Feb 17, 2017 · 0 comments

I have a repo here with a minimal example that exhibits the problem.

The file is UTF-8 and contains some emoji. Searching for foobar with: pt foobar example.txt does not show a match.

Detected points[utf8/eucjp/shiftjis] is 1/0/2.

This is a minimal example that shows the problem; other files also seem to have the wrong encoding detected.

The bytes for the lines are interpreted in UTF-8 as:

scanner.Bytes() [240 159 146 184]
scanner.Bytes() [226 152 149]
scanner.Bytes() [240 159 145 139]
scanner.Bytes() [102 111 111 98 97 114]

and in Shift-JIS as:

scanner.Bytes() [239 191 189 233 160 130]
scanner.Bytes() [231 172 152]

I've got two suggestions on how to solve this, though I don't know too much about encoding schemes.

The first suggestion is to add override options that let us specify the encoding: --utf8 and --shiftJIS would each force the corresponding encoding and skip detection.
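
A rough sketch of what such overrides could look like, using only the standard library flag package. The function name chooseEncoding and the returned strings are hypothetical, and pt's real option parsing may be wired up quite differently; this only illustrates the idea of two mutually exclusive override flags with detection as the default.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// chooseEncoding parses the hypothetical --utf8 and --shiftJIS override
// flags and returns the forced encoding ("utf8" or "shiftjis"), or "auto"
// when neither flag is given. The remaining arguments (pattern, paths)
// are returned untouched.
func chooseEncoding(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("pt", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	forceUTF8 := fs.Bool("utf8", false, "treat every file as UTF-8")
	forceSJIS := fs.Bool("shiftJIS", false, "treat every file as Shift-JIS")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	switch {
	case *forceUTF8 && *forceSJIS:
		return "", nil, errors.New("--utf8 and --shiftJIS are mutually exclusive")
	case *forceUTF8:
		return "utf8", fs.Args(), nil
	case *forceSJIS:
		return "shiftjis", fs.Args(), nil
	}
	// No override given: fall back to the existing detection.
	return "auto", fs.Args(), nil
}

func main() {
	enc, rest, err := chooseEncoding(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("encoding:", enc, "args:", rest)
}
```

With this, pt --utf8 foobar example.txt would bypass the point-based detection entirely for the file in the example.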

The second suggestion is to try decoding a portion of the file as UTF-8 or Shift-JIS (e.g. with https://godoc.org/golang.org/x/text/encoding ) and see whether that produces an error. I don't know much about Shift-JIS, so I'm not sure whether this would be a reliable check.
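
As a minimal sketch of the validation idea, here is the UTF-8 half using utf8.Valid from the standard library rather than golang.org/x/text. The function name guessEncoding and its return values are made up for illustration, and the Shift-JIS half is left out since I'm not sure how strict a Shift-JIS decoder is about invalid input. It is fed the byte lines listed above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// guessEncoding is a hypothetical helper: it returns "utf8" when the sample
// is well-formed UTF-8 and "unknown" otherwise, in which case the caller
// could fall back to the existing point-based detection.
func guessEncoding(sample []byte) string {
	if utf8.Valid(sample) {
		return "utf8"
	}
	return "unknown"
}

func main() {
	// The four lines of example.txt, as listed in the issue.
	lines := [][]byte{
		{240, 159, 146, 184},
		{226, 152, 149},
		{240, 159, 145, 139},
		{102, 111, 111, 98, 97, 114},
	}
	for _, line := range lines {
		fmt.Printf("%v -> %s\n", line, guessEncoding(line))
	}
}
```

One caveat: if only a prefix of the file is sampled, the sample can end in the middle of a multi-byte character and fail validation even though the file is fine, so the sample would need to be cut on a character boundary (or a trailing incomplete sequence tolerated).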

Have you got any thoughts?
