bug fix: encoding missing from open() #10

Open: wants to merge 2 commits into base: master



DevinCharles

The encoding option passed to read_table was never forwarded to the open() call.

Should fix issue #8: UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence
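
A minimal sketch of the kind of change described, assuming a hypothetical `read_table` helper with `filename`, `encoding`, and `sep` parameters (the real function's signature and parsing logic are not shown in this thread). The only point is that `encoding` has to reach `open()`. When it is omitted, Python 3 falls back to the platform's locale encoding, which would be consistent with a 'gbk' codec error on a system whose default is GBK. This version runs on Python 3 only.

```python
import os
import tempfile


def read_table(filename, encoding="utf-8", sep="\t"):
    """Hypothetical stand-in: read a delimited text file into a list of rows."""
    # Forward `encoding` to open(); omitting it makes Python 3 use the
    # platform's locale encoding instead of the one the caller asked for.
    with open(filename, encoding=encoding) as f:
        return [line.rstrip("\r\n").split(sep) for line in f]


# Write a small UTF-8 file as raw bytes, then read it back with an
# explicit encoding so the result does not depend on the system locale.
path = os.path.join(tempfile.mkdtemp(), "cities.txt")
with open(path, "wb") as f:
    f.write(u"S\xe3o Paulo\tBR\n".encode("utf-8"))

rows = read_table(path, encoding="utf-8")
```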

elyase (Owner) commented Oct 9, 2017

Hey, thanks for the fix! The tests are failing because the encoding argument of open() is not supported in Python 2. Would it be too much to ask that you use something from:

https://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2

so that the library can stay Python 2 compatible?
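
One common approach to this request is `io.open`, which accepts an `encoding` argument on Python 2.6+ and is the same function as the built-in `open()` on Python 3. The sketch below is only an illustration: `read_lines` is a hypothetical helper name, not a function from this library.

```python
import io
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Hypothetical helper: return the decoded lines of a text file."""
    # io.open takes `encoding` on both Python 2.6+ and Python 3,
    # and always yields unicode text in text mode.
    with io.open(filename, encoding=encoding) as f:
        return f.readlines()


# Round-trip a line containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Carapicu\xedba\n")

lines = read_lines(path)
```

Because the string literals use `u"..."` with escapes, the same file should run unchanged on Python 2.6+ and Python 3.3+.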

Updated to make backward compatible with Python 2 (>2.5)
DevinCharles (Author)

There appears to be a difference in the way Python 3 and Python 2 handle .decode('utf-8'), so my most recent commit fails test_cities.

Python 3

['\ufeffSao Paulo é a capital do estado de Sao Paulo. As cidades de Barueri\r\n',
 'e Carapicuíba fazem parte da Grade Sao Paulo. O Rio de Janeiro\r\n',
 'continua lindo. No carnaval eu vou para Salvador. No reveillon eu \r\n',
 'quero ir para Santos.']

Python 2

[u'\ufeffS\xe3o Paulo \xe9 a capital do estado de S\xe3o Paulo. As cidades de Barueri\r\n',
 u'e Carapicu\xedba fazem parte da Grade S\xe3o Paulo. O Rio de Janeiro\r\n',
 u'continua lindo. No carnaval eu vou para Salvador. No reveillon eu \r\n',
 u'quero ir para Santos.']

I'll have to think about this... or wait for someone who actually knows what they're doing to fix it :)
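
One possible reading of the two outputs above, offered as a sketch rather than a diagnosis of the failing test: Python 2's `repr` escapes non-ASCII characters (`\xe3`, `\xe9`) where Python 3 prints them literally, so strings that are displayed differently may still be equal. Both outputs also begin with `\ufeff`, a byte order mark that the `utf-8` codec keeps and the `utf-8-sig` codec strips. The sample bytes below are made up for illustration and are not taken from the test data.

```python
# Illustrative bytes: a UTF-8 byte order mark followed by encoded text.
raw = b"\xef\xbb\xbf" + u"S\xe3o Paulo \xe9 a capital".encode("utf-8")

with_bom = raw.decode("utf-8")         # keeps the BOM as u'\ufeff'
without_bom = raw.decode("utf-8-sig")  # strips a leading BOM if present

# The \x and \u escape spellings denote the same code point,
# so these two strings compare equal on both Python 2 and 3.
same_text = (u"S\xe3o" == u"S\u00e3o")
```

If the BOM turns out to matter for the test, reading the file with `encoding='utf-8-sig'` would be one option to try.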

2 participants