Fix for UnicodeDecodeError #12

Open
wants to merge 1 commit into
base: gh-pages

@alexjj alexjj commented Nov 5, 2015

I was getting a "UnicodeDecodeError: 'charmap' codec can't decode byte 0x81" error, and this change resolved it.
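
For context, here is a minimal sketch of how this error can arise. It assumes the file is UTF-8 and the platform default is a Windows code page such as cp1252; the sample string is made up for illustration and is not taken from the tutorial's data.

```python
# Hypothetical sample text: "Á" is stored as the two bytes C3 81 in UTF-8.
raw = "Á".encode("utf-8")

try:
    # cp1252 assigns no character to byte 0x81, so decoding fails here.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("utf-8")
print(decoded)
```

Passing an explicit encoding to open() applies the same decoding choice to a whole file instead of a single byte string.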

@projectgus
Contributor

Hi alexjj,

Thanks for sending this PR!

Encodings are a nightmare, and I think you're right that this is the right way to do it. As far as I know, Python 3 defaults to UTF-8 on OS X and Linux when the locale is set to UTF-8 (it usually is). But Windows, some Linux configurations, and maybe Python 2 will probably all hit errors like the one you saw. Yay!
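
A rough sketch of the difference, assuming Python 3. The file name and row below are hypothetical stand-ins, not the tutorial's actual data. It prints the locale-dependent default and then reads a file back with the encoding spelled out, so the result no longer depends on the machine.

```python
import locale
import os
import tempfile

# The locale's preferred encoding, which open() has traditionally used
# when no encoding argument is given. It varies by platform and locale.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical stand-in for the tutorial's CSV file.
path = os.path.join(tempfile.mkdtemp(), "airports.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("1,Zürich Airport\n")

# Naming the encoding explicitly gives the same result on every platform.
with open(path, encoding="utf-8", newline="") as f:
    first_line = f.readline()
print(first_line)
```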

The only qualm I have about merging the change is that we should probably explain in an aside what's happening here, at least pointing out that we've added this encoding argument and roughly what Unicode is and why we have to care.

Are you able to add something like that to the PR?

Cheers,

Angus

@lehmannro
Contributor

Is there any way we could dodge the topic of encodings here? I think encodings are too important to squeeze them in here. This chapter is about CSV techniques — any explanation is either going to be too short to do encodings justice, or too long and confuse learners. And I'm generally not a friend of "here is this boilerplate — copy it and everything will be happy sparkles."

One possible way is rehosting the CSV files with all non-ASCII characters stripped, so that it should work in almost any encoding by accident. (Fun fact: OpenFlights.org claims the file is "ISO 8859-1 (Latin-1) encoded," so any reasonable learner might be doubly confused by encoding="utf8".)
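
As a rough illustration of what stripping would involve, here is a sketch that assumes the data has already been decoded correctly. The helper name strip_non_ascii and the sample string are hypothetical.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


stripped = strip_non_ascii("Zürich Airport")
print(stripped)  # Zrich Airport
```

The trade-off is visible in the output: names lose letters rather than being transliterated, so the rehosted data would no longer match the original.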

Python 2 users will just get another error with this fix by the way, since its open function does not accept the encoding parameter. The legacy way to do this would be through the codecs module.
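
For reference, a sketch of one version-agnostic way to open the file. It uses io.open rather than the codecs module named above; io.open accepts an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3. The file name and row are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the tutorial's CSV file.
path = os.path.join(tempfile.mkdtemp(), "airports.csv")

# io.open takes an encoding argument on both Python 2.6+ and Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"1,Z\u00fcrich Airport\n")

with io.open(path, encoding="utf-8") as f:
    line = f.readline()
print(repr(line))
```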

@alexjj
Author

alexjj commented Nov 7, 2015

Good points. For me, on Windows with Python 3, I needed to specify the encoding. However, I just stuck UTF8 in because that was the top Google result for the error, and it worked. We could probably just add a footnote saying that if you get the error you should pass the encoding argument, and hopefully specifying ISO 8859-1 works too!

@lehmannro
Contributor

ISO 8859-1 works only in the sense that it doesn't throw an exception. The file is actually encoded in UTF-8, so you will get scrambled, erroneous output. I have filed jpatokal/openflights#405 to fix the OpenFlights docs, but my other points remain.
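
A small sketch of that failure mode, using a made-up sample string rather than a row from the actual file:

```python
original = "Zürich"
utf8_bytes = original.encode("utf-8")

# Latin-1 maps every byte to some character, so decoding never raises,
# but each multi-byte UTF-8 sequence turns into two wrong characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # ZÃ¼rich

# Reversing the wrong step and decoding as UTF-8 recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Zürich
```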
