
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 2574: character maps to <undefined> #74

Open
wapebira opened this issue Nov 6, 2024 · 2 comments

Comments

@wapebira

wapebira commented Nov 6, 2024

The YAML specification itself does not support encodings like CP-1252 or CP-1251.
This error occurs when reading YAML files that contain Unicode characters using Windows' default encoding (cp1252).
Please enforce UTF-8 when reading YAML files.
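
A minimal sketch of the failure and the requested fix, using only the standard library. The file name and contents are made up for illustration. The thread does not confirm which code page was actually active; cp1251 is used here as an assumption because its table has no character at byte 0x98, which matches the error in the title.

```python
import tempfile
from pathlib import Path

# Hypothetical config content. U+2018 encodes to bytes E2 80 98 in UTF-8,
# so the file contains the byte 0x98 from the issue title.
text = "title: \u2018quoted\u2019\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.yaml"
    path.write_bytes(text.encode("utf-8"))

    # Simulate a legacy Windows default (assumed cp1251): 0x98 is undefined there.
    try:
        with open(path, "r", encoding="cp1251") as f:
            f.read()
        legacy_error = None
    except UnicodeDecodeError as exc:
        legacy_error = exc

    # An explicit UTF-8 encoding decodes the same bytes on any platform.
    with open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

print(legacy_error)
print(decoded)
```

The decoded text (or the open file handle) can then be handed to whatever YAML parser the project uses.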

@e-p-armstrong
Owner

e-p-armstrong commented Nov 19, 2024

Could you provide a traceback and maybe your config, and your input files if they're not private?

Thought we ironed out all the dang decoding issues, guess not

Either way, if you're desperate: all the issues have historically seemed to be on Windows, so if you have another machine, you can use that and it should work

@wapebira
Author

I did not keep the full traceback. This is a partial example I was still able to find.
The line file.write(json.dumps(item, ensure_ascii=False) + "\n") gives this error:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c1' in position 2034: character maps to <undefined>

To fix it, I added an explicit encoding to all the file handling (both reads and writes). For example: with open(save_path_file, "r", encoding='utf-8')
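
A rough sketch of the same fix on the write side, assuming a JSON Lines style output as in the traceback above. The record contents and the temporary path are invented for illustration; only the write call and the encoding argument come from this thread.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical record containing the character from the traceback (Greek rho).
items = [{"symbol": "\u03c1", "name": "rho"}]

# cp1252 has no Greek letters, so encoding this character fails there.
try:
    "\u03c1".encode("cp1252")
    cp1252_error = None
except UnicodeEncodeError as exc:
    cp1252_error = exc

with tempfile.TemporaryDirectory() as tmp:
    save_path_file = Path(tmp) / "output.jsonl"

    # Explicit UTF-8 on write, so ensure_ascii=False output is always encodable.
    with open(save_path_file, "w", encoding="utf-8") as file:
        for item in items:
            file.write(json.dumps(item, ensure_ascii=False) + "\n")

    # Explicit UTF-8 on read, so the round trip does not depend on the locale.
    with open(save_path_file, "r", encoding="utf-8") as file:
        loaded = [json.loads(line) for line in file]

print(cp1252_error)
print(loaded)
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which is why the same code can work on one machine and fail on another.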
