Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metadata output not exactly in utf8 encoding... #33

Open
sdm7g opened this issue Apr 24, 2020 · 0 comments
Open

metadata output not exactly in utf8 encoding... #33

sdm7g opened this issue Apr 24, 2020 · 0 comments

Comments

@sdm7g
Copy link

sdm7g commented Apr 24, 2020

metadata output seems to be in ascii with other unicode characters encoded as numerical character entities. Legal for default utf8 encoding, as ascii is a subset, but this is not what I, and I think most people want or expect.
( This may be the same issue reported as #32 . This was also reported to me by Columbia.edu and I was able to reproduce it on both my and their OAI feeds. )

I initially tried adding encoding="UTF-8" to etree.tostring call in metadata.py but this worked under python3.x, but failed under python2.x .

adding encoding="unicode" appears to be the correct fix that seems to work under both python2.x and python3.x .

Under python2.x , encoding="UTF-8" returns a <type "str"> that contains unicode characters, which then may give an error when coercing to <type "unicode"> . encoding="unicode" returns <type "unicode"> .

See: https://github.com/sdm7g/oai-harvest/blob/fix-pyoai/oaiharvest/metadata.py#L51-L53

sdm7g pushed a commit to sdm7g/oai-harvest that referenced this issue Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant