
ds000172 - dataset_description.json contains "non-breaking space" (c2 a0) characters after last names #5

Open
yarikoptic opened this issue Aug 29, 2016 · 10 comments

@yarikoptic

which breaks stock json modules in python
[screenshot: terminal traceback from parsing the file]
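For context, a minimal reproduction (my own sketch, not from the thread): Python's stock `json` module accepts only space, tab, CR, and LF as whitespace between tokens, so a no-break space U+00A0 (bytes `c2 a0` in UTF-8) sitting between tokens raises `JSONDecodeError`. The file/field names below are illustrative:

```python
import json

# JSON limits inter-token whitespace to space, tab, CR, and LF,
# so a no-break space (U+00A0) between tokens is a syntax error.
good = '{"Authors": ["Poldrack RA", "Halchenko YO"]}'
bad = good.replace(" ", "\u00a0")  # simulate NBSPs sneaking in

json.loads(good)  # parses fine

try:
    json.loads(bad)
except json.JSONDecodeError as e:
    print("stock json chokes:", e)

# Pragmatic fixup: normalize NBSP to a plain space before parsing.
fixed = bad.replace("\u00a0", " ")
print(json.loads(fixed)["Authors"][0])  # → Poldrack RA
```

Note that an NBSP *inside* a string literal is perfectly legal JSON; only NBSPs acting as delimiters break the parser.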

@poldrack
Owner

that is some really remarkable brittleness for a stock module…

@vsoch

vsoch commented Aug 29, 2016

I ran into something like this yesterday... are there any web forms involved with getting this data? The HTML spec requires CR LF pairs in form submissions:

http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1

which means '\r\n', and those can be replaced easily when parsing the data before writing the JSON.
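A sketch of that kind of pre-write sanitizer (the function name and fields are hypothetical, not from any actual pipeline): normalize CR LF pairs and stray CRs to plain newlines, and NBSPs to spaces, before the value goes into a JSON file.

```python
import json

def sanitize_form_value(value: str) -> str:
    """Normalize line endings and no-break spaces in a form field."""
    return (value.replace("\r\n", "\n")
                 .replace("\r", "\n")
                 .replace("\u00a0", " "))

raw = "Poldrack\u00a0RA\r\nHalchenko YO"  # simulated messy form input
print(json.dumps({"Authors": sanitize_form_value(raw).split("\n")}))
# → {"Authors": ["Poldrack RA", "Halchenko YO"]}
```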

@vsoch

vsoch commented Aug 29, 2016

You can also set strict=False when loading, or just replace the characters in the string before passing it to json.loads!
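For reference (my note, not part of the original comment): `strict=False` only relaxes the ban on literal control characters (U+0000–U+001F) inside string literals; it does not change what counts as whitespace between tokens, which may explain why it does not help with the NBSP case:

```python
import json

# strict=False permits raw control characters (e.g. a literal tab)
# inside string literals...
doc_with_tab = '{"name": "Poldrack\tRA"}'
try:
    json.loads(doc_with_tab)
except json.JSONDecodeError:
    print("rejected in strict mode")
print(json.loads(doc_with_tab, strict=False))

# ...but it still rejects U+00A0 used as whitespace between tokens.
doc_with_nbsp = '{"name":\u00a0"Poldrack RA"}'
try:
    json.loads(doc_with_nbsp, strict=False)
except json.JSONDecodeError as e:
    print("still rejected:", e)
```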

@yarikoptic
Author

I doubt those characters are part of the JSON spec... imho it is unreasonable to demand that JSON parsers understand all those screwy UTF-8 symbols used as delimiters. I guess the validator should also check those JSONs more thoroughly. I'll file at least one more later ;-)

@vsoch

vsoch commented Aug 29, 2016

Excellent colors btw, is that the "Dreaming of McDonalds hamburger" terminal theme? +1! 🍟🍔

@yarikoptic
Author

More of "I was so cool when I was young" :-)
Thanks for the hint about the ignoring option... Will check when I get to the laptop

@yarikoptic
Author

@vsoch apparently strict=False wasn't enough, so I will do the fixups manually
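A manual fixup along these lines would work (my sketch, assuming a UTF-8 file; the actual commands used are not recorded in the thread, and the demo writes a throwaway file in a temp directory):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical one-off fixup: strip no-break spaces from a
# dataset_description.json on disk, then parse the cleaned file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset_description.json"
    path.write_text('{"Authors": ["Poldrack\u00a0RA"]}', encoding="utf-8")

    text = path.read_text(encoding="utf-8")
    cleaned = text.replace("\u00a0", " ")  # NBSP is bytes c2 a0 in UTF-8
    path.write_text(cleaned, encoding="utf-8")

    print(json.loads(path.read_text(encoding="utf-8"))["Authors"])
```

This normalizes NBSPs to plain spaces everywhere, including inside author names, which is presumably what was intended in the first place.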

@vsoch

vsoch commented Aug 29, 2016

Yeah, I wound up just getting rid of them entirely before writing the file.

@jbwexler

So would it be helpful for us to upload a new revision of this dataset that fixes this issue? Or is it easy enough to work around? Is it something we should consider with future datasets?

@yarikoptic
Author

I just worked around for now, so as to me, no rush ;-)
