Can not import the dataset into python #20
Comments
@tiechengsu I am having the same problem. Were you able to solve the issue?
@bngksgl No, I used the previous dataset instead, which you can find here
It's a .tar file; just decompress it again.
The latest data combines several categories together, and I have no idea how to import it. Does that mean reviews.json, business.json, etc. are stored mixed together in the file?
Not really sure where you all are facing errors. I have edited the code to accept .json files explicitly and convert them to .csv. I set the file path explicitly in the main method instead of using argparse as in the original code. Let me know if this helps.
Reference: https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py
"""Convert the Yelp Dataset Challenge dataset from json format to csv.
def get_row(line_contents, column_names):
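The snippet above is cut off in this thread. As a rough illustration of the JSON-to-CSV conversion it describes (this is not @CAVIND46016's code and not the Yelp converter), here is a minimal sketch. It assumes the input holds one JSON object per line, which this thread does not confirm, and `json_lines_to_csv` is a hypothetical helper name.

```python
import csv
import json


def json_lines_to_csv(json_path, csv_path):
    """Convert a file with one JSON object per line into a CSV file.

    Assumption: one JSON object per line. Nested values are written
    using their default string form, not flattened into columns.
    """
    with open(json_path, encoding="utf-8") as src:
        rows = [json.loads(line) for line in src if line.strip()]

    # The union of top-level keys, in first-seen order, becomes the header.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

This version loads every row into memory before writing, so for very large files a two-pass approach (collect the keys first, then stream the rows) would likely be a better fit.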
YES! SOLVED! Once you have extracted it from the *.tar, extract the generated file again, and then you will see the separate JSON files.
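For anyone who prefers to do the double extraction from Python, here is a minimal sketch using the standard `tarfile` module. It assumes the download is a tar archive that contains another tar archive, as described above. `extract_nested` and the output directory name are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_nested(archive_path, out_dir):
    """Extract a tar archive, then extract any tar archive found inside it.

    Returns the sorted names of the .json files found afterwards.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: unpack the downloaded file (compression is auto-detected).
    with tarfile.open(archive_path) as outer:
        outer.extractall(out_dir)

    # Second pass: unpack anything in the output that is itself a tar archive.
    for inner in list(out_dir.rglob("*")):
        if inner.is_file() and tarfile.is_tarfile(inner):
            with tarfile.open(inner) as nested:
                nested.extractall(out_dir)

    return sorted(p.name for p in out_dir.rglob("*.json"))
```

Usage would look like `extract_nested('yelp_dataset_challenge_academic_dataset', 'yelp_extracted')`. Only extract archives from a source you trust, since tar members can carry unexpected paths.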
@CAVIND46016 Are you able to post your code in a formatted snippet? Using it in my compiler is producing indentation errors. Thank you!
@dotdose: Have a look at the code here; this should work better.
import json

with open('yelp_dataset_challenge_academic_dataset', encoding='utf-8') as f:
    jsondata = json.load(f)
I tried to import the dataset into Python with the code above, but it failed with the error: 'utf-8' codec can't decode byte 0xb5. I also tried encoding='charmap', but that didn't work either. Can anyone tell me how to import the data?
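The decode error is what one would expect if the downloaded file is an archive rather than JSON text, which is what the comments in this thread point to. Once the individual .json files are available, and assuming each one holds one JSON object per line (an assumption; this thread does not state the layout), `json.load` on the whole file would still fail, so a line-by-line reader may be a better fit. A minimal sketch, with `read_json_lines` as a hypothetical helper:

```python
import json


def read_json_lines(path, limit=None):
    """Parse a file that holds one JSON object per line.

    Assumption: one JSON object per line. `limit` stops early,
    which is handy for peeking at a large file.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records
```

For example, `read_json_lines('review.json', limit=5)` would return the first five records as dictionaries, where `review.json` stands in for whichever extracted file you want to inspect.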