Issue with JSON to CSV converter #23

Open
rbaral opened this issue Jun 27, 2016 · 15 comments

Comments


rbaral commented Jun 27, 2016

Hi,

I got the following error while using the Python converter to convert the JSON to CSV format:

Traceback (most recent call last):
  File "json_to_csv_converter.py", line 122, in <module>
    column_names = get_superset_of_column_names_from_file(json_file)
  File "json_to_csv_converter.py", line 28, in get_superset_of_column_names_from_file
    line_contents = json.loads(line)
  File "C:\Users\rbaral\Anaconda2\lib\site-packages\simplejson\__init__.py", line 516, in loads
    return _default_decoder.decode(s)
  File "C:\Users\rbaral\Anaconda2\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "C:\Users\rbaral\Anaconda2\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Thanks.
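For what it's worth, this error usually means the very first thing the parser saw was not valid JSON; as it turns out later in this thread, the round-9 download is actually a tar archive rather than a JSON file, which would produce exactly this failure. A quick sanity check is sketched below in R (the language of the working scripts further down), with a placeholder file name standing in for whatever was passed to the converter:

library(jsonlite)
# placeholder name; substitute the file you gave json_to_csv_converter.py
first_line <- readLines("your_input_file.json", n = 1)
validate(first_line)   # FALSE means the file is not line-delimited JSON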


svknair commented Feb 13, 2017

Hi,

I also ran into trouble while using the Python code to convert the JSON files to CSV. The error message is as follows:

usage: json_to_csv_converter.py [-h] json_file
json_to_csv_converter.py: error: the following arguments are required: json_file
C:\Users\Sajeev\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2889:

I am a novice at Python programming and would appreciate any help to fix this.

Thanks

@dessertalready

Hi,
I also ran into similar trouble while using the Python converter to convert the JSON to CSV format.

This issue has puzzled me for quite some time. Need help!
Thanks a lot.


Aneapiy commented Feb 15, 2017

If you guys have R, you might want to try using that. It's worked pretty well for me. I believe R loads the entire file into memory before doing the operation, so if it crashes it's probably because it ran out of RAM.

Here's an example script in R that I've used to convert the properties I need into a CSV file.

#makeEReviewFull.R
#Last updated December 14, 2016 12:59
#This script converts a JSON file containing review data
#into an edge file for upload into System G Graph store.
#The JSON contains all reviews from the Yelp
#academic dataset.
#Uses jsonlite to create a CSV file for reviews.

library(jsonlite)
# stream_in() reads the line-delimited JSON records into a data frame
reviews <- stream_in(file("./yelp_academic_dataset_review.json"))
#reviews$text <- gsub('\n','',reviews$text)  # optional: strip newlines from review text
# one edge per review: user -> business, with the review fields as attributes
edges <- data.frame(
    source = reviews$user_id, 
    target = reviews$business_id,
    type = reviews$type,
    date = reviews$date,
    stars = reviews$stars,
    votes = reviews$votes,
    review_id = reviews$review_id
)
write.csv(edges, "./eReviewsFull.csv", row.names = FALSE)
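If stream_in() on the whole review file runs out of memory, a chunked variant may help. Here is a minimal sketch, using jsonlite's pagesize and handler arguments to convert the file in batches; the column subset is just an illustration, not the full edge schema above:

library(jsonlite)
con_out <- file("./eReviewsFull.csv", open = "w")
wrote_header <- FALSE
stream_in(
    file("./yelp_academic_dataset_review.json"),
    pagesize = 5000,  # records per batch
    handler = function(df) {
        edges <- data.frame(
            source = df$user_id,
            target = df$business_id,
            stars = df$stars,
            review_id = df$review_id
        )
        # write the CSV header only once, for the first batch
        write.table(edges, con_out, sep = ",", row.names = FALSE,
                    col.names = !wrote_header)
        wrote_header <<- TRUE
    }
)
close(con_out)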

@dessertalready

Thank you for your help!
In fact, I have another question about the dataset.
There is only one file without a suffix in the downloaded tar, named yelp_dataset_challenge_round9. Is it a single JSON file with all types of data, e.g., business, tips?
The file you used seems to contain only the review data, i.e., yelp_academic_dataset_review.json, and it was transformed according to that file's structure, while the different types of data have distinct structures.
I have no idea how to handle the dataset.
Thanks!

@CAVIND46016

@dessertalready : The dataset 'yelp_dataset_challenge_round9' is essentially a double-zipped file. If you rename the file by adding a ".tar" extension to it, you'll be able to see the different .json files for business, check-ins, reviews, tips, and users. Then you can use those files as input for the converter.
Hope this helps.
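If it helps, the extract step can also be done from R itself; a small sketch, assuming the archive has already been renamed and sits in the working directory:

# list the .json files inside the renamed archive, then extract them
untar("yelp_dataset_challenge_round9.tar", list = TRUE)
untar("yelp_dataset_challenge_round9.tar", exdir = ".")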

@dessertalready

@CAVIND46016: I am very grateful for your warm help!!
I was not sure it was a double-zipped file, and you really made it clear to me!
Thanks a lot!!


svknair commented Feb 16, 2017

@Aneapiy : thanks a lot for offering the code in R. But I keep getting the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 7, 2, 19, 6, 3, 5, 1, 0, 18, 4, 8, 20, 25, 14, 12, 13, 26, 11, 21, 10, 22, 16, 23, 27, 9, 17, 15, 24, 31, 28, 29, 30, 32, 34

I am trying to read a JSON file that is different from the one in your sample code: yelp_academic_dataset_business.json. Is this because some columns have missing or NULL values? I think the stream_in function works, because I am able to see the data format in R. I guess the problem happens when we try to convert it to a data frame.

Another issue is that there are columns called attributes, categories, etc., which have multiple values. Is there a way to read each of these values and output them to separate columns in the CSV?

Thanks


Aneapiy commented Feb 16, 2017

@svknair. If you're trying to read the business data, you might want to try changing the names of the keys R is looking for. For example, within the data.frame() call you might want to change

source = reviews$user_id, 

to

id = reviews$business_id, 

Note that the string before the $ sign has to match the name of the variable you're streaming the information into. The string after the $ is the JSON key that you want to look in. The keys in the business file are different from the keys in the reviews file.
Hmm, as far as columns like attributes are concerned, I'm not sure whether they pose a problem, since there are multiple different sub-categories they could take on. You might want to try changing the keys I mentioned above, importing a small subset of the business data (say, the first 10 lines), and seeing if R flattens it out automatically.

Here's an example of what I've used for the business json file:

#makeVBusinessFull.R
#Last updated December 14, 2016 09:12AM
#This script converts a JSON file containing business data
#into a vertex file for upload into System G Graph store.
#The JSON is the original business data from the Yelp
#academic dataset without any preprocessing.

library(jsonlite)
# stream_in() reads the line-delimited JSON records into a data frame
biz <- stream_in(file("./yelp_academic_dataset_business.json"))
biz$full_address <- gsub('\n',' ',biz$full_address)  # flatten multi-line addresses
# one vertex per business, keeping only the atomic (non-nested) fields
nodes <- data.frame(
    id = biz$business_id, 
    type = biz$type,
    name = biz$name,
    city = biz$city,
    state = biz$state,
    stars = biz$stars,
    latitude = biz$latitude,
    longitude = biz$longitude,
    review_count = biz$review_count,
    full_address = biz$full_address
)
write.csv(nodes, "./vBusinessFull.csv", row.names = FALSE)


anitachimnani10 commented Feb 22, 2017

The conversion in R is stuck after an hour. Does this happen to anybody else?


svknair commented Feb 23, 2017

@Aneapiy Thanks for your suggestions. The code works fine. Another query that I have: how can I split the fields named attributes, categories and hours? These have multiple sub-fields that I would like to split into separate columns. I am referring to the business data file here.
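One way to attempt that is sketched below with jsonlite's flatten(), which expands nested columns such as attributes into separate attributes.* columns. The output file name is just illustrative, and list columns like categories may still need collapsing by hand, as done in the loop:

library(jsonlite)
biz <- stream_in(file("./yelp_academic_dataset_business.json"))
flat <- flatten(biz)  # nested attributes.* etc. become their own columns
# collapse any remaining list columns (e.g. categories) into delimited strings
for (col in names(flat)) {
    if (is.list(flat[[col]])) {
        flat[[col]] <- sapply(flat[[col]], paste, collapse = ";")
    }
}
write.csv(flat, "./vBusinessFlat.csv", row.names = FALSE)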


hoseinit commented Feb 28, 2017

I faced an error when I tried to run the R script makeEReviewFull.R:

Loading required package: methods
opening file input connection.
 Imported 4153150 records. Simplifying...
closing file input connection.
Error in data.frame(source = reviews$user_id, target = reviews$business_id,  : 
  arguments imply differing number of rows: 4153150, 0
Execution halted

and for makeVBusinessFull.R:

Loading required package: methods
opening file input connection.
 Imported 144072 records. Simplifying...
closing file input connection.
Error in `$<-.data.frame`(`*tmp*`, "full_address", value = character(0)) : 
  replacement has 0 rows, data has 144072
Calls: $<- -> $<-.data.frame
Execution halted
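Both failures suggest that some keys the scripts expect (type, votes, full_address) are simply absent from this version of the dataset, so reviews$type and friends come back NULL with zero rows. A quick diagnostic sketch, assuming reviews was streamed in as in the script above:

# columns actually present in the streamed data
names(reviews)
# keys the edge script expects but this file does not provide
setdiff(c("user_id", "business_id", "type", "date", "stars", "votes", "review_id"),
        names(reviews))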

@FrizchaApriln

How can I open a JSON file if its size is more than 1 GB?

rbaral (Author) commented Jun 6, 2017

I have no idea whether you can open it with any tools on Windows, but on Unix machines you can use grep commands to view content matching a pattern, or use split commands to break the file into smaller chunks. There may be some software available on Windows that can do the splitting as well (see the R sketch below for a way to peek at the file without loading all of it).
Lots of discussion on this can be found on the web:

  1. https://news.ycombinator.com/item?id=12063626
  2. https://askubuntu.com/questions/28847/text-editor-to-edit-large-4-3-gb-plain-text-file
  3. Sublimetext (http://www.sublimetext.com/)

Thanks.
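A minimal R sketch of the "view a small chunk" approach, assuming the review file from earlier in the thread; it reads only the first few records, so the multi-gigabyte file is never loaded whole:

con <- file("./yelp_academic_dataset_review.json", open = "r")
head_lines <- readLines(con, n = 5)  # one JSON record per line
close(con)
cat(head_lines, sep = "\n")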

@underpope

I have the same issue with converting the business file:

> biz$full_address <- gsub('\n',' ',biz$full_address)
Error in `$<-.data.frame`(`*tmp*`, "full_address", value = character(0)) : 
  replacement has 0 rows, data has 144072

It looks like there is no "full_address" field, and that it has been replaced with "address". At least, that's how it is in my download.
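If so, a small guard keeps the business script working across dataset versions; a sketch, assuming the newer layout really does name the field address:

# use whichever address field this version of the dataset provides
addr_col <- if ("full_address" %in% names(biz)) "full_address" else "address"
biz[[addr_col]] <- gsub('\n', ' ', biz[[addr_col]])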

@shahid1579

I cannot convert business.json to CSV. I get the error: TypeError: argument of type 'NoneType' is not iterable
