
Error when trying to reproduce #8

Closed · aalbracht opened this issue Jul 18, 2018 · 5 comments

@aalbracht commented Jul 18, 2018

I have been trying to recreate the patent landscape code in a Jupyter notebook. Everything runs perfectly until I get to loading the inference data, where I get "EOFError: Ran out of input" at line 541 of expansion.py.
```
Loading inference data from filesystem at data\video_codec\landscape_inference_data.pkl

EOFError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 subset_l1_pub_nums, l1_texts, padded_abstract_embeddings, refs_one_hot, cpc_one_hot = expander.sample_for_inference(td, 0.2)

~\Documents\Python Scripts\expansion.py in sample_for_inference(self, train_data_util, sample_frac)
    539         print('Loading inference data from filesystem at {}'.format(inference_data_path))
    540         with open(inference_data_path, 'rb') as infile:
--> 541             inference_data_deserialized = pickle.load(infile)
    542
    543         subset_l1_pub_nums, l1_texts, padded_abstract_embeddings, refs_one_hot, cpc_one_hot = \

EOFError: Ran out of input
```

I'd appreciate the help. I can put this all on my Git repo if that would help.

@ostegm (Collaborator) commented Jul 19, 2018

It sounds like the pickle file got corrupted. As a first step, could you try clearing any intermediate files out of the data folder? It should be in the same directory as your notebook. Reply here if that doesn't work.
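
A minimal sketch of that cleanup, assuming the intermediate files are pickles under a data/video_codec folder (the path comes from the traceback above; adjust it to your setup):

```python
from pathlib import Path

# Remove cached intermediate pickle files so the notebook rebuilds them
# from scratch on the next run. The directory is taken from the traceback
# above; change it if your data lives elsewhere.
data_dir = Path('data/video_codec')
for pkl_file in data_dir.glob('*.pkl'):
    print('Removing', pkl_file)
    pkl_file.unlink()
```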

@aalbracht (Author) commented Jul 20, 2018 via email

@ostegm (Collaborator) commented Jul 23, 2018

I ran it and got the same issue; it's caused by trying to pickle an intermediate dataset that exceeds pickle's 4 GB limit. In the short term, you can comment out lines 532-544 of expansion.py and then try rerunning the notebook.
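
As a longer-term sketch (not the repository's current code): pickle protocol 4, available since Python 3.4, lifts the 4 GB limit of earlier protocols, so passing protocol=4 wherever the intermediate dataset gets dumped should let it round-trip. The path and payload below are placeholders:

```python
from pathlib import Path
import pickle

# Placeholder payload standing in for the tuple that sample_for_inference()
# serializes; the path mirrors the one in the traceback above.
inference_data = {'example': 'payload'}
inference_data_path = Path('data/video_codec/landscape_inference_data.pkl')
inference_data_path.parent.mkdir(parents=True, exist_ok=True)

with inference_data_path.open('wb') as outfile:
    # Protocol 4 (Python 3.4+) supports pickled objects larger than 4 GB,
    # which earlier protocols cannot frame.
    pickle.dump(inference_data, outfile, protocol=4)
```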

Also looping in @seinberg for comments on how to modify the code to allow the intermediate dataset to be stored locally.

@feltenberger (Collaborator)

The issue you're seeing is something I've run into from time to time, and I think the cause is that one (or more) of the patents in the L1 result set has a very large blob of text in it. This may actually be an issue with the BigQuery dataset, though I haven't investigated closely. One simple workaround is to update the sample size on this line:

```python
subset_l1_pub_nums, l1_texts, padded_abstract_embeddings, refs_one_hot, cpc_one_hot = expander.sample_for_inference(td, 0.2)
```

to something smaller than 0.2 (20% of the L1 patents). Try 5% (0.05) to start, and dial it up until you see the problem again. This isn't ideal if you want the full set of L1 patents for inference (e.g., to build the full patent landscape), but it can get you a good portion of the way there.
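
For example, using the same names as the notebook cell quoted above:

```python
# Sample 5% of the L1 patents instead of 20%; dial the fraction up
# until the pickling error reappears.
subset_l1_pub_nums, l1_texts, padded_abstract_embeddings, refs_one_hot, cpc_one_hot = \
    expander.sample_for_inference(td, 0.05)
```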

@peiyu-wang

Hi @aalbracht, do you think you could help with #47? The model has been deleted from Cloud Storage, and I really want to try it. I would really appreciate it if you could share a local copy of the models, if you still have them. Thanks a lot!
