Issue with running prepare_arxiv_data.py script #1

Open
Javi-Rop opened this issue May 10, 2024 · 0 comments

Hello repository team,

Firstly, I would like to extend my congratulations on the excellent work done in this repository. I am excited to utilize your tools and data for my project.

I have a question regarding the "arxiv-metadata-oai-snapshot.json" file that I downloaded from Kaggle (link: https://www.kaggle.com/code/artgor/arxiv-metadata-exploration/input). I wanted to confirm if this is the correct file that I should use to prepare the arXiv-temporal data, as indicated in the repository instructions.
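One way to check the file before running the script is to look at a single record. The sketch below assumes the snapshot is stored as JSON Lines (one JSON object per line) and that each record has a "categories" field; the helper names `peek_first_record` and `describe_field` are hypothetical and not part of the repository.

```python
import json


def peek_first_record(path):
    """Parse only the first line of a JSON Lines file (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return json.loads(handle.readline())


def describe_field(record, field="categories"):
    """Return the Python type name of a field, e.g. 'str' or 'list'."""
    return type(record.get(field)).__name__


# Hypothetical usage; adjust the path to wherever the file was downloaded:
# record = peek_first_record("arxiv-metadata-oai-snapshot.json")
# print(describe_field(record))
```

If this reports `list` rather than `str`, the downloaded file stores categories differently from what the script expects.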

Additionally, I have encountered an issue when attempting to run the "prepare_arxiv_data.py" script. During execution, I am getting the following error:

Traceback (most recent call last):
  File "prepare_arxiv_data.py", line 121, in <module>
    create_category_files(source_folder)
  File "prepare_arxiv_data.py", line 31, in create_category_files
    df.categories = df.categories.map(lambda x: x.split(" "))
AttributeError: 'list' object has no attribute 'split'

I understand that the error occurs because the script calls split(" ") on a value that is already a list rather than a string. I have tried to resolve it, without success.
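One possible workaround, offered as a sketch rather than a confirmed fix, is to split only when the value is a string and to pass lists through. The helper name `normalize_categories` is hypothetical and not part of the repository; it also assumes that a list element may itself be a space-separated string, so each element is split as well.

```python
def normalize_categories(value):
    """Return a flat list of category names from a string or a list."""
    if isinstance(value, str):
        # Original case the script expects: "hep-ph cs.LG"
        return value.split()
    # Already a list: split each element too, in case an element
    # still holds several space-separated categories.
    return [cat for item in value for cat in str(item).split()]


# Small illustration on made-up rows (not taken from the real dataset)
rows = ["hep-ph cs.LG", ["math.CO", "cs.DM"], ["math.CO cs.CG"]]
normalized = [normalize_categories(row) for row in rows]

# In the script, line 31 could then become (untested assumption):
# df.categories = df.categories.map(normalize_categories)
```

This keeps the behaviour for string input and avoids the AttributeError when the column already contains lists.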

Could you please guide me on how to address this issue or provide me with some hints to find a solution? I greatly appreciate your assistance.

Thank you in advance!

Best regards,
Javier Rodríguez
