Issue with running prepare_arxiv_data.py script #1

Javi-Rop · 2024-05-10T07:25:29Z

Hello repository team,

First, congratulations on the excellent work in this repository. I am excited to use your tools and data in my project.

I have a question about the "arxiv-metadata-oai-snapshot.json" file, which I downloaded from Kaggle (link: https://www.kaggle.com/code/artgor/arxiv-metadata-exploration/input). Could you confirm whether this is the correct file for preparing the arXiv-temporal data, as described in the repository instructions?
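To check what the downloaded file actually contains, I could count which Python type the "categories" field has in its records. This is only a minimal sketch: it assumes the snapshot is in JSON Lines format (one JSON object per line), and the function name categories_type_counts is my own, not something from the repository.

```python
import json
from collections import Counter


def categories_type_counts(lines, limit=1000):
    """Count the Python types of the "categories" field in JSON Lines records.

    `lines` may be an open file object or any iterable of JSON strings.
    Only the first `limit` non-empty lines are inspected.
    Assumption: one JSON object per line, as in the Kaggle snapshot.
    """
    counts = Counter()
    seen = 0
    for line in lines:
        if seen >= limit:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing in json.loads
        record = json.loads(line)
        # A missing key is counted as "NoneType".
        counts[type(record.get("categories")).__name__] += 1
        seen += 1
    return counts
```

For example, `with open("arxiv-metadata-oai-snapshot.json", encoding="utf-8") as f: print(categories_type_counts(f))` would show whether the records hold strings (what the script seems to expect) or lists.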

I also encountered an error when running the "prepare_arxiv_data.py" script:

Traceback (most recent call last):
  File "prepare_arxiv_data.py", line 121, in <module>
    create_category_files(source_folder)
  File "prepare_arxiv_data.py", line 31, in create_category_files
    df.categories = df.categories.map(lambda x: x.split(" "))
AttributeError: 'list' object has no attribute 'split'

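As far as I can tell, line 31 calls split(" ") on every entry of the "categories" column, but in my data at least some of those entries are already lists rather than space-separated strings, so the call fails. I have tried to fix this myself but without success.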
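One possible local workaround, sketched under the assumption that each "categories" value is either a space-separated string (e.g. "math.CO cs.CG") or a list of such strings. The helper split_categories is hypothetical and not part of the repository; with it, line 31 could become df.categories = df.categories.map(split_categories). I have not verified that the rest of the script works with the resulting lists.

```python
def split_categories(value):
    """Hypothetical helper: return arXiv categories as a flat list of strings.

    Accepts either a space-separated string or a list/tuple of such strings.
    Uses str.split() with no argument, which also drops the empty strings
    that split(" ") would produce for repeated spaces.
    """
    # Plain string, e.g. "math.CO cs.CG": split on whitespace.
    if isinstance(value, str):
        return value.split()
    # List or tuple, e.g. ["math.CO cs.CG"]: split each element and flatten.
    if isinstance(value, (list, tuple)):
        categories = []
        for item in value:
            categories.extend(str(item).split())
        return categories
    raise TypeError(f"unexpected categories type: {type(value).__name__}")
```

Handling both types keeps the script working whichever version of the snapshot is loaded, instead of assuming one format.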

Could you please advise how to address this, or give me some hints toward a solution? I would greatly appreciate your help.

Thank you in advance!

Best regards,
Javier Rodríguez
