About the enwiki-20230401 #44

Open
Toblame opened this issue Feb 29, 2024 · 5 comments

Comments

@Toblame

Toblame commented Feb 29, 2024

After downloading the data and setting up the environment, I ran this command:
python -m factscore.factscorer --input_path "/root/FNDLLM/test.jsonl" --model_name "retrieval+llama+npm" --use_atomic_facts --data_dir '/root/.cache/factscore/'
and got this error:
File "/root/anaconda3/envs/factstore/lib/python3.7/site-packages/factscore/retrieval.py", line 57, in build_db
with open(data_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/factscore/enwiki-20230401.jsonl'
I can't find enwiki-20230401.jsonl in the downloaded data. Where is it?

@martiansideofthemoon
Collaborator

Hi @Toblame, thanks for your interest in our work. What command did you use to download the data?

By default, the cache is stored in the folder where you ran the download command; see https://github.com/shmsw25/FActScore/blob/main/factscore/download_data.py#L119

Can you confirm that the other cache files are present in /root/.cache for you?
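
A quick way to answer this is to list which of the files named in this thread actually exist under the directory passed as --data_dir. The snippet below is only a minimal sketch: check_cache and EXPECTED_FILES are hypothetical names introduced here, not part of FActScore, and the two file names are simply the ones mentioned in this issue.

```python
from pathlib import Path

# File names taken from this thread; adjust if your download differs.
EXPECTED_FILES = ("enwiki-20230401.db", "enwiki-20230401.jsonl")


def check_cache(data_dir, expected=EXPECTED_FILES):
    """Return {file_name: exists} for each expected file under data_dir."""
    root = Path(data_dir).expanduser()
    return {name: (root / name).is_file() for name in expected}


if __name__ == "__main__":
    # Assumption: this is the directory passed as --data_dir in the command above.
    for name, present in check_cache("/root/.cache/factscore").items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

If the .db file shows up as missing here, the download most likely wrote its cache to a different folder than the one given to --data_dir.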

@Toblame
Author

Toblame commented Mar 3, 2024

Thank you, I have solved this problem. However, I've hit another one: 'AssertionError: topic in your data (topic) is likely to be not a valid title in the DB.' This happens with both my own data and the FActScore labeled data.

@tanay2001

Hi @Toblame,

How did you solve this problem? The download_data.py file only downloads an enwiki-20230401.db file; I cannot find a .jsonl file in the cache. TIA

@Toblame
Author

Toblame commented Mar 5, 2024

> Hi @Toblame,
>
> How did you solve this problem? The download_data.py file only downloads an enwiki-20230401.db file; I cannot find a .jsonl file in the cache. TIA

I just restarted the command, checked the cache file's location, and then ran the command again. However, I still hit the other problem described above.

@martiansideofthemoon
Collaborator

martiansideofthemoon commented Mar 9, 2024

Hi @Toblame,

> Thank you, I have solved this problem. However, I've hit another one: 'AssertionError: topic in your data (topic) is likely to be not a valid title in the DB.'

You are likely getting this error because you have set topic in some rows of the input JSONL file to the string "topic". For this to work, topic must be equal to some article title (like "Billy Conigliaro") which is present in the database.
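
To find the offending rows before running the scorer, something like the following could help. It is a rough sketch, not FActScore code: load_titles and find_invalid_topics are hypothetical helpers, and the table name "documents" and column name "title" are assumptions about the SQLite file's layout that you should verify against your own .db file.

```python
import json
import sqlite3


def load_titles(db_path, table="documents", column="title"):
    """Read all titles from the SQLite DB.

    The table and column names are assumptions; check them against your DB.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    finally:
        conn.close()
    return {row[0] for row in rows}


def find_invalid_topics(jsonl_path, valid_titles):
    """Return (line_number, topic) pairs whose topic is not a known title."""
    bad = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            topic = json.loads(line).get("topic")
            if topic not in valid_titles:
                bad.append((line_no, topic))
    return bad
```

For example, find_invalid_topics("/root/FNDLLM/test.jsonl", load_titles("/root/.cache/factscore/enwiki-20230401.db")) would list every row whose topic is missing from the DB, including rows where it is the literal string "topic".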
