This repository implements functions for extracting factors from fact-checks, based on the appropriately-named paper Factoring Fact-Checks:
Structured Information Extraction from Fact-Checking Articles. If you're interested in evaluation results, see the docs
directory for a mini-writeup.
First, if you haven't already, run `pip install -r requirements.txt`.
This repository has two main functions:
- Download and reformat the ClaimReview dataset for NLP processing
- Fine-tune and apply LLMs to ClaimReview data
To download the ClaimReview data, first run:

`python get_dataset.py`

This will download the Datacommons ClaimReview dataset, filter for English articles, and extract the article text for each entry. The output will be saved to `data/dataset_with_articles.jsonl`.
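If you want to sanity-check the output before moving on, each line of `data/dataset_with_articles.jsonl` is a standalone JSON object. Here is a minimal sketch for peeking at it; the schema isn't documented here, so the snippet just prints whatever keys each record actually has:

```python
import json

# Print the keys of the first few records so you can see the actual schema.
with open("data/dataset_with_articles.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))
        if i >= 2:  # only inspect the first three lines
            break
```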
Then, to perform the fuzzy matching algorithm, run:

`python fuzzy_match_factors.py`

This will produce `data/matched_articles.jsonl`, which is ready for processing! WARNING: This may take a long time.
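For context, the goal of `fuzzy_match_factors.py` is to align each structured factor with the article text it came from. The repository's actual matching algorithm isn't described here; the sketch below only illustrates the general idea of fuzzy matching using the standard library's `difflib`, with made-up inputs:

```python
import difflib

def best_match(factor: str, sentences: list[str]) -> tuple[str, float]:
    """Return the article sentence most similar to `factor`, plus its score.

    Illustrative only -- fuzzy_match_factors.py may use a different
    similarity measure and a different unit of text than sentences.
    """
    best_sentence, best_score = "", 0.0
    for sentence in sentences:
        score = difflib.SequenceMatcher(None, factor.lower(), sentence.lower()).ratio()
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence, best_score

sentences = [
    "The claim was first posted on Facebook in March.",
    "Experts say the figure is overstated.",
]
print(best_match("First posted on Facebook in March", sentences))
```

Comparing every factor against every candidate span of every article is quadratic in practice, which is part of why the full run can take a long time.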
To run the LLM experiments, first change to the `llm` directory. There are a few scripts here that you can use for processing the data:
- `python factor.py --model <model>`: Run a cloud LLM (e.g. `--model gpt-3.5-turbo`) on a subset of the dataset and produce a new file with predictions, `data/predicted_factors.jsonl`. NOTE: if using GPT, you must have an OpenAI API key saved as the environment variable `OPENAI_API_KEY` for this to work! If using Claude, save your API key under `ANTHROPIC_API_KEY` instead. (A rough sketch of this kind of API call appears after this list.)
- `python fine_tune.py --model <model>`: Fine-tune an Anyscale model (e.g. `--model mistralai/Mixtral-8x7b`) on the dataset. NOTE: you must have an Anyscale API key saved as the environment variable `ANYSCALE_API_KEY` for this to work!
- `python tune_dspy.py`: Use DSPy to "fine-tune" a CoT few-shot prompt on the dataset, and run evaluation. Like `factor.py`, you must have an OpenAI API key saved. (See the DSPy sketch after this list.)
- `python eval.py --dataset <your_file.jsonl>`: Run the evaluation script on your prediction file and report ROUGE-1 scores. (See the ROUGE example after this list.)
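`factor.py` handles the prompting and output parsing itself; the snippet below is only a rough, hypothetical sketch of the kind of chat-completion call involved when using GPT. The prompt wording and the factor names in it are invented for illustration:

```python
import os
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment, as noted above.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

article_text = "..."  # one article pulled from data/matched_articles.jsonl

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Extract the factors (e.g. claim, verdict, evidence) from this fact-checking article.",
        },
        {"role": "user", "content": article_text},
    ],
)
print(response.choices[0].message.content)
```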
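Similarly, `tune_dspy.py` owns the actual DSPy program; the sketch below only shows roughly what compiling a chain-of-thought few-shot prompt looks like in DSPy. The configuration calls follow the classic DSPy 2.x API and may need adjusting for your installed version, and the signature fields, metric, and training example are all placeholders:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Classic DSPy 2.x-style setup; newer versions configure the LM differently.
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class ExtractFactors(dspy.Signature):
    """Extract structured factors from a fact-checking article."""
    article = dspy.InputField()
    factors = dspy.OutputField(desc="the extracted factors, one per line")

program = dspy.ChainOfThought(ExtractFactors)

# In practice the training set would come from data/matched_articles.jsonl;
# the field names here are placeholders.
trainset = [
    dspy.Example(article="...", factors="...").with_inputs("article"),
]

def overlap_metric(example, prediction, trace=None):
    # Placeholder metric; tune_dspy.py defines its own.
    return example.factors == prediction.factors

compiled = BootstrapFewShot(metric=overlap_metric).compile(program, trainset=trainset)
```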
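`eval.py` reports ROUGE-1. If you just want to compute the same kind of score on two strings by hand, the `rouge-score` package (a common choice, though not necessarily what `eval.py` uses internally) works like this:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
# score(target, prediction): the first argument is the reference text.
scores = scorer.score(
    "the claim originated on facebook in march",        # reference factor
    "the claim was first posted to facebook in march",  # predicted factor
)
print(scores["rouge1"].fmeasure)
```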