This open-source project contains the Python implementation of our approach, HybridFC. It is designed to ease real-world fact-checking over knowledge graphs and to produce better results. To this end, we rely on:
- PyTorch Lightning to perform training via multiple CPUs, GPUs, TPUs, or computing clusters,
- Pre-trained KG embeddings for the knowledge-graph-based component,
- Elasticsearch to load the text corpus (Wikipedia) for the text-based component, and
- A path-based approach to calculate the output score for the path-based component.
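The scores of these three components are fused into a single veracity score by a trained classifier. As a minimal illustration only (a fixed weighted average with hypothetical weights; HybridFC itself learns the fusion end-to-end):

```python
# Minimal sketch, NOT the exact architecture from the paper: each component
# yields a score in [0, 1] for a triple, and a final classifier fuses them.
# Here the fusion is a fixed weighted average purely for illustration.
def hybrid_score(kge_score, text_score, path_score,
                 weights=(0.4, 0.4, 0.2)):  # hypothetical weights
    scores = (kge_score, text_score, path_score)
    return sum(w * s for w, s in zip(weights, scores))

print(round(hybrid_score(0.9, 0.8, 0.3), 2))  # 0.74
```

In the actual model, the component outputs are concatenated and passed through a trained neural classifier rather than averaged with fixed weights.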
First, clone the repository:
git clone https://github.com/factcheckerr/HybridFC.git
cd HybridFC
There are two options to reproduce the results:
- use pre-generated data, or
- regenerate the data from scratch.

Please choose one of these two options.
Download and unzip the data and embedding files in the root folder of the project:
pip install gdown
wget https://files.dice-research.org/datasets/ISWC2022_HybridFC/data.zip
unzip data.zip
Note: if you get a "permission denied" error, try running the commands with "sudo".
If you do not want to use the pre-generated data, follow these steps:
- Run KGV to collect results from FactCheck and COPAAL.
- Run FactCheck on the FactBench, FaVel, and BPDP datasets using Wikipedia as the reference corpus.

As input, the user needs the output of FactCheck in JSON format. The format is as follows:
-=-=-=-=-=-=-==-=-==-=-=-=-==-=-=-=-=-=-=-==-=-=-=-
10
/factbench/test/correct/death/death_00053.ttl
defactoScore: 0.98 setProofSentences : [ComplexProofs{website='https://en.wikipedia.org/wiki/Reba White Williams', proofPhrase='In 1999 , White Williams ran unsuccessfully for the New York City City Council in District 4 .', trustworthinessScore='0.997908778988452'}, ComplexProofs{website='https://en.wikipedia.org/wiki/James Leo Herlihy', proofPhrase='Like Williams , Herlihy had lived in New York City .', trustworthinessScore='0.9975670565782072'}, ComplexProofs{website='https://en.wikipedia.org/wiki/Charles Williams (musician)', proofPhrase='Charles Isaac Williams -LRB- born July 18 , 1932 -RRB- is an alto saxophonist based in New York City .', trustworthinessScore='0.9991775993927828'}] subject : Tennessee Williams object : New York City predicate deathPlace
-=-=-=-=-=-=-==-=-==-=-=-=-==-=-=-=-=-=-=-==-=-=-=-
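The `defactoScore` and the proof sentences can be pulled out of such a line with a small parser. The sketch below is a hypothetical helper (not part of the repository) whose regular expressions follow the field names in the example above:

```python
import re

# Hypothetical helper: parse one FactCheck result line into a score and a
# list of proofs. Field names follow the example output shown above.
def parse_factcheck_line(line):
    score = float(re.search(r"defactoScore:\s*([0-9.]+)", line).group(1))
    proofs = [
        {"website": w, "proofPhrase": p, "trustworthinessScore": float(t)}
        for w, p, t in re.findall(
            r"ComplexProofs\{website='(.*?)', proofPhrase='(.*?)', "
            r"trustworthinessScore='(.*?)'\}", line)
    ]
    return score, proofs

line = ("defactoScore: 0.98 setProofSentences : ["
        "ComplexProofs{website='https://en.wikipedia.org/wiki/Example', "
        "proofPhrase='An example sentence .', "
        "trustworthinessScore='0.99'}] subject : A object : B predicate p")
score, proofs = parse_factcheck_line(line)
print(score, len(proofs))  # 0.98 1
```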
Put the resulting JSON file in the data folder.
Further details are in the README file in the overall_process folder.
Install dependencies via conda:
#setting up the environment
#creating and activating the conda environment
conda env create -f environment.yml
conda activate hfc2
#If the conda command is not found: download Miniconda from (https://docs.conda.io/en/latest/miniconda.html#linux-installers) and set the path:
#export PATH=/path-to-conda/miniconda3/bin:$PATH
Start generating results:
# Start the training process with the required hyperparameters. Details about other hyperparameters are in the main.py file.
python main.py --emb_type ConEx --model full-Hybrid --num_workers 32 --min_num_epochs 100 --max_num_epochs 1000 --check_val_every_n_epochs 10 --eval_dataset FactBench
# Compute evaluation files from the saved model in the "dataset/HYBRID_Storage" directory
python evaluate_checkpoint_model.py --emb_type TransE --model full-Hybrid --num_workers 32 --min_num_epochs 100 --max_num_epochs 1000 --check_val_every_n_epochs 10 --eval_dataset FactBench
- To reproduce similar results, you have to use the exact parameters listed above.
- For other datasets, change the parameter passed to --eval_dataset.
- Use a GPU for fast processing. The default parameter is set to the 2 GPUs that we used to generate the results.
- For a different embedding type (--emb_type) or model type (--model), you just need to change the corresponding parameters.
Available embedding types: ConEx, TransE. The following can also be added: ComplEx, RDF2Vec (only for the BPDP dataset), QMult.
Available models: full-Hybrid, KGE-only, text-only, text-KGE-Hybrid, path-only, text-path-Hybrid, KGE-path-Hybrid
Note: model names are case-sensitive. So please use exact names.
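To sweep several model/embedding combinations, the training command can be scripted. The sketch below only assembles the command lines; the model and embedding names are taken from the example commands and lists above, and the launch call is left commented out so you can adapt it to your setup:

```python
from itertools import product

# Sketch: enumerate training commands for several model/embedding
# combinations. Adjust --eval_dataset and the parameter values as needed.
models = ["full-Hybrid", "KGE-only", "text-only"]
emb_types = ["ConEx", "TransE"]

commands = []
for model, emb in product(models, emb_types):
    commands.append(
        ["python", "main.py",
         "--emb_type", emb, "--model", model,
         "--num_workers", "32",
         "--min_num_epochs", "100", "--max_num_epochs", "1000",
         "--check_val_every_n_epochs", "10",
         "--eval_dataset", "FactBench"])

for cmd in commands:
    print(" ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to launch
```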
After computing the evaluation results, the prediction files are saved in the "dataset/HYBRID_Storage" folder along with the ground-truth files. These files can be uploaded to a live instance of the GERBIL framework (by Röder et al.) to produce AUROC scores.
In future work, we will exploit the modularity of HybridFC by integrating rule-based approaches and path embeddings. We also plan to explore other ways to select the best evidence sentences.
This work has been supported by the EU H2020 Marie Skłodowska-Curie project KnowGraphs (no. 860801).
- Umair Qudus (DICE, Paderborn University)
- Michael Röder (DICE, Paderborn University)
- Muhammad Saleem (DICE, Paderborn University)
- Axel-Cyrille Ngonga Ngomo (DICE, Paderborn University)
@InProceedings{qudus2022hybridfc,
  author    = {Qudus, Umair and Röder, Michael and Saleem, Muhammad and Ngomo, Axel-Cyrille Ngonga},
  editor    = {Sattler, Ulrike and Hogan, Aidan and Keet, Maria and Presutti, Valentina and Almeida, Jo{\~a}o Paulo A. and Takeda, Hideaki and Monnin, Pierre and Pirr{\`o}, Giuseppe and d'Amato, Claudia},
  title     = {HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs},
  booktitle = {The Semantic Web -- ISWC 2022},
  year      = {2022},
  doi       = {10.1007/978-3-031-19433-7\_27},
  isbn      = {978-3-031-19433-7},
  pages     = {462--480},
  address   = {Cham},
  publisher = {Springer International Publishing},
  biburl    = {https://www.bibsonomy.org/bibtex/2ec2f0b9ee7ca0c1c6ef1d8fbcd7262e4/dice-research},
  keywords  = {knowgraphs frockg raki 3dfed dice ngonga saleem roeder qudus},
  url       = {https://papers.dice-research.org/2022/ISWC_HybridFC/public.pdf},
}