You should organize them in the following format.
DioR
├── Sgpt_file
├── data \\ Create and import according to the following command
├── paraphrase-MiniLM-L6-v2 \\ SBERT model
├── SGPT \\ SGPT model
├── config
├── src
├── result \\ Create and import according to the following command
├── prep_esastic.py
├── rnn_hallucination_model_0.pth
├── train.sh
conda create -n DioR python=3.9
conda activate DioR
pip install -r requirements.txt
python -m spacy download en_core_web_sm
https://huggingface.co/meta-llama/Llama-2-7b
./huggingface/paraphrase-MiniLM-L6-v2
and
./hugggingface/SGPT
mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd
cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz # download Elasticsearch
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz
cd elasticsearch-7.17.9
nohup bin/elasticsearch & # run Elasticsearch in background
cd ../..
python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki # build index
There is an issue with psgsw_100.tsv (empty data exists), please delete it and save it as psgsw_100_fixed.tsv
python ./scr/fixed.py
python ./Sgpt_file/sgpt_file.py
For 2WikiMultihopQA:
Download the 2WikiMultihop dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1. Unzip it and move the folder to data/2wikimultihopqa
.
For StrategyQA:
wget -O data/strategyqa_dataset.zip https://storage.googleapis.com/ai2i/strategyqa/data/strategyqa_dataset.zip
mkdir -p data/strategyqa
unzip data/strategyqa_dataset.zip -d data/strategyqa
rm data/strategyqa_dataset.zip
For HotpotQA:
mkdir -p data/hotpotqa
wget -O data/hotpotqa/hotpotqa-dev.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json
For IIRC:
wget -O data/iirc.tgz https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz
tar -xzvf data/iirc.tgz
mv iirc_train_dev/ data/iirc
rm data/iirc.tgz
bash train.sh
python ./src/evaluate1.py --dir path_to_folder(result/[result path and name]])