DioR

Organize the files and folders in the following layout.

DioR
    ├── Sgpt_file
    ├── data                            \\ Create and populate by following the commands below
    ├── paraphrase-MiniLM-L6-v2         \\ SBERT model
    ├── SGPT                            \\ SGPT model
    ├── config
    ├── src
    ├── result                          \\ Create and populate by following the commands below
    ├── prep_elastic.py
    ├── rnn_hallucination_model_0.pth
    └── train.sh
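
Before running anything, it can help to confirm that this layout is in place. The following is a minimal sketch and not part of the repository: `missing_entries` is a hypothetical helper name, and the entry list is based on the tree above.

```python
from pathlib import Path

# Entries based on the directory tree above; adjust if your layout differs.
EXPECTED_ENTRIES = [
    "Sgpt_file",
    "data",
    "paraphrase-MiniLM-L6-v2",
    "SGPT",
    "config",
    "src",
    "result",
    "prep_elastic.py",
    "rnn_hallucination_model_0.pth",
    "train.sh",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

Run it from the DioR root directory. `data` and `result` are expected to be reported as missing until the later steps have been completed.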

Install environment

conda create -n DioR python=3.9
conda activate DioR
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Download LLaMA2-7B

https://huggingface.co/meta-llama/Llama-2-7b

Download SBERT, SGPT

./huggingface/paraphrase-MiniLM-L6-v2

and

./huggingface/SGPT

Build BM25 index

mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd

cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz  # download Elasticsearch
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz 
cd elasticsearch-7.17.9
nohup bin/elasticsearch &  # run Elasticsearch in background
cd ../..
python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki  # build index
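
Elasticsearch started in the background needs some time before it accepts requests, so building the index immediately after launch may fail. The sketch below polls the server until it answers. It assumes the default HTTP address `http://localhost:9200` with no authentication; `http_ok` and `wait_until_ready` are hypothetical helper names, not part of the repository.

```python
import time
import urllib.request


def http_ok(url):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


def wait_until_ready(url="http://localhost:9200", timeout=120.0,
                     interval=2.0, probe=http_ok):
    """Poll url until probe(url) succeeds or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` after starting Elasticsearch and before running `prep_elastic.py` returns `True` once the server responds, or `False` if it does not come up within the timeout.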

psgs_w100.tsv contains empty entries. Remove them and save the result as psgsw_100_fixed.tsv:

python ./src/fixed.py
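
The repository's `fixed.py` is not reproduced here. As a rough illustration of what removing empty entries could look like, the sketch below assumes the DPR passage file has a header row and three tab-separated columns (`id`, `text`, `title`) and drops rows whose text column is blank. The function name `drop_empty_rows` and the filtering rule are assumptions, so the actual script may behave differently.

```python
import csv


def drop_empty_rows(src_path, dst_path):
    """Copy a TSV file, skipping data rows whose second column is blank.

    Returns (kept, dropped) counts for the data rows. The header row is
    copied unchanged and is not counted.
    """
    kept = dropped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        # Fields are re-quoted only when needed, so quoting may differ
        # from the input even though the values are the same.
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            # A row counts as empty if the text column is missing or whitespace.
            if len(row) < 2 or not row[1].strip():
                dropped += 1
                continue
            writer.writerow(row)
            kept += 1
    return kept, dropped
```

Printing the returned counts gives a quick check of how many passages were removed.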

Build SGPT index (in file ./SGPT/encode_result/)

python ./Sgpt_file/sgpt_file.py

Download Dataset

For 2WikiMultihopQA:

Download the 2WikiMultihop dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1. Unzip it and move the folder to data/2wikimultihopqa.

For StrategyQA:

wget -O data/strategyqa_dataset.zip https://storage.googleapis.com/ai2i/strategyqa/data/strategyqa_dataset.zip
mkdir -p data/strategyqa
unzip data/strategyqa_dataset.zip -d data/strategyqa
rm data/strategyqa_dataset.zip

For HotpotQA:

mkdir -p data/hotpotqa
wget -O data/hotpotqa/hotpotqa-dev.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json
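
To confirm that the download is intact, the file can be loaded and summarized. This is a minimal sketch, assuming the file is a JSON list of examples that each carry a `type` field, as in the public HotpotQA distractor release. `summarize_hotpotqa` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_hotpotqa(path):
    """Return (number of examples, Counter of question types) for a HotpotQA file."""
    with open(path, encoding="utf-8") as f:
        examples = json.load(f)
    types = Counter(example.get("type", "unknown") for example in examples)
    return len(examples), types


if __name__ == "__main__":
    import os

    path = "data/hotpotqa/hotpotqa-dev.json"
    if os.path.exists(path):
        count, types = summarize_hotpotqa(path)
        print(count, dict(types))
    else:
        print("File not found:", path)
```

A truncated or corrupted download will typically raise `json.JSONDecodeError` at the `json.load` call.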

For IIRC:

wget -O data/iirc.tgz https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz
tar -xzvf data/iirc.tgz
mv iirc_train_dev/ data/iirc
rm data/iirc.tgz

Run

bash train.sh

Evaluate

python ./src/evaluate1.py --dir path_to_folder  # e.g. result/[result path and name]

ghh1125/DioR
