Code for COLING 2025 paper "RUIE: Retrieval-based Unified Information Extraction using Large Language Model"

RUIE

[Figure: overview of the RUIE method (images "figures-method" and "camera-ready")]

Environment

conda create -n ruie python=3.10
conda activate ruie
python -m pip install -r requirements.txt
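
The training steps in this README pin different library versions (transformers 4.42.4 for reward model training, and vllm>=0.5.4 with a newer transformers for extraction), so it can help to check what is currently installed before running a step. The snippet below is a minimal sketch, not part of the repository: the helper names `parse_version`, `installed_version` and `meets_minimum` are hypothetical, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.42.4' into (4, 42, 4); stops at the first non-numeric piece such as 'post1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Report the two packages whose versions this README calls out.
for name in ("transformers", "vllm"):
    print(name, installed_version(name) or "not installed")
```

For example, `meets_minimum(installed_version("vllm"), "0.5.4")` could be checked before the extraction step, once vllm is installed.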

Datasets

We mainly use IE INSTRUCTION, RAMS, WikiEvents and CrudeOilNews to construct our training and test data. The processed datasets will be released soon.

Training & Evaluation

  1. Use BM25 to initialize candidates for the training samples:
bash scripts/bm25_search.sh
  2. Score the candidates with an LLM:
bash scripts/gen_llm_score.sh
  3. Train the keyword-augmented reward model (due to #6571, #2736 and #94907, this step may require manually downgrading transformers to version 4.42.4):
OUTPUT_DIR=outputs/keyword_reward_model bash scripts/train_reward.sh
  4. Generate reward scores:
OUTPUT_DIR=data bash scripts/gen_reward_scores.sh outputs/keyword_reward_model
  5. Train the UIE retriever:
OUTPUT_DIR=outputs/uie_retriever bash scripts/train_kd_biencoder.sh
  6. Run information extraction with the UIE retriever (only vllm>=0.5.4 supports Llama 3.1, and vllm>=0.5.4 needs a newer transformers release, so upgrade transformers to the latest version before this step):
OUTPUT_DIR=outputs/uie_retriever bash scripts/scripts/eval_retriever.sh outputs/uie_retriever
  7. Evaluate the F1 score per task (taking NER as an example):
python src/evaluation/calculate_f1.py --prediction-dir outputs/Llama-3.1-8B-Instruct/k8/NER --task NER
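
To illustrate the first step, here is a minimal sketch of BM25 candidate initialization in pure Python. It is not the code behind `scripts/bm25_search.sh`: the `BM25` class, the whitespace `tokenize` helper, the `k1` and `b` defaults, and the toy candidate pool are all assumptions made for illustration. It uses the standard Okapi BM25 formula, and `exclude` keeps a training sample from retrieving itself.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; the repository may tokenize differently.
    return text.lower().split()


class BM25:
    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(doc) for doc in corpus]
        self.doc_lens = [len(d) for d in self.docs]
        self.avg_len = sum(self.doc_lens) / len(self.docs)
        self.term_freqs = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        """Okapi BM25 score of document `index` for `query`."""
        tf = self.term_freqs[index]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[index] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k, exclude=None):
        """Indices of the k best-scoring documents, optionally skipping one index."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs)) if i != exclude]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


# Toy candidate pool (made-up sentences, not from the datasets above).
pool = [
    "Barack Obama was born in Hawaii",
    "Apple acquired the startup for two billion dollars",
    "Obama visited Paris in 2015",
    "The earthquake struck northern Japan",
]
bm25 = BM25(pool)
candidates = bm25.top_k("Obama was born in Honolulu", k=2)
print(candidates)  # -> [0, 2]
```

In a pipeline like the one above, candidates retrieved this way would then be scored by the LLM in the second step.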

Citation

If you find that the code is useful in your research, please consider citing our paper.

@inproceedings{liao-etal-2025-ruie,
    title = "{RUIE}: Retrieval-based Unified Information Extraction using Large Language Model",
    author = "Liao, Xincheng and Duan, Junwen and Huang, Yixi and Wang, Jianxin",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.645/",
    pages = "9640--9655",
}

Contact

If you have any questions, please create an issue or contact Xincheng Liao at ([email protected]).
