
# PopQA benchmark

The PopQA dataset is a large-scale open-domain question answering (QA) dataset consisting of 14k entity-centric QA pairs. Each question is created by converting a knowledge tuple retrieved from Wikidata using a template. Each question comes with the original `subject_entity`, `object_entity`, and `relationship_type` annotations, as well as Wikipedia monthly page views.
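For concreteness, the snippet below shows one way to inspect these annotations. It is a minimal sketch that assumes the copy of PopQA published on the Hugging Face Hub as `akariasai/PopQA` and its column names (`question`, `subj`, `prop`, `obj`, `s_pop`); other releases may name the fields differently.

```python
# Minimal sketch: inspecting PopQA annotations via the Hugging Face
# `datasets` library. Assumes the hub dataset "akariasai/PopQA" and its
# column names; adjust if your copy of the dataset differs.
from datasets import load_dataset

popqa = load_dataset("akariasai/PopQA", split="test")
example = popqa[0]

print(example["question"])                                # templated question text
print(example["subj"], example["prop"], example["obj"])   # Wikidata knowledge tuple
print(example["s_pop"])                                   # subject's Wikipedia monthly page views
```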

## Performance

### 1. Leaderboard from SOTA

| Paper | Year | Model | Model Details | NDCG@10 | Recall@5 | acc |
| --- | --- | --- | --- | --- | --- | --- |
| INSTRUCTRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | 2024 | InstructRAG | R: Contriever, G: Llama3-Ins-8B (FT) | - | 68.7 | 66.2 |
| | | InstructRAG | R: Contriever, G: Llama3-Ins-8B (ICL) | - | 68.7 | 64.2 |
| | | NaiveRAG | R: Contriever, G: ChatGPT | - | 68.7 | 50.8 |
| | | NaiveRAG | R: Contriever, G: Llama3-Ins-8B | - | 68.7 | 62.3 |
| ACTIVERAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents (evaluates on a 500-question sample) | 2024 | ActiveRAG | R: DPR, G: GPT-4o-mini | - | - | 70.8 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-70B | - | - | 69.8 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-8B | - | - | 65.8 |
| | | Baseline1 | R: ❌, G: Llama3-Ins-8B | - | - | 24.2 |
| | | Baseline2 | R: ❌, G: Llama3-Ins-70B | - | - | 34.2 |
| SELF-RAG: Learning To Retrieve, Generate, and Critique Through Self-Reflection | 2023 | Self-RAG | R: Contriever, G: Llama2-13B | - | - | 55.8 |
| | | Self-RAG | R: Contriever, G: Llama2-7B | - | - | 54.9 |
| | | Baseline1 | R: ❌, G: Llama2-7B | - | - | 14.7 |
| | | Baseline2 | R: ❌, G: Llama2-13B | - | - | 14.7 |
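The papers above do not all share one scoring script, but `acc` on PopQA is commonly computed as substring match: a prediction counts as correct if any gold answer string appears in the generation, while Recall@5 checks whether a gold answer appears in the top-5 retrieved passages. The sketch below illustrates both under that assumption; the function names and normalization are illustrative, not taken from any of the cited codebases.

```python
# Hedged sketch of the two metrics in the leaderboard above.
# Assumption: "acc" is a case-insensitive substring match against the gold
# answer aliases, and Recall@5 checks the top-5 retrieved passages;
# individual papers may normalize text differently.
def accuracy(predictions, gold_answer_lists):
    hits = sum(
        any(g.lower() in pred.lower() for g in golds)
        for pred, golds in zip(predictions, gold_answer_lists)
    )
    return hits / len(predictions)

def recall_at_k(retrieved_passages, golds, k=5):
    topk_text = " ".join(retrieved_passages[:k]).lower()
    return any(g.lower() in topk_text for g in golds)

# Example: one question whose gold aliases are ["Paris", "paris, France"].
print(accuracy(["The capital is Paris."], [["Paris", "paris, France"]]))  # 1.0
```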

### 2. LLM-based Methods (Reproducible)