PopQA is a large-scale open-domain question answering (QA) dataset consisting of 14k entity-centric QA pairs. Each question is created by converting a knowledge tuple retrieved from Wikidata into natural language using a template. Each question comes with annotations for the original subject_entity, object_entity, and relationship_type, as well as Wikipedia monthly page views.
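The template-based conversion can be sketched as follows. The template strings and relation names below are illustrative assumptions, not the dataset's exact templates:

```python
# Sketch of PopQA-style question generation from a Wikidata tuple.
# NOTE: these templates and relation names are hypothetical examples;
# the dataset defines its own fixed template per relationship_type.
TEMPLATES = {
    "capital": "What is the capital of {subj}?",
    "occupation": "What is {subj}'s occupation?",
}

def make_question(subject_entity: str, relationship_type: str) -> str:
    """Fill the relation's template with the subject entity."""
    return TEMPLATES[relationship_type].format(subj=subject_entity)

# The object_entity from the tuple becomes the gold answer.
question = make_question("France", "capital")
```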
| Paper | Year | Model | Model Details | NDCG@10 | Recall@5 | acc |
|---|---|---|---|---|---|---|
| InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | 2024 | InstructRAG | R: Contriever, G: Llama3-Ins-8B (FT) | - | 68.7 | 66.2 |
| | | InstructRAG | R: Contriever, G: Llama3-Ins-8B (ICL) | - | 68.7 | 64.2 |
| | | NaiveRAG | R: Contriever, G: ChatGPT | - | 68.7 | 50.8 |
| | | NaiveRAG | R: Contriever, G: Llama3-Ins-8B | - | 68.7 | 62.3 |
| ACTIVERAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents (evaluated on a 500-question sample) | 2024 | ActiveRAG | R: DPR, G: ChatGPT-4o-mini | - | - | 70.8 |
| | | ActiveRAG | R: DPR, G: Llama3-Ins-70B | - | - | 69.8 |
| | | ActiveRAG | R: DPR, G: Llama3-Ins-8B | - | - | 65.8 |
| | | Baseline1 | R: ❌, G: Llama3-Ins-8B | - | - | 24.2 |
| | | Baseline2 | R: ❌, G: Llama3-Ins-70B | - | - | 34.2 |
| SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | 2023 | Self-RAG | R: Contriever, G: Llama2-13B | - | - | 55.8 |
| | | Self-RAG | R: Contriever, G: Llama2-7B | - | - | 54.9 |
| | | Baseline1 | R: ❌, G: Llama2-7B | - | - | 14.7 |
| | | Baseline2 | R: ❌, G: Llama2-13B | - | - | 14.7 |