# TriviaQA benchmark

TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions.
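
The dataset can be loaded from the Hugging Face Hub. Below is a minimal sketch, assuming the `datasets` library and the public `trivia_qa` dataset with its `rc` (reading comprehension) configuration:

```python
# Minimal sketch: load TriviaQA from the Hugging Face Hub.
# Assumes the `datasets` library and the public `trivia_qa` dataset
# with its "rc" (reading comprehension) configuration.
from datasets import load_dataset

dataset = load_dataset("trivia_qa", "rc", split="validation")

sample = dataset[0]
print(sample["question"])           # trivia question text
print(sample["answer"]["value"])    # canonical answer string
print(sample["answer"]["aliases"])  # accepted answer aliases
```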

## Performance

### 1. Leaderboard from SOTA

| Paper | Year | Model | Model Details | NDCG@10 | Recall@5 | Acc |
| --- | --- | --- | --- | --- | --- | --- |
| InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | 2024 | InstructRAG | R: Contriever, G: Llama3-Ins-8B (FT) | - | 73.5 | 78.5 |
| | | InstructRAG | R: Contriever, G: Llama3-Ins-8B (ICL) | - | 73.5 | 76.8 |
| | | NaiveRAG | R: Contriever, G: ChatGPT | - | 73.5 | 65.7 |
| | | NaiveRAG | R: Contriever, G: Llama3-Ins-8B | - | 73.5 | 71.4 |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | 2023 | Self-RAG | R: Contriever, G: Llama2-13B | - | - | 69.3 |
| | | Self-RAG | R: Contriever, G: Llama2-7B | - | - | 66.4 |
| | | Baseline1 | R: ❌, G: Llama2-7B | - | - | 30.5 |
| | | Baseline2 | R: ❌, G: Llama2-13B | - | - | 38.5 |
| ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents (evaluated on a 500-question sample) | 2024 | ActiveRAG | R: DPR, G: GPT-4o-mini | - | - | 83.4 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-70B | - | - | 85.4 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-8B | - | - | 79.8 |
| | | Baseline1 | R: ❌, G: Llama-3-Ins-8B | - | - | 67.2 |
| | | Baseline2 | R: ❌, G: Llama-3-Ins-70B | - | - | 80.4 |
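
As a reference for the `Acc` column, below is a minimal sketch of a common TriviaQA scoring scheme: exact match against the gold answer aliases after standard normalization. This follows the usual SQuAD-style convention; individual papers may instead use substring matching over generations, and this is not any paper's official evaluation script.

```python
# Minimal sketch of exact-match accuracy with answer normalization,
# following the common SQuAD/TriviaQA convention (lowercase, strip
# punctuation and English articles, collapse whitespace).
# Not an official evaluation script.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())                # collapse whitespace

def exact_match(prediction: str, gold_aliases: list[str]) -> bool:
    # A prediction counts as correct if it matches any accepted alias.
    pred = normalize(prediction)
    return any(pred == normalize(alias) for alias in gold_aliases)

def contains_match(generation: str, gold_aliases: list[str]) -> bool:
    # Some RAG papers instead score a generation as correct if any
    # normalized alias appears as a substring of the normalized output.
    gen = normalize(generation)
    return any(normalize(alias) in gen for alias in gold_aliases)

def accuracy(predictions: list[str], gold: list[list[str]]) -> float:
    correct = sum(exact_match(p, g) for p, g in zip(predictions, gold))
    return correct / len(predictions)
```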

### 2. LLM-based Methods (Reproducible)