# TriviaQA benchmark

TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions.
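
The dataset can be loaded from the Hugging Face Hub. Below is a minimal sketch, assuming the `datasets` library and the public `trivia_qa` dataset with its `rc` (reading comprehension) configuration:

```python
# Minimal sketch: load TriviaQA from the Hugging Face Hub.
# Assumes the `datasets` library and the public `trivia_qa` dataset
# with its "rc" (reading comprehension) configuration.
from datasets import load_dataset

dataset = load_dataset("trivia_qa", "rc", split="validation")

sample = dataset[0]
print(sample["question"])           # trivia question text
print(sample["answer"]["value"])    # canonical answer string
print(sample["answer"]["aliases"])  # accepted answer aliases
```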

## Performance

### 1. Leaderboard from SOTA

| Paper | Year | Model | Model Details | NDCG@10 | Recall@5 | Acc |
| --- | --- | --- | --- | --- | --- | --- |
| InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales | 2024 | InstructRAG | R: Contriever, G: Llama3-Ins-8B (FT) | - | 73.5 | 78.5 |
| | | InstructRAG | R: Contriever, G: Llama3-Ins-8B (ICL) | - | 73.5 | 76.8 |
| | | NaiveRAG | R: Contriever, G: ChatGPT | - | 73.5 | 65.7 |
| | | NaiveRAG | R: Contriever, G: Llama3-Ins-8B | - | 73.5 | 71.4 |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | 2023 | Self-RAG | R: Contriever, G: Llama2-13B | - | - | 69.3 |
| | | Self-RAG | R: Contriever, G: Llama2-7B | - | - | 66.4 |
| | | Baseline1 | R: ❌, G: Llama2-7B | - | - | 30.5 |
| | | Baseline2 | R: ❌, G: Llama2-13B | - | - | 38.5 |
| ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents (evaluated on a 500-question sample) | 2024 | ActiveRAG | R: DPR, G: GPT-4o-mini | - | - | 83.4 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-70B | - | - | 85.4 |
| | | ActiveRAG | R: DPR, G: Llama-3-Ins-8B | - | - | 79.8 |
| | | Baseline1 | R: ❌, G: Llama-3-Ins-8B | - | - | 67.2 |
| | | Baseline2 | R: ❌, G: Llama-3-Ins-70B | - | - | 80.4 |
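
As a reference for the `Acc` column, below is a minimal sketch of a common TriviaQA scoring scheme: exact match against the gold answer aliases after standard normalization. This follows the usual SQuAD-style convention; individual papers may instead use substring matching over generations, and this is not any paper's official evaluation script.

```python
# Minimal sketch of exact-match accuracy with answer normalization,
# following the common SQuAD/TriviaQA convention (lowercase, strip
# punctuation and English articles, collapse whitespace).
# Not an official evaluation script.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())                # collapse whitespace

def exact_match(prediction: str, gold_aliases: list[str]) -> bool:
    # A prediction counts as correct if it matches any accepted alias.
    pred = normalize(prediction)
    return any(pred == normalize(alias) for alias in gold_aliases)

def contains_match(generation: str, gold_aliases: list[str]) -> bool:
    # Some RAG papers instead score a generation as correct if any
    # normalized alias appears as a substring of the normalized output.
    gen = normalize(generation)
    return any(normalize(alias) in gen for alias in gold_aliases)

def accuracy(predictions: list[str], gold: list[list[str]]) -> float:
    correct = sum(exact_match(p, g) for p, g in zip(predictions, gold))
    return correct / len(predictions)
```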

### 2. LLM-based Methods (Reproducible)