From 3a7386e9698eaafd4f2efb910d4db7fba094b981 Mon Sep 17 00:00:00 2001
From: ZubinGou
Date: Thu, 22 Feb 2024 17:02:03 +0800
Subject: [PATCH] =?UTF-8?q?=F0=9F=8E=89?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index dd3fae2..d559a12 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -57,7 +57,7 @@
-            src="..."> CriticBench: <br> Benchmarking LLMs for <br> Critique-Correct Reasoning
+            src="static/images/criticbench_logo.png" alt="CriticBench Logo"> CriticBench: <br> Benchmarking LLMs for <br> Critique-Correct Reasoning
@@ -166,7 +166,7 @@
           The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial for their application in evaluation, feedback provision, and self-improvement. This paper introduces CriticBench, a comprehensive benchmark designed to assess LLMs' abilities to critique and rectify their reasoning across a variety of tasks.
-          CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic.
+          CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic.
           It compiles 15 datasets and incorporates responses from three LLM families. Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning, and analyze the key factors affecting LLM critical reasoning.