
Commit 3a7386e

πŸŽ‰
ZubinGou committed Feb 22, 2024
1 parent c6a901b commit 3a7386e
Showing 1 changed file with 2 additions and 2 deletions.
docs/index.html (2 additions, 2 deletions)
@@ -57,7 +57,7 @@
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title"> <img class="logo"
src="static/images/criticbench_logo.png" alt="CriticBench Logo"> CriticBench: <br> Benchmarking LLMs for <br> <u>Criti</u>que-<u>Correct</u> Reasoning
src="static/images/criticbench_logo.png" alt="CriticBench Logo"> CriticBench: <br> Benchmarking LLMs for <br> <u>Criti</u>que-<u>C</u>orrect Reasoning
</h1>
<div class="is-size-5 publication-authors">
<div class="author-block">
@@ -166,7 +166,7 @@ <h2 class="subtitle has-text-centered">
<div class="content">
<p style="font-size: 1.1em;">
The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial for their application in evaluation, feedback provision, and self-improvement. This paper introduces <b>CriticBench, a comprehensive benchmark designed to assess LLMs' abilities to critique and rectify their reasoning across a variety of tasks</b>.
-                  CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic.
+                  CriticBench encompasses five reasoning domains: <b>mathematical, commonsense, symbolic, coding, and algorithmic</b>.
It compiles 15 datasets and incorporates responses from three LLM families.
Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning, and analyze the key factors affecting LLM critical reasoning.
<br>
