1 parent e4e16b8, commit 2627c5b
Showing 4 changed files with 64 additions and 55 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
title: Scores by release date | ||
parent: Aider LLM Leaderboards | ||
nav_order: 200 | ||
--- | ||
|
||
## LLM code editing skill by model release date | ||
|
||
[![connecting to many LLMs](/assets/models-over-time.svg)](https://aider.chat/assets/models-over-time.svg) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
parent: Aider LLM Leaderboards | ||
highlight_image: /assets/leaderboard.jpg | ||
nav_order: 100 | ||
description: Quantitative benchmark of LLM code refactoring skill. | ||
--- | ||
|
||
|
||
## Aider refactoring leaderboard | ||
|
||
[Aider's refactoring benchmark](https://github.com/Aider-AI/refactor-benchmark) asks the LLM to refactor 89 large methods from large Python classes. This is a more challenging benchmark, which tests the model's ability to output long chunks of code without skipping sections or making mistakes. It was developed to provoke and measure [GPT-4 Turbo's "lazy coding" habit](/2023/12/21/unified-diffs.html). | ||
|
||
The refactoring benchmark requires a large context window to | ||
work with large source files. | ||
Therefore, results are available for fewer models. | ||
|
||
<input type="text" id="refacSearchInput" placeholder="Search..." style="width: 100%; max-width: 800px; margin: 10px auto; padding: 8px; display: block; border: 1px solid #ddd; border-radius: 4px;"> | ||
|
||
<table style="width: 100%; max-width: 800px; margin: auto; border-collapse: collapse; box-shadow: 0 2px 4px rgba(0,0,0,0.1); font-size: 14px;"> | ||
<thead style="background-color: #f2f2f2;"> | ||
<tr> | ||
<th style="padding: 8px; text-align: left;">Model</th> | ||
<th style="padding: 8px; text-align: center;">Percent completed correctly</th> | ||
<th style="padding: 8px; text-align: center;">Percent using correct edit format</th> | ||
<th style="padding: 8px; text-align: left;">Command</th> | ||
<th style="padding: 8px; text-align: center;">Edit format</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
{% assign refac_sorted = site.data.refactor_leaderboard | sort: 'pass_rate_1' | reverse %} | ||
{% for row in refac_sorted %} | ||
<tr style="border-bottom: 1px solid #ddd;"> | ||
<td style="padding: 8px;">{{ row.model }}</td> | ||
<td style="padding: 8px; text-align: center;">{{ row.pass_rate_1 }}%</td> | ||
<td style="padding: 8px; text-align: center;">{{ row.percent_cases_well_formed }}%</td> | ||
<td style="padding: 8px;"><code>{{ row.command }}</code></td> | ||
<td style="padding: 8px; text-align: center;">{{ row.edit_format }}</td> | ||
</tr> | ||
{% endfor %} | ||
</tbody> | ||
</table> | ||
|
||
<canvas id="refacChart" width="800" height="450" style="margin-top: 20px"></canvas> | ||
<script src="https://unpkg.com/patternomaly/dist/patternomaly.js"></script> | ||
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script> | ||
<script> | ||
{% include refactor-leaderboard.js %} | ||
</script> | ||
|
||
|
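
Note on the row ordering: the template line `site.data.refactor_leaderboard | sort: 'pass_rate_1' | reverse` sorts ascending and then reverses, so the table lists the highest pass rate first. The sketch below mirrors that ordering in plain JavaScript. It assumes `pass_rate_1` is stored as a number in the data file; the rows, model names, and the `sortByPassRate` helper are hypothetical and are not part of this commit.

```javascript
// Hypothetical rows standing in for site.data.refactor_leaderboard.
// The model names and scores are invented for illustration only.
const rows = [
  { model: "model-a", pass_rate_1: 55.1 },
  { model: "model-b", pass_rate_1: 72.3 },
  { model: "model-c", pass_rate_1: 64.0 },
];

// Mirror `sort: 'pass_rate_1' | reverse`: sort ascending, then reverse.
// The spread copy keeps the input array untouched.
function sortByPassRate(data) {
  return [...data].sort((a, b) => a.pass_rate_1 - b.pass_rate_1).reverse();
}

const refacSorted = sortByPassRate(rows);
console.log(refacSorted.map((row) => row.model));
```

If the data file stored the scores as strings, a lexical sort could order them differently (for example "9.5" after "72.3"), so numeric values are the safer assumption here.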