Commit

Changing featured. New pfp. Changed Author ordering
waynchi committed Nov 12, 2024
1 parent 0ddb78a commit f11c8a2
Showing 5 changed files with 62 additions and 68 deletions.
2 changes: 1 addition & 1 deletion _posts/2024-09-19-new-site.md
@@ -4,7 +4,7 @@ title: Chatbot Arena New Blog
description: A new chapter for Chatbot Arena!
giscus_comments: true
date: 2024-09-20
featured: true
featured: false
thumbnail: assets/img/blog/new_site/logo.png

authors:
128 changes: 61 additions & 67 deletions _posts/2024-11-12-copilot-arena.md
@@ -1,38 +1,32 @@
---
layout: distill
title: Copilot Arena
description: Initial Leaderboard, Methods, and Insights
description: Copilot Arena's Initial Leaderboard, Insights, and a New Prompting Method for Code Completions
giscus_comments: true
date: 2024-11-12
featured: true
thumbnail: assets/img/blog/copilot_arena/leaderboard.png
thumbnail: assets/img/blog/copilot_arena/leaderboard_pfp.png
authors:
- name: Wayne Chi
url: "https://waynchi.github.io"
affiliations:
name: CMU
name: CMU, UC Berkeley
- name: Valerie Chen
url: "https://valeriechen.github.io/"
- name: Wei-Lin Chiang
url: "https://infwinston.github.io/"
affiliations:
name: UC Berkeley
- name: Anastasios N. Angelopoulos
url: "http://angelopoulos.ai"
- name: Wei-Lin Chiang
url: "https://infwinston.github.io/"
- name: Naman Jain
url: "https://naman-ntc.github.io/"
- name: Tianjun Zhang
url: "https://tianjunz.github.io/"
- name: Ameet Talwalakar
url: "https://www.cs.cmu.edu/~atalwalk/"
affiliations:
name: CMU
- name: Chris Donahue
url: "https://chrisdonahue.com/"
- name: Ion Stoica
url: "https://people.eecs.berkeley.edu/~istoica/"
affiliations:
name: UC Berkeley
- name: Chris Donahue
url: "https://chrisdonahue.com/"
- name: Ameet Talwalakar
url: "https://www.cs.cmu.edu/~atalwalk/"
---

## Introduction
@@ -61,65 +55,65 @@ Table 1. Elo ratings and median latency of nine popular models based on over 10K
<thead>
<tr>
<th>Model</th>
<th>Arena Score</th>
<th>Confidence Intervals</th>
<th>Median Latency (s)</th>
<th style="text-align: center;">Arena Score</th>
<th style="text-align: center;">Confidence Intervals</th>
<th style="text-align: center;">Median Latency (s)</th>
</tr>
</thead>
<tbody>
<tr style="background-color: #EFBF04; color: black">
<td>Deepseek V2.5</td>
<td>1074</td>
<td>+16/-11</td>
<td>2.13</td>
<td style="text-align: center;">1074</td>
<td style="text-align: center;">+16/-11</td>
<td style="text-align: center;">2.13</td>
</tr>
<tr style="background-color: #EFBF04; color: black">
<td>Claude Sonnet 3.5 (06/20)</td>
<td>1053</td>
<td>+18/-17</td>
<td>2.29</td>
<td style="text-align: center;">1053</td>
<td style="text-align: center;">+18/-17</td>
<td style="text-align: center;">2.29</td>
</tr>
<tr style="background-color: #C0C0C0; color: black">
<td>Codestral (05/24)</td>
<td>1046</td>
<td>+12/-10</td>
<td>1.01</td>
<td style="text-align: center;">1046</td>
<td style="text-align: center;">+12/-10</td>
<td style="text-align: center;">1.01</td>
</tr>
<tr style="background-color: #C0C0C0; color: black">
<td>Meta-Llama-3.1-405B-Instruct</td>
<td>1024</td>
<td>+17/-15</td>
<td>1.12</td>
<td style="text-align: center;">1024</td>
<td style="text-align: center;">+17/-15</td>
<td style="text-align: center;">1.12</td>
</tr>
<tr style="background-color: #CD7F32; color: black">
<td>GPT-4o (08/06)</td>
<td>1016</td>
<td>+17/-20</td>
<td>0.75</td>
<td style="text-align: center;">1016</td>
<td style="text-align: center;">+17/-20</td>
<td style="text-align: center;">0.75</td>
</tr>
<tr style="background-color: #CD7F32; color: black">
<td>Gemini-1.5-Pro-002</td>
<td>1014</td>
<td>+19/-18</td>
<td>1.44</td>
<td style="text-align: center;">1014</td>
<td style="text-align: center;">+19/-18</td>
<td style="text-align: center;">1.44</td>
</tr>
<tr style="background-color: #CD7F32; color: black">
<td>Meta-Llama-3.1-70B-Instruct</td>
<td>1013</td>
<td>+14/-15</td>
<td>0.88</td>
<td style="text-align: center;">1013</td>
<td style="text-align: center;">+14/-15</td>
<td style="text-align: center;">0.88</td>
</tr>
<tr style="background-color: #CD7F32; color: black">
<td>Gemini-1.5-Flash-002</td>
<td>1005</td>
<td>+16/-22</td>
<td>0.55</td>
<td style="text-align: center;">1005</td>
<td style="text-align: center;">+16/-22</td>
<td style="text-align: center;">0.55</td>
</tr>
<tr style="background-color: #E8E8E8; color: black">
<td>GPT-4o-mini (07/18)</td>
<td>962</td>
<td>+17/-15</td>
<td>0.74</td>
<td style="text-align: center;">962</td>
<td style="text-align: center;">+17/-15</td>
<td style="text-align: center;">0.74</td>
</tr>
</tbody>
</table>
@@ -186,47 +180,47 @@ Table 2: Percentage of well-formatted code completions with different prompt tem
<thead>
<tr>
<th>Model</th>
<th>PSM</th>
<th>SPM</th>
<th>Mask</th>
<th style="text-align: center;">PSM</th>
<th style="text-align: center;">SPM</th>
<th style="text-align: center;">Mask</th>
</tr>
</thead>
<tbody>
<tr>
<td>Claude-3.5-sonnet</td>
<td>0.67 (+0.16)</td>
<td>0.66 (+0.15)</td>
<td>0.66 (+0.14)</td>
<td style="text-align: center;">0.67 (+0.16)</td>
<td style="text-align: center;">0.66 (+0.15)</td>
<td style="text-align: center;">0.66 (+0.14)</td>
</tr>
<tr>
<td>GPT-4o-2024-08-06</td>
<td>0.71 (+0.02)</td>
<td>0.55 (+0.19)</td>
<td>0.62 (+0.12)</td>
<td style="text-align: center;">0.71 (+0.02)</td>
<td style="text-align: center;">0.55 (+0.19)</td>
<td style="text-align: center;">0.62 (+0.12)</td>
</tr>
<tr>
<td>GPT-4o-mini-2024-07-18</td>
<td>0.18 (+0.39)</td>
<td>0.12 (+0.54)</td>
<td>0.15 (+0.36)</td>
<td style="text-align: center;">0.18 (+0.39)</td>
<td style="text-align: center;">0.12 (+0.54)</td>
<td style="text-align: center;">0.15 (+0.36)</td>
</tr>
<tr>
<td>Gemini-1.5-pro-001</td>
<td>0.38 (+0.28)</td>
<td>0.34 (+0.36)</td>
<td>0.43 (-0.04)</td>
<td style="text-align: center;">0.38 (+0.28)</td>
<td style="text-align: center;">0.34 (+0.36)</td>
<td style="text-align: center;">0.43 (-0.04)</td>
</tr>
<tr>
<td>Gemini-1.5-flash-001</td>
<td>0.34 (+0.24)</td>
<td>0.27 (+0.37)</td>
<td>0.36 (+0.19)</td>
<td style="text-align: center;">0.34 (+0.24)</td>
<td style="text-align: center;">0.27 (+0.37)</td>
<td style="text-align: center;">0.36 (+0.19)</td>
</tr>
<tr>
<td>Llama-3.1-70B-Instruct</td>
<td>0.14 (+0.46)</td>
<td>0.15 (+0.48)</td>
<td>0.12 (+0.27)</td>
<td style="text-align: center;">0.14 (+0.46)</td>
<td style="text-align: center;">0.15 (+0.48)</td>
<td style="text-align: center;">0.12 (+0.27)</td>
</tr>
</tbody>
</table>
@@ -248,7 +242,7 @@ Table 2: Percentage of well-formatted code completions with different prompt tem
```bibtex
@misc{chi2024copilot,
title={Copilot Arena},
author={Wayne Chi and Valerie Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Naman Jain and Tianjun Zhang and Ameet Talwalakar and Chris Donahue and Ion Stoica}
author={Wayne Chi and Valerie Chen and Wei-Lin Chiang and Anastasios N. Angelopoulos and Naman Jain and Tianjun Zhang and Ion Stoica and Chris Donahue and Ameet Talwalakar}
year={2024},
}
```
Binary file modified assets/img/blog/copilot_arena/leaderboard.png
Binary file added assets/img/blog/copilot_arena/leaderboard_pfp.png
Binary file modified assets/img/blog/copilot_arena/prompt.png
