diff --git a/.gitignore b/.gitignore
index 3f6b415..bd70549 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,4 +4,6 @@
 tmp*
 __pycache__*
-*json
\ No newline at end of file
+*json
+
+.DS_Store
\ No newline at end of file
diff --git a/index.html b/index.html
index 09d6402..d6bdd4a 100644
--- a/index.html
+++ b/index.html
@@ -112,7 +112,7 @@

-
@@ -132,12 +132,276 @@

+
+
+

Leaderboard

+ +
+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Full
Rank | Model | Pass@1
1 | RAG (Sparse-Retrieval) + GPT-4o | 27.35
2 | RAG (Current-File) + DeepSeek-V2.5 | 27.04
3 | RAG (Current-File) + Codestral-22B | 20.00
4 | RAG (Current-File) + Claude 3.5 Sonnet | 19.80
5 | RAG (Current-File) + GPT-4o-Mini | 18.67
6 | RAG (Current-File) + OpenCodeInterpreter-33B | 18.27
7 | RAG (Dense-Retrieval) + DeepSeekCoder-33B | 17.14
8 | RAG (Sparse-Retrieval) + DeepSeekCoder-6.7B | 14.08
9 | RAG (Current-File) + OpenCodeInterpreter-6.7B | 13.16
10 | RAG (Dense-Retrieval) + CodeLlama-13B | 12.76
11 | RAG (Sparse-Retrieval) + CodeLlama-7B | 10.71
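The leaderboard reports Pass@1. When each instance gets exactly one greedy-decoded output (the setting described in the generation details note), Pass@1 reduces to the percentage of instances whose single output passes evaluation. The sketch below is illustrative only: the instance IDs and outcomes are hypothetical, and it uses the standard unbiased pass@k estimator of Chen et al. (2021), which collapses to a plain pass/fail value when n = k = 1.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per instance, c: samples that pass, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def pass_at_1_greedy(results: dict[str, bool]) -> float:
    """Pass@1 in percent when each instance has exactly one (greedy) output.

    `results` is a hypothetical mapping from instance ID to whether that
    single output passed evaluation. With n = k = 1, pass_at_k is 1.0 for a
    passing instance and 0.0 otherwise, so the score is a plain pass rate.
    """
    if not results:
        raise ValueError("results must contain at least one instance")
    per_instance = [pass_at_k(1, int(ok), 1) for ok in results.values()]
    return 100.0 * sum(per_instance) / len(per_instance)


# Hypothetical outcomes for four instances.
outcomes = {"inst-1": True, "inst-2": False, "inst-3": False, "inst-4": True}
print(f"Pass@1 = {pass_at_1_greedy(outcomes):.2f}")
```

With several samples per instance, the same `pass_at_k` helper would be averaged over instances instead of the single-sample shortcut.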
+
+
+ + + +
+
+
+ + + + + +
@@ -205,7 +469,6 @@

Leaderboard

-
@@ -273,7 +536,6 @@

Leaderboard

-
@@ -343,7 +605,7 @@

Leaderboard

-
+ -->
@@ -355,13 +617,7 @@

Leaderboard

Notes on Experiments

- 1. Retrieval Settings: We use three retrieval settings: Sparse Retrieval, Dense Retrieval, and Current File. Both Sparse Retrieval and Dense Retrieval operate at the function level. Sparse Retrieval employs BM25, while Dense Retrieval encodes the problem description with text-embedding-3-small and uses cosine similarity as the retrieval score. For the Current File setting, we use the contents of the file before and after the target function as context.
-

-

- 2. Generation Details: Each LLM generates one output per instance in REPOCOD using greedy decoding. Outputs must have correct indentation to avoid syntax errors.
-

-

- Please check out our paper for more details.
+ RAG ({settings}): These results are generated under three retrieval settings: Sparse Retrieval, Dense Retrieval, and Current File. Please check out our paper for more details.
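The retrieval note removed in this commit describes Sparse Retrieval as BM25 over function-level units and Dense Retrieval as cosine similarity between text-embedding-3-small embeddings. The following is a minimal, self-contained sketch of that ranking step, not the paper's implementation: the tokenizer, the BM25 parameters k1 = 1.5 and b = 0.75, and the sample functions are assumptions, and the embedding call is omitted, so `cosine` takes precomputed vectors.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase identifier-like tokens.
    return re.findall(r"[a-z0-9_]+", text.lower())


class BM25:
    """Okapi BM25 over a corpus where each document is one function."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: how many functions contain each term.
        df = Counter(t for d in self.docs for t in set(d))
        # IDF with +1 inside the log so common terms never score negatively.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: longer functions need more matches.
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two precomputed embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring functions (ties keep corpus order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical function-level corpus and problem description.
functions = [
    "def read_csv(path): parse a csv file into rows",
    "def plot_histogram(values): draw a histogram of values",
    "def write_csv(rows, path): write rows to a csv file",
]
bm25 = BM25(functions)
query = "load rows from a csv file"
sparse_scores = [bm25.score(query, i) for i in range(len(functions))]
print(top_k(sparse_scores, 2))
```

For the dense setting, the same `top_k` call would be applied to `cosine(query_vec, func_vec)` scores, where the vectors come from whichever embedding model is used.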

@@ -369,6 +625,31 @@

Notes on Experiments

+
+
+
+
+ + RAG prompt example + + ▼ + + +
+
+ REPOCOD Statistics +
+ Prompt example used in the experiments of our paper. +
+
+
+
+
+
+
+ + +
@@ -393,7 +674,7 @@

Abstract

-

Data Collection

+

Data Collection Pipeline

Overview of REPOCOD Pipeline
@@ -408,7 +689,7 @@

Data Collection

-
+ +
+
+
+

Dataset Statistics

+ + +
+ +
+ + +
+
+ REPOCOD Statistics +
+ REPOCOD (Full) consists of 980 instances from 11 repositories across diverse domains, including data science, + scientific computing, web, and software development. This table details statistics for each context + complexity type—repository-level, file-level, and self-contained—including #NL (tokens in target + descriptions), #GT (tokens in canonical solutions), Cyclo. (average cyclomatic complexity), and #Funcs. + (number of target functions). +
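The caption reports average cyclomatic complexity (Cyclo.) per context type. As a rough illustration of what that column measures, the sketch below approximates McCabe complexity for a Python function using the standard `ast` module: it starts at 1 and adds 1 per branch point. The counting rule is a simplification assumed here (dedicated tools may count some constructs differently), and it is not necessarily how the REPOCOD statistics were computed.

```python
import ast


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of the first function in `source`.

    Starts at 1 and adds 1 per branching construct. This counting rule is
    an assumption for illustration, not the tool used for REPOCOD.
    """
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree)
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)))
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.AsyncFor,
                             ast.While, ast.ExceptHandler)):
            # `elif` appears as a nested ast.If, so it is counted here too.
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has two short-circuit branch points.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each `if` filter inside a comprehension is a branch.
            complexity += len(node.ifs)
    return complexity


SAMPLE = """
def clip(values, lo, hi):
    out = []
    for v in values:
        if v < lo:
            out.append(lo)
        elif hi is not None and v > hi:
            out.append(hi)
        else:
            out.append(v)
    return out
"""
print(cyclomatic_complexity(SAMPLE))
```

Here the loop, the `if`, the `elif`, and the `and` each add one path on top of the base of 1.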
+
+
+ + + +
+
+ +
@@ -444,12 +803,13 @@

BibTeX

+ +