SWE-bench Verified High Score Project

Project Overview

This project is dedicated to achieving high scores on SWE-bench Verified, a benchmark that evaluates AI models' ability to resolve real-world software issues. Our goal is to develop and implement strategies that maximize performance on it.

About SWE-bench Verified

SWE-bench Verified, released by OpenAI, is a human-validated subset of the original SWE-bench dataset. Key points:

- It consists of 500 samples verified by human annotators to be non-problematic.
- It addresses issues in the original dataset, such as overly specific unit tests and underspecified problem statements.
- The benchmark aims to provide a more accurate evaluation of AI models' software engineering capabilities.
- On SWE-bench Verified, GPT-4o resolves 33.2% of samples, more than doubling its previous score on the original SWE-bench.
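To make the headline number concrete, the percentage can be converted back into a sample count: 33.2% of 500 samples is 166 resolved samples. The helper below is a minimal sketch of that arithmetic; the function name `resolved_rate` is ours and is not part of any benchmark tooling.

```python
def resolved_rate(resolved: int, total: int = 500) -> float:
    """Return the percentage of samples resolved, rounded to one decimal.

    `total` defaults to 500, the size of SWE-bench Verified stated above.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100.0 * resolved / total, 1)


# 166 resolved samples out of 500 corresponds to the 33.2% figure quoted above.
print(resolved_rate(166))  # 33.2
```

The same function can be used to compare your own runs against that reference point, for example `resolved_rate(200)` for a run that resolves 200 samples.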

Quick Start

Clone this repository:

    git clone [Your Repository URL]
    cd [Your Repository Name]

Install dependencies:
```
pip install -r requirements.txt
```
Set up Docker (required for evaluation):
- Follow the Docker setup guide
- Ensure you have at least 120GB of free disk space
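Before building evaluation images, it can help to confirm the disk space requirement programmatically. The following is a minimal sketch using only the standard library; the function names and the choice to measure decimal gigabytes (10^9 bytes) are our assumptions, and the path you check should be the volume where Docker stores its data on your machine.

```python
import shutil

REQUIRED_FREE_GB = 120  # threshold taken from the setup note above


def free_disk_gb(path: str = ".") -> float:
    """Return free space on the volume containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the volume containing `path` has at least `required_gb` free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_disk_gb(".")
    status = "OK" if has_enough_space(".") else "insufficient"
    print(f"Free space: {free:.1f} GB ({status}, need {REQUIRED_FREE_GB} GB)")
```

Note that this checks the volume of the current directory by default. If Docker's data directory lives on a different volume, pass that path instead.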

Run a sample evaluation:

    python run_evaluation.py --model [Your Model Name] --output_dir ./results
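This README does not document what `run_evaluation.py` expects as input. The upstream SWE-bench harness reads model predictions as JSON Lines, one object per sample with the keys `instance_id`, `model_name_or_path`, and `model_patch`. The sketch below writes a file in that shape, on the assumption that this project follows the same convention; the helper name `write_predictions`, the file name `predictions.jsonl`, and the sample instance ID are hypothetical.

```python
import json
from pathlib import Path


def write_predictions(predictions: dict, model_name: str, path: str) -> Path:
    """Write predictions as JSON Lines in the upstream SWE-bench format.

    `predictions` maps an instance ID to the patch (unified diff text)
    that the model produced for that sample.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for instance_id, patch in predictions.items():
            record = {
                "instance_id": instance_id,
                "model_name_or_path": model_name,
                "model_patch": patch,
            }
            f.write(json.dumps(record) + "\n")
    return out


if __name__ == "__main__":
    # Hypothetical instance ID and an empty patch, for illustration only.
    demo = {"example__repo-1": ""}
    print(write_predictions(demo, "my-model", "./results/predictions.jsonl"))
```

Check `evaluation_guide.md` for the format this repository actually uses before relying on this layout.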

Project Structure

- README.md: This file
- setup.md: Detailed setup instructions
- dataset_info.md: Information about the SWE-bench Verified dataset
- evaluation_guide.md: Guide to running evaluations
- model_development.md: Strategies for developing high-performing models
- results/: Directory for storing evaluation results
- src/: Source code for our models and evaluation scripts

Next Steps

- Read through the detailed documentation in this repository
- Set up your development environment
- Familiarize yourself with the SWE-bench Verified dataset
- Start developing and testing model improvements
- Run evaluations and analyze results

Contributing

We welcome contributions! Please see our CONTRIBUTING.md file for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License. See the LICENSE file for details.

For more detailed information on specific aspects of the project, please refer to the individual markdown files in this repository.
