GitHub - ollmer/SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.

Website & Demo | Discord | Paper [coming April 2024]

👋 Overview

SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

On SWE-bench, SWE-agent resolves 12.29% of issues, achieving state-of-the-art performance on the full test set.

SWE-agent is built and maintained by researchers from Princeton University.

✨ Agent-Computer Interface (ACI)

We accomplish these results by designing simple LM-centric commands and feedback formats that make it easier for the LM to browse the repository and to view, edit, and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

Just as typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

SWE-agent contains features that we discovered to be immensely helpful during the agent-computer interface design process:

We add a linter that runs when an edit command is issued, and do not let the edit command go through if the code isn't syntactically correct.
We supply the agent with a special-built file viewer, instead of having it just cat files. We found that this file viewer works best when displaying just 100 lines in each turn. The file editor that we built has commands for scrolling up and down and for performing a search within the file.
We supply the agent with a special-built full-directory string searching command. We found that it was important for this tool to succinctly list the matches: we simply list each file that had at least one match. Showing the model more context about each match proved to be too confusing for the model.
When commands have an empty output we return a message saying "Your command ran successfully and did not produce any output."
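The four ideas above can be illustrated with a minimal Python sketch. This is not SWE-agent's implementation: the function names are hypothetical, and `ast.parse` stands in for whatever linter the real edit command runs, so it only covers Python syntax errors.

```python
import ast
from pathlib import Path

WINDOW = 100  # lines shown per turn, as described above
EMPTY_OUTPUT_MSG = "Your command ran successfully and did not produce any output."


def view_window(lines, first_line, window=WINDOW):
    """Return a numbered slice of `lines` starting at 0-based index `first_line`."""
    start = max(0, min(first_line, max(len(lines) - 1, 0)))
    chunk = lines[start:start + window]
    return "\n".join(f"{start + i + 1}: {text}" for i, text in enumerate(chunk))


def guarded_edit(lines, start, end, replacement):
    """Replace lines[start:end] with `replacement`, but only if the result parses.

    Returns (new_lines, error). On a syntax error the original lines are
    returned unchanged together with a message for the agent.
    """
    candidate = lines[:start] + replacement + lines[end:]
    try:
        ast.parse("\n".join(candidate))  # stand-in for a real linter (assumption)
    except SyntaxError as exc:
        return lines, f"Edit rejected: {exc.msg} (line {exc.lineno})"
    return candidate, None


def files_with_match(root, needle):
    """List only the files under `root` that contain `needle`, with no match context."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if needle in text:
            hits.append(str(path.relative_to(root)))
    return hits


def format_output(raw):
    """Never hand the model an empty observation."""
    return raw if raw.strip() else EMPTY_OUTPUT_MSG
```

The point of the sketch is that each helper returns a small, predictable observation: a rejected edit leaves the file untouched and says why, and a search returns file names only.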

Read our paper for more details [coming soon!].

@misc{yang2024sweagent,
      title={SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models}, 
      author={John Yang and Carlos E. Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir Press},
      year={2024},
}

🚀 Get started

☁️ Run from your browser

Click
Add your API keys to keys.cfg (find the file in the left sidebar and fill out the template)
Make sure to wait until the postCreateCommand in the terminal window at the bottom is finished
Enter your SWE-agent command

🏎️ Express Setup + Run

You can run the software directly using Docker.

Install Docker, then start Docker locally.
Run docker pull sweagent/swe-agent:latest
Add your API tokens to a file keys.cfg as explained below

Then run

# NOTE:
# This assumes that keys.cfg is in your current directory (else fix the path below)
# This command is equivalent to the script shown in the quickstart 
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/keys.cfg:/app/keys.cfg \
  sweagent/swe-agent-run:latest \
  python run.py --image_name=sweagent/swe-agent:latest \
  --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml  --skip_existing=False

Tip

For more information on the different API keys/tokens, see below.
If you're using docker on Windows, use -v //var/run/docker.sock:/var/run/docker.sock (double slash) to escape it (more information).
See the installation issues section for more help if you run into trouble.

🐍 Setup with conda (developer version)

To install the development version:

Install Docker, then start Docker locally.
Clone this repository
Install Miniconda, then create the swe-agent environment with conda env create -f environment.yml
Activate using conda activate swe-agent.
Run ./setup.sh to create the swe-agent docker image.
Create a keys.cfg file at the root of this repository (see below)

Warning

Expect some issues with Windows (we're working on them). In the meantime, simply use Docker (see above). If you want the latest version, you can also build your own swe-agent-run container with the Dockerfile at the root of this repository by running docker build -t sweagent/swe-agent-run:latest .

Tip

If you run into docker issues, see the installation issues section for more help.

🔑 Add your API keys/tokens

For the conda setup, create a keys.cfg file at the root of this repository and populate it with your API keys.

GITHUB_TOKEN: 'GitHub Token Here (optional)'
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model (optional)'

If you're using docker, pass the key with the -e option to the docker container.

🔎 More options for different keys

All keys are optional.

GITHUB_TOKEN: 'GitHub Token for access to private repos'  # <-- delete line if not used
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model'
ANTHROPIC_API_KEY: 'Anthropic API Key Here if using Anthropic Model'
TOGETHER_API_KEY: 'Together API Key Here if using Together Model'
AZURE_OPENAI_API_KEY: 'Azure OpenAI API Key Here if using Azure OpenAI Model'
AZURE_OPENAI_ENDPOINT: 'Azure OpenAI Endpoint Here if using Azure OpenAI Model'
AZURE_OPENAI_DEPLOYMENT: 'Azure OpenAI Deployment Here if using Azure OpenAI Model'
AZURE_OPENAI_API_VERSION: 'Azure OpenAI API Version Here if using Azure OpenAI Model'
OPENAI_API_BASE_URL: 'LM base URL here if using Local or alternative api Endpoint'
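Before starting a run, it can help to check which entries in keys.cfg still hold template text. The sketch below assumes the simple `NAME: 'value'` layout shown above; `read_keys` and `unfilled` are hypothetical helpers, not how SWE-agent itself loads the file, and the "Here" check is only a heuristic based on the template wording.

```python
import re

# Matches lines of the form  NAME: 'value'  with an optional trailing comment.
KEY_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*:\s*'([^']*)'\s*(?:#.*)?$")


def read_keys(text):
    """Parse keys.cfg-style text into a dict, ignoring lines that do not match."""
    keys = {}
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            keys[match.group(1)] = match.group(2)
    return keys


def unfilled(keys):
    """Return names whose value is empty or still looks like template text."""
    return sorted(
        name for name, value in keys.items()
        if not value or " Here" in value  # heuristic: template values contain "Here"
    )


if __name__ == "__main__":
    with open("keys.cfg", encoding="utf-8") as handle:
        print("Still unfilled:", unfilled(read_keys(handle.read())))
```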

See the following links for tutorials on obtaining Anthropic, OpenAI, and GitHub tokens.

More installation tips

If you seem to be having issues with running docker:

Make sure that you allow the use of the Docker socket. In Docker desktop, click Settings > Advanced > Allow the default Docker socket to be used (requires password)
If your docker installation uses a different socket, you might have to symlink them, see this command for example

Any remaining issues? Please open a GitHub issue!

🔥 Solve real-life GitHub issues!

Using this script, you can run SWE-agent on any GitHub issue!

python run.py --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml

You can also apply it to a local repository:

python run.py --model_name gpt4 \
  --data_path /path/to/my_issue.md \
  --repo_path /path/to/my/local/repo \
  --config_file config/default_from_url.yaml \
  --apply_patch_locally
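If you launch runs from another script, the two invocations above can be assembled programmatically. This is a small sketch that uses only the flags shown in this README; `build_run_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_run_command(model_name, data_path, config_file,
                      repo_path=None, apply_patch_locally=False):
    """Assemble the run.py argument list for a GitHub issue URL or a local issue file."""
    cmd = ["python", "run.py", "--model_name", model_name, "--data_path", data_path]
    if repo_path is not None:
        cmd += ["--repo_path", repo_path]
    cmd += ["--config_file", config_file]
    if apply_patch_locally:
        cmd.append("--apply_patch_locally")
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "gpt4",
        "/path/to/my_issue.md",
        "config/default_from_url.yaml",
        repo_path="/path/to/my/local/repo",
        apply_patch_locally=True,
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems when paths contain spaces.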

Tip

Run python run.py --help to see all available options.
You can have the agent automatically open a PR if the issue has been solved by supplying the --open_pr flag. Please use this feature responsibly (on your own repositories or after careful consideration).

See the scripts/ folder for other useful scripts and details.
See the config/ folder for details about how you can define your own configuration!
See the sweagent/agent/ folder for details about the logic behind configuration based workflows.
See the sweagent/environment/ folder for details about the SWEEnv environment (interface + implementation).
See the trajectories/ folder for details about the output of run.py.

Ollama Support

Models served with an ollama server can be used by specifying --model_name with ollama:model_name and --host_url to point to the URL used to serve ollama (http://localhost:11434 by default). See more details about using ollama here.

python run.py --model_name ollama:deepseek-coder:6.7b-instruct \
  --host_url http://localhost:11434 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml
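Note that ollama model names can themselves contain a colon (as in `deepseek-coder:6.7b-instruct`), so only the first colon separates the `ollama` prefix from the model. The sketch below shows one way such a name could be split; `split_model_name` is a hypothetical helper and may not match how SWE-agent parses the flag internally.

```python
def split_model_name(model_name):
    """Split 'ollama:<model>' into ('ollama', '<model>'); other names pass through."""
    provider, sep, model = model_name.partition(":")  # split on the first colon only
    if provider == "ollama" and sep and model:
        return provider, model
    return None, model_name


if __name__ == "__main__":
    print(split_model_name("ollama:deepseek-coder:6.7b-instruct"))
    print(split_model_name("gpt4"))
```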

💽 Benchmarking

There are two steps to the SWE-agent pipeline. First, SWE-agent takes an input GitHub issue and returns a pull request that attempts to fix it. We call that step inference. The second step (currently only available for issues in the SWE-bench benchmark) is to evaluate the pull request to verify that it has indeed fixed the issue.

Warning

At this moment, there are known issues with a small number of repositories that don't install properly for arm64 / aarch64 architecture computers. We're working on a fix, but if you'd like to run and evaluate on the entirety of SWE-bench, the easiest way is by using an x86 machine.

👩‍💻 Inference

Inference on any GitHub Issue: See above.

Inference on SWE-bench: Run SWE-agent on SWE-bench Lite and generate patches.

python run.py --model_name gpt4 \
  --per_instance_cost_limit 2.00 \
  --config_file ./config/default.yaml

If you'd like to run on a single issue from SWE-bench, use the --instance_filter option as follows:

python run.py --model_name gpt4 \
  --instance_filter marshmallow-code__marshmallow-1359

🧪 Evaluation

This step is only available for issues from the SWE-bench set. To evaluate generated pull requests:

cd evaluation/
./run_eval.sh <predictions_path>

Replace <predictions_path> with the path to the model's predictions, which should be generated from the Inference step. The <predictions_path> argument should look like ../trajectories/<username>/<model>-<dataset>-<hyperparams>/all_preds.jsonl
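Before running the evaluation, a quick sanity check of the predictions file can catch empty patches early. The sketch below assumes each line of all_preds.jsonl is a JSON object with `instance_id` and `model_patch` fields (the usual SWE-bench prediction layout, which is an assumption here, so check your own file); `summarize_predictions` is a hypothetical helper.

```python
import json
import sys
from pathlib import Path


def summarize_predictions(path):
    """Return (total, ids_with_empty_patch) for a JSONL predictions file."""
    total = 0
    empty = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        if not record.get("model_patch"):  # field name is an assumption
            empty.append(record.get("instance_id"))
    return total, empty


if __name__ == "__main__":
    total, empty = summarize_predictions(sys.argv[1])
    print(f"{total} predictions, {len(empty)} with an empty patch")
```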

See the evaluation/ folder for details about how evaluation works.

🦺 Modifying SWE-agent

If you'd like to modify the example demonstration that we feed the model at the start of each run, first generate a trajectory manually by running the agent with --model_name human and then convert that trajectory into a demonstration by following the guide here.

💫 Contributions

If you'd like to ask questions, learn about upcoming features, and participate in future development, join our Discord community!
If you'd like to contribute to the codebase, we welcome issues and pull requests!
If you'd like to see a post or tutorial about some topic, please let us know via an issue.

Contact persons: John Yang and Carlos E. Jimenez (Email: {jy1682, carlosej}@princeton.edu).

🪪 License

MIT. Check LICENSE.
