RepoTransBench is a comprehensive repository-level code translation benchmark featuring 1,897 real-world repository samples across 13 language pairs with automatically executable test suites. Unlike previous fine-grained benchmarks that focus on snippets, functions, or files, RepoTransBench addresses real-world demands where entire repositories need translation.
- 🌐 Multilingual: 13 translation pairs covering 7 programming languages (C, C++, C#, Java, JavaScript, Python, Rust, Matlab)
- 📊 Large-scale: 1,897 repository samples with comprehensive test coverage
- ⚡ Execution-based: Automatic test suites for functional correctness validation
- 🏗️ Real-world: Repository-level complexity with dependencies, configuration files, and resource management
- 🤖 Automated: Multi-agent framework for benchmark construction
| Source Language | Target Languages |
| --- | --- |
| C | Python, Rust |
| C++ | Python |
| C# | Java |
| Java | C#, Go, Python |
| JavaScript | Python |
| Matlab | Python |
| Python | C++, Go, Java, Rust |
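For scripting against the benchmark, the table above can be written down as a simple mapping. This is a sketch only: `TRANSLATION_PAIRS` and `all_pairs` are this example's own names, not part of the released code.

```python
# The 13 translation pairs from the table above, as source -> targets.
# (Illustrative helper, not an API shipped with RepoTransBench.)
TRANSLATION_PAIRS = {
    "C": ["Python", "Rust"],
    "C++": ["Python"],
    "C#": ["Java"],
    "Java": ["C#", "Go", "Python"],
    "JavaScript": ["Python"],
    "Matlab": ["Python"],
    "Python": ["C++", "Go", "Java", "Rust"],
}

def all_pairs():
    """Enumerate every (source, target) translation pair."""
    return [(src, tgt) for src, targets in TRANSLATION_PAIRS.items()
            for tgt in targets]
```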
- Python 3.8+
- Docker (for sandboxed execution)
- Git
- **Clone the repository**

      git clone https://github.com/DeepSoftwareAnalytics/RepoTransBench.git
      cd RepoTransBench
- **Install dependencies**

      pip install -r requirements.txt
- **Download the dataset**

  Download the benchmark data from our latest release:

  📥 Release: RepoTransBench Dataset v1.0

      # Download and extract the dataset to the /workspace directory
      mkdir -p /workspace
      cd /workspace
      wget https://github.com/DeepSoftwareAnalytics/RepoTransBench/releases/download/v1.0/repotransbench_dataset.tar.gz
      tar -xzf repotransbench_dataset.tar.gz
- **Configure API access**

      # Add your API keys to the configuration file
      echo "api_key_1 your_openai_api_key_here" > RepoTransAgent/API_KEY.txt
      echo "api_key_2 your_anthropic_api_key_here" >> RepoTransAgent/API_KEY.txt
- **Set up Docker environment (optional)**

      cd docker
      docker-compose up -d
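Judging from the `echo` lines in the API-access step, `API_KEY.txt` holds one `<label> <key>` pair per line. A minimal reader for that format might look like the following; `parse_api_keys` is an illustrative helper, not the repository's actual loader.

```python
def parse_api_keys(text):
    """Parse lines of the form '<label> <key>' into a dict, skipping blanks.

    Assumes the API_KEY.txt layout shown in the setup step above.
    """
    keys = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            keys[parts[0]] = parts[1]
    return keys
```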
| Metric | Value |
| --- | --- |
| Total Samples | 1,897 |
| Translation Pairs | 13 |
| Programming Languages | 7 |
| Average Tokens per Sample | 23,966 |
| Average Lines of Code | 2,394 |
| Average Functions | 177 |
| Average Classes | 35 |
| Average Import Statements | 163 |
| Line Coverage | 81.89% |
| Branch Coverage | 72.61% |
We introduce RepoTransAgent, a general agent framework for repository-level code translation based on the ReAct (Reasoning + Acting) paradigm.
- ReadFile: Examine code files, configurations, and documentation
- CreateFile: Generate translated files and configurations
- ExecuteCommand: Run builds, tests, and dependency installations
- SearchContent: Locate specific code patterns and dependencies
- Finished: Mark translation completion
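Conceptually, the ReAct loop alternates model reasoning with one of the actions above until `Finished` is emitted or the iteration budget runs out. A minimal sketch follows; the function names and dispatch-table shape are this example's assumptions, not the actual `RepoTransAgent` interface.

```python
def run_agent(agent_step, handlers, max_iterations=20):
    """Alternate reasoning (agent_step) and acting (handlers) until Finished.

    agent_step(observation) -> (action_name, kwargs) is the reasoning step;
    handlers maps action names (ReadFile, CreateFile, ...) to callables.
    """
    observation = None
    for _ in range(max_iterations):
        action, kwargs = agent_step(observation)  # reasoning: pick next action
        if action == "Finished":
            return True                           # translation marked complete
        observation = handlers[action](**kwargs)  # acting: execute, observe result
    return False                                  # iteration budget exhausted
```

Each observation (file contents, build output, search hits) feeds the next reasoning step, which is what lets the agent react to compile and test failures.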
- **Single Project Translation**

      # Translate a single project
      python -m RepoTransAgent.run \
          --project_name "your_project_name" \
          --source_language "Python" \
          --target_language "Java" \
          --model_name "claude-sonnet-4-20250514" \
          --max_iterations 20
- **Batch Translation**

      # Run batch translation on multiple projects
      python -m RepoTransAgent.run_batch
- **Available Models**

  - `claude-sonnet-4-20250514` (default)
  - `gpt-4.1`
  - `gemini-2.5-flash-lite`
  - `deepseek-chat`
  - `qwen3-235b-a22b`
    python -m RepoTransAgent.run \
        --project_name PROJECT_NAME \      # Required: Name of the project to translate
        --source_language SOURCE_LANG \    # Required: Source language (Python, Java, C++, etc.)
        --target_language TARGET_LANG \    # Required: Target language (Python, Java, C++, etc.)
        --model_name MODEL_NAME \          # Optional: LLM model (default: claude-sonnet-4-20250514)
        --max_iterations MAX_ITER          # Optional: Max iterations (default: 20)
    python -m RepoTransAgent.run_batch
The batch script automatically:
- Reads from `/workspace/target_projects/projects_summary.jsonl`
- Processes multiple projects in parallel
- Supports resume functionality (skips completed projects)
- Saves detailed results and logs
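The resume behaviour above amounts to filtering the metadata file against the set of already-completed projects. A sketch of that filtering step follows; the `project_name` field name is an assumption about the metadata schema, not a documented guarantee.

```python
import json

def pending_projects(summary_lines, completed):
    """Yield project records from projects_summary.jsonl lines, skipping
    names already in `completed` (mirroring the resume behaviour above).

    Assumes each JSONL record carries a 'project_name' field.
    """
    for line in summary_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("project_name") not in completed:
            yield record
```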
    # Direct command line execution
    python -m RepoTransAgent.run \
        --project_name "example_project" \
        --source_language "Python" \
        --target_language "Java" \
        --model_name "claude-sonnet-4-20250514"

    # The agent automatically evaluates against tests during translation
    # Results are saved in logs/ directory with detailed analysis

    # Example log structure:
    # logs/claude-sonnet-4-20250514/project_name_Python_to_Java_20240130_143022/
    # ├── system_prompt.txt     # System prompt used
    # ├── turn_01.txt           # Each conversation turn
    # ├── turn_02.txt
    # ├── ...
    # └── final_summary.txt     # Final results and test analysis
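The log-directory naming shown above (`<project>_<source>_to_<target>_<timestamp>`) can be parsed mechanically when aggregating results. This regex is inferred from the single example above, not taken from a repository API, so treat it as a sketch.

```python
import re

# Pattern inferred from names like
# 'project_name_Python_to_Java_20240130_143022'.
LOG_DIR_RE = re.compile(
    r"^(?P<project>.+)_(?P<src>[^_]+)_to_(?P<tgt>[^_]+)_(?P<stamp>\d{8}_\d{6})$"
)

def parse_log_dir(name):
    """Split a log directory name into project, languages, and timestamp."""
    m = LOG_DIR_RE.match(name)
    return m.groupdict() if m else None
```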
    # Run multiple projects in parallel (configurable in run_batch.py)
    python -m RepoTransAgent.run_batch

    # Configuration in run_batch.py:
    # - max_per_pair: Projects per translation pair
    # - num_processes: Parallel processes (default: 50)
    # - max_iterations: Max iterations per project (default: 20)
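The fan-out described by `num_processes` is one project per worker. The sketch below uses threads purely to stay self-contained, whereas `run_batch.py` spawns separate processes; the `run_batch` and `translate` names here are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(projects, translate, num_workers=50):
    """Fan projects out across workers, in the spirit of run_batch.py's
    num_processes setting. Threads keep this sketch self-contained; the
    real script uses separate OS processes."""
    if not projects:
        return []
    with ThreadPoolExecutor(max_workers=min(num_workers, len(projects))) as pool:
        # map() preserves input order, so results line up with `projects`.
        return list(pool.map(translate, projects))
```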
Our evaluation reveals that repository-level code translation remains challenging:
| Method | Success Rate | Compilation Rate |
| --- | --- | --- |
| Translation Only | 0.0% | 26.2% |
| Error Feedback | 12.4% | 30.5% |
| RepoTransAgent | 32.8% | 54.8% |
- Directional Asymmetry: Static-to-dynamic translation (45-63% success) significantly outperforms dynamic-to-static (< 10%)
- Model Specialization: Different LLMs show advantages for specific translation pairs
- Complexity Impact: Repository complexity inversely correlates with translation success
RepoTransBench enables research in:
- Code Translation: Develop and evaluate new translation methods
- LLM Capabilities: Assess model performance on complex, real-world tasks
- Software Engineering: Study repository-level code migration challenges
- Multi-Agent Systems: Design collaborative AI systems for complex tasks
    RepoTransBench/
    ├── RepoTransAgent/                              # 🤖 Main agent framework
    │   ├── actions.py                               # Action definitions (CreateFile, ReadFile, etc.)
    │   ├── generator.py                             # LLM API client and response handling
    │   ├── run.py                                   # Single project translation script
    │   ├── run_batch.py                             # Batch processing script
    │   ├── test_analyzer.py                         # Multi-language test result analysis
    │   ├── API_KEY.txt                              # API keys configuration
    │   └── prompts/
    │       └── system_prompt.py                     # System prompt generation
    ├── multi_agent_based_benchmark_construction/    # 🏗️ Benchmark construction tools
    │   ├── testcase_public_agent_batch/             # Public test generation
    │   ├── testcase_target_agent_batch/             # Target test translation
    │   ├── coverage_agent_batch/                    # Coverage analysis
    │   └── runnable_agent_batch/                    # Environment setup
    ├── rule_based_filter_scripts/                   # 🔍 Repository filtering tools
    ├── download_repos_scripts/                      # 📥 Data collection utilities
    ├── docker/                                      # 🐳 Containerization setup
    │   ├── Dockerfile
    │   └── docker-compose.yml
    └── assets/                                      # 📊 Paper figures and resources
After downloading the dataset, your `/workspace` directory should look like:
    /workspace/
    ├── source_projects/                 # Original source code repositories
    │   ├── Python/
    │   ├── Java/
    │   ├── C++/
    │   └── ...
    ├── target_projects/                 # Target translation projects with tests
    │   ├── projects_summary.jsonl       # Project metadata
    │   ├── Python/
    │   │   ├── Java/
    │   │   │   ├── project1/
    │   │   │   │   ├── run_tests.sh
    │   │   │   │   ├── public_tests/
    │   │   │   │   └── original_tests/
    │   │   │   └── project2/
    │   │   └── C++/
    │   └── Java/
    │       └── Python/
    └── translated_projects/             # Generated translations (created during execution)
        └── claude-sonnet-4-20250514/
            ├── Python/
            │   └── Java/
            └── Java/
                └── Python/
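A quick sanity check of this layout is easy to script. The sketch below only verifies the three top-level directories shown above; `check_workspace` and `EXPECTED_DIRS` are this example's names, not repository tooling.

```python
from pathlib import Path

# Top-level directories the layout above expects; translated_projects/ is
# created during execution, so it may legitimately be absent at first.
EXPECTED_DIRS = ["source_projects", "target_projects", "translated_projects"]

def check_workspace(root):
    """Return the expected top-level directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```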
We welcome submissions to our leaderboard! Submit your results via GitHub Issues.
| Rank | Method | Model | Success Rate | Paper/Code |
| --- | --- | --- | --- | --- |
| 1 | RepoTransAgent | Claude-4 | 32.8% | [This work] |
| 2 | RepoTransAgent | GPT-4.1 | 32.8% | [This work] |
| 3 | RepoTransAgent | DeepSeek | 22.5% | [This work] |
If you use RepoTransBench in your research, please cite our paper:
    @article{repotransbench2024,
      title={RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation},
      author={Wang, Yanli and Wang, Yanlin and Wang, Suiquan and Guo, Daya and Chen, Jiachi and Grundy, John and Liu, Xilin and Ma, Yuchi and Mao, Mingzhi and Zhang, Hongyu and Zheng, Zibin},
      journal={arXiv preprint arXiv:2024.xxxxx},
      year={2024}
    }
We welcome contributions! Please see our Contributing Guidelines for details.
- 🐛 Report bugs and issues
- 💡 Suggest new features or translation pairs
- 📖 Improve documentation
- 🧪 Add new evaluation methods
- 📊 Submit translation results
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to all contributors who helped build this benchmark
- Special thanks to the open-source community for providing repositories
- Supported by Sun Yat-sen University, Monash University, Huawei Cloud, and Chongqing University
For questions or collaboration opportunities:
- Primary Contact: Yanlin Wang ([email protected])
- Issues: Please use GitHub Issues
- Discussions: Join our GitHub Discussions
⭐ Star this repository if you find it useful! ⭐