RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation

📖 Overview

RepoTransBench is a comprehensive repository-level code translation benchmark featuring 1,897 real-world repository samples across 13 language pairs with automatically executable test suites. Unlike previous fine-grained benchmarks that focus on snippets, functions, or files, RepoTransBench addresses real-world demands where entire repositories need translation.

Key Features

  • 🌍 Multilingual: 13 translation pairs covering 7 programming languages (C, C++, C#, Java, JavaScript, Python, Rust, Matlab)
  • πŸ“Š Large-scale: 1,897 repository samples with comprehensive test coverage
  • ⚑ Execution-based: Automatic test suites for functional correctness validation
  • πŸ—οΈ Real-world: Repository-level complexity with dependencies, configuration files, and resource management
  • πŸ€– Automated: Multi-agent framework for benchmark construction

Supported Translation Pairs

Source Language | Target Languages
--- | ---
C | Python, Rust
C++ | Python
C# | Java
Java | C#, Go, Python
JavaScript | Python
Matlab | Python
Python | C++, Go, Java, Rust
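
The table above can also be expressed as a small lookup, which is handy for validating a source/target combination before launching a run. This is a minimal sketch: the names `SUPPORTED_PAIRS`, `is_supported`, and `all_pairs` are illustrative and are not part of the RepoTransBench code base.

```python
from typing import Dict, List, Tuple

# Supported translation pairs, copied from the table above.
SUPPORTED_PAIRS: Dict[str, List[str]] = {
    "C": ["Python", "Rust"],
    "C++": ["Python"],
    "C#": ["Java"],
    "Java": ["C#", "Go", "Python"],
    "JavaScript": ["Python"],
    "Matlab": ["Python"],
    "Python": ["C++", "Go", "Java", "Rust"],
}


def is_supported(source: str, target: str) -> bool:
    """Return True if the table lists source -> target as a pair."""
    return target in SUPPORTED_PAIRS.get(source, [])


def all_pairs() -> List[Tuple[str, str]]:
    """Flatten the mapping into (source, target) tuples."""
    return [(src, tgt) for src, tgts in SUPPORTED_PAIRS.items() for tgt in tgts]
```

Flattening the mapping yields the 13 pairs the benchmark reports.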

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Docker (for sandboxed execution)
  • Git

Installation

  1. Clone the repository

    git clone https://github.com/DeepSoftwareAnalytics/RepoTransBench.git
    cd RepoTransBench
  2. Install dependencies

    pip install -r requirements.txt
  3. Download the dataset

    Download the benchmark data from our latest release:

    📥 Release: RepoTransBench Dataset v1.0

    # Download and extract the dataset to /workspace directory
    mkdir -p /workspace
    cd /workspace
    wget https://github.com/DeepSoftwareAnalytics/RepoTransBench/releases/download/v1.0/repotransbench_dataset.tar.gz
    tar -xzf repotransbench_dataset.tar.gz
  4. Configure API access

    # Add your API keys to the configuration file
    echo "api_key_1 your_openai_api_key_here" > RepoTransAgent/API_KEY.txt
    echo "api_key_2 your_anthropic_api_key_here" >> RepoTransAgent/API_KEY.txt
  5. Set up Docker environment (optional)

    cd docker
    docker-compose up -d
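
The `echo` commands in step 4 write one `name key` pair per line. How the agent actually reads `API_KEY.txt` is not documented here, so the following is only a sketch of one way to parse that layout; `load_api_keys` is a hypothetical helper, and the whitespace-separated format is an assumption inferred from the commands above.

```python
from pathlib import Path
from typing import Dict


def load_api_keys(path: str) -> Dict[str, str]:
    """Parse lines of the form '<name> <key>' into a dict.

    Assumption: one whitespace-separated name/key pair per line, as written
    by the echo commands in step 4. Blank or malformed lines are skipped.
    """
    keys: Dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            keys[parts[0]] = parts[1].strip()
    return keys
```

Calling `load_api_keys("RepoTransAgent/API_KEY.txt")` after step 4 would return a dictionary with the entries `api_key_1` and `api_key_2`.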

📊 Benchmark Statistics

Metric | Value
--- | ---
Total Samples | 1,897
Translation Pairs | 13
Programming Languages | 7
Average Tokens per Sample | 23,966
Average Lines of Code | 2,394
Average Functions | 177
Average Classes | 35
Average Import Statements | 163
Line Coverage | 81.89%
Branch Coverage | 72.61%

🤖 RepoTransAgent

We introduce RepoTransAgent, a general agent framework for repository-level code translation based on the ReAct (Reasoning + Acting) paradigm.

Key Capabilities

  • ReadFile: Examine code files, configurations, and documentation
  • CreateFile: Generate translated files and configurations
  • ExecuteCommand: Run builds, tests, and dependency installations
  • SearchContent: Locate specific code patterns and dependencies
  • Finished: Mark translation completion
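
To make the ReAct pattern concrete, the sketch below alternates a decision step with a tool call until `Finished` is chosen or the iteration budget runs out. It is not the code in `actions.py`: the function names, the `(name, args)` action shape, and the `policy` callable that stands in for the LLM are all assumptions, and `ExecuteCommand` is left out to keep the example free of side effects.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical action shape: (action name, argument dict).
Action = Tuple[str, Dict[str, str]]


def read_file(root: Path, args: Dict[str, str]) -> str:
    """ReadFile: return the text of a file under the project root."""
    return (root / args["path"]).read_text(encoding="utf-8")


def create_file(root: Path, args: Dict[str, str]) -> str:
    """CreateFile: write a file, creating parent directories as needed."""
    target = root / args["path"]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(args["content"], encoding="utf-8")
    return f"created {args['path']}"


def search_content(root: Path, args: Dict[str, str]) -> str:
    """SearchContent: list files whose text contains a literal pattern."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and args["pattern"] in path.read_text(
            encoding="utf-8", errors="ignore"
        ):
            hits.append(str(path.relative_to(root)))
    return "\n".join(hits) or "no matches"


TOOLS: Dict[str, Callable[[Path, Dict[str, str]], str]] = {
    "ReadFile": read_file,
    "CreateFile": create_file,
    "SearchContent": search_content,
}


def react_loop(
    policy: Callable[[List[str]], Action], root: Path, max_iterations: int = 20
) -> List[str]:
    """Ask the policy for an action, run it, and feed the observation back."""
    observations: List[str] = []
    for _ in range(max_iterations):
        name, args = policy(observations)
        if name == "Finished":
            break
        tool = TOOLS.get(name)
        observations.append(tool(root, args) if tool else f"unknown action: {name}")
    return observations
```

In a real agent, `policy` would wrap an LLM call that reads the observation history and emits the next action. Here it can be any function, which makes the loop easy to test with a scripted sequence.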

Quick Start with RepoTransAgent

  1. Single Project Translation

    # Translate a single project
    python -m RepoTransAgent.run \
        --project_name "your_project_name" \
        --source_language "Python" \
        --target_language "Java" \
        --model_name "claude-sonnet-4-20250514" \
        --max_iterations 20
  2. Batch Translation

    # Run batch translation on multiple projects
    python -m RepoTransAgent.run_batch
  3. Available Models

    • claude-sonnet-4-20250514 (default)
    • gpt-4.1
    • gemini-2.5-flash-lite
    • deepseek-chat
    • qwen3-235b-a22b

Command Line Arguments

Single Translation (RepoTransAgent.run)

# Required: --project_name, --source_language, --target_language
#           (languages such as Python, Java, C++)
# Optional: --model_name (default: claude-sonnet-4-20250514)
#           --max_iterations (default: 20)
python -m RepoTransAgent.run \
    --project_name PROJECT_NAME \
    --source_language SOURCE_LANG \
    --target_language TARGET_LANG \
    --model_name MODEL_NAME \
    --max_iterations MAX_ITER
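
For readers scripting around the CLI, the documented flags and defaults map naturally onto `argparse`. The parser below is a sketch that mirrors the arguments listed above; it is not copied from `run.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags of RepoTransAgent.run (illustrative only)."""
    parser = argparse.ArgumentParser(prog="RepoTransAgent.run")
    parser.add_argument("--project_name", required=True,
                        help="Name of the project to translate")
    parser.add_argument("--source_language", required=True,
                        help="Source language, e.g. Python")
    parser.add_argument("--target_language", required=True,
                        help="Target language, e.g. Java")
    parser.add_argument("--model_name", default="claude-sonnet-4-20250514",
                        help="LLM model to use")
    parser.add_argument("--max_iterations", type=int, default=20,
                        help="Maximum number of agent iterations")
    return parser
```

Parsing only the three required flags leaves `model_name` and `max_iterations` at the defaults documented above.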

Batch Translation (RepoTransAgent.run_batch)

python -m RepoTransAgent.run_batch

The batch script automatically:

  • Reads from /workspace/target_projects/projects_summary.jsonl
  • Processes multiple projects in parallel
  • Supports resume functionality (skips completed projects)
  • Saves detailed results and logs
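
The behaviour listed above (read a JSONL summary, skip finished work, run the rest in parallel) can be sketched in a few lines. Several things here are assumptions: the `project_name` field in each JSONL record, the way completed projects are tracked (a plain set), and the use of a thread pool in place of the process pool that the real script configures through `num_processes`.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Set


def load_projects(summary_path: Path) -> List[dict]:
    """Read one JSON object per non-empty line of a JSONL file."""
    with summary_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def pending(projects: Iterable[dict], done: Set[str]) -> List[dict]:
    """Resume support: drop projects already recorded as completed."""
    return [p for p in projects if p["project_name"] not in done]


def run_batch(
    projects: Iterable[dict],
    translate: Callable[[dict], str],
    done: Set[str],
    workers: int = 4,
) -> Dict[str, str]:
    """Translate all pending projects concurrently and collect the results."""
    todo = pending(projects, done)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(translate, todo))
    return {p["project_name"]: r for p, r in zip(todo, results)}
```

Threads keep the sketch self-contained. For CPU-bound or isolation-sensitive work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.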

📋 Usage Examples

1. Basic Translation

# Direct command line execution
python -m RepoTransAgent.run \
    --project_name "example_project" \
    --source_language "Python" \
    --target_language "Java" \
    --model_name "claude-sonnet-4-20250514"

2. Evaluation on Benchmark

# The agent automatically evaluates against tests during translation
# Results are saved in logs/ directory with detailed analysis

# Example log structure:
# logs/claude-sonnet-4-20250514/project_name_Python_to_Java_20240130_143022/
# ├── system_prompt.txt          # System prompt used
# ├── turn_01.txt                # Each conversation turn
# ├── turn_02.txt
# ├── ...
# └── final_summary.txt          # Final results and test analysis
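
When post-processing many runs, it can help to build and parse these directory names programmatically. The sketch below assumes the naming pattern `<project>_<source>_to_<target>_<YYYYMMDD>_<HHMMSS>`, which is inferred from the single example above and not confirmed by the code. `run_dir` and `parse_run_dir` are hypothetical helpers.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed pattern, inferred from the example log path above.
_RUN_DIR = re.compile(
    r"^(?P<project>.+)_(?P<source>[^_]+)_to_(?P<target>[^_]+)_(?P<stamp>\d{8}_\d{6})$"
)
_STAMP_FORMAT = "%Y%m%d_%H%M%S"


def run_dir(model: str, project: str, source: str, target: str,
            when: datetime) -> PurePosixPath:
    """Build the log directory path for one translation run."""
    stamp = when.strftime(_STAMP_FORMAT)
    return PurePosixPath("logs") / model / f"{project}_{source}_to_{target}_{stamp}"


def parse_run_dir(name: str) -> dict:
    """Split a run directory name back into its components."""
    match = _RUN_DIR.match(name)
    if match is None:
        raise ValueError(f"unrecognised run directory name: {name!r}")
    info = match.groupdict()
    info["stamp"] = datetime.strptime(info["stamp"], _STAMP_FORMAT)
    return info
```

Because the project group is greedy, project names that contain underscores (such as `project_name`) are still split correctly from the language pair.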

3. Batch Processing

# Run multiple projects in parallel (configurable in run_batch.py)
python -m RepoTransAgent.run_batch

# Configuration in run_batch.py:
# - max_per_pair: Projects per translation pair
# - num_processes: Parallel processes (default: 50)
# - max_iterations: Max iterations per project (default: 20)

📈 Evaluation Results

Our evaluation reveals that repository-level code translation remains challenging:

Method | Success Rate | Compilation Rate
--- | --- | ---
Translation Only | 0.0% | 26.2%
Error Feedback | 12.4% | 30.5%
RepoTransAgent | 32.8% | 54.8%
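
This README does not spell out how the two metrics are computed. As a rough illustration only, the sketch below assumes that a project counts as a success when it compiles and all of its tests pass, and that the compilation rate is the share of projects that build. The `Outcome` record and `rates` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Outcome:
    """Hypothetical per-project result record."""
    compiled: bool
    tests_passed: int
    tests_total: int


def rates(outcomes: Iterable[Outcome]) -> Tuple[float, float]:
    """Return (success_rate, compilation_rate) as percentages.

    Assumption: success means the project compiled and every test passed.
    """
    items = list(outcomes)
    if not items:
        return 0.0, 0.0
    succeeded = sum(
        1 for o in items
        if o.compiled and o.tests_total > 0 and o.tests_passed == o.tests_total
    )
    compiled = sum(1 for o in items if o.compiled)
    return 100.0 * succeeded / len(items), 100.0 * compiled / len(items)
```

Under these assumptions a project that compiles but fails some tests raises the compilation rate without affecting the success rate, which is consistent with the gap between the two columns in the table.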

Key Findings

  1. Directional Asymmetry: Static-to-dynamic translation (45-63% success) significantly outperforms dynamic-to-static (< 10%)
  2. Model Specialization: Different LLMs show advantages for specific translation pairs
  3. Complexity Impact: Repository complexity inversely correlates with translation success

🔬 Research Applications

RepoTransBench enables research in:

  • Code Translation: Develop and evaluate new translation methods
  • LLM Capabilities: Assess model performance on complex, real-world tasks
  • Software Engineering: Study repository-level code migration challenges
  • Multi-Agent Systems: Design collaborative AI systems for complex tasks

📁 Project Structure

RepoTransBench/
├── RepoTransAgent/              # 🤖 Main agent framework
│   ├── actions.py              # Action definitions (CreateFile, ReadFile, etc.)
│   ├── generator.py            # LLM API client and response handling
│   ├── run.py                  # Single project translation script
│   ├── run_batch.py            # Batch processing script
│   ├── test_analyzer.py        # Multi-language test result analysis
│   ├── API_KEY.txt             # API keys configuration
│   └── prompts/
│       └── system_prompt.py    # System prompt generation
├── multi_agent_based_benchmark_construction/  # 🏗️ Benchmark construction tools
│   ├── testcase_public_agent_batch/    # Public test generation
│   ├── testcase_target_agent_batch/    # Target test translation
│   ├── coverage_agent_batch/           # Coverage analysis
│   └── runnable_agent_batch/           # Environment setup
├── rule_based_filter_scripts/   # 📋 Repository filtering tools
├── download_repos_scripts/      # 📥 Data collection utilities
├── docker/                      # 🐳 Containerization setup
│   ├── Dockerfile
│   └── docker-compose.yml
└── assets/                      # 📊 Paper figures and resources

📊 Expected Directory Structure (After Dataset Download)

After downloading the dataset, your /workspace directory should look like:

/workspace/
├── source_projects/             # Original source code repositories
│   ├── Python/
│   ├── Java/
│   ├── C++/
│   └── ...
├── target_projects/             # Target translation projects with tests
│   ├── projects_summary.jsonl  # Project metadata
│   ├── Python/
│   │   ├── Java/
│   │   │   ├── project1/
│   │   │   │   ├── run_tests.sh
│   │   │   │   ├── public_tests/
│   │   │   │   └── original_tests/
│   │   │   └── project2/
│   │   └── C++/
│   └── Java/
│       └── Python/
└── translated_projects/         # Generated translations (created during execution)
    └── claude-sonnet-4-20250514/
        ├── Python/
        │   └── Java/
        └── Java/
            └── Python/
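
The layout above follows a regular `<source>/<target>/<project>` nesting, so paths can be derived instead of hard-coded. The helpers below are a sketch with hypothetical names. The per-project level under `translated_projects/` is not shown in the tree and is an assumption.

```python
from pathlib import PurePosixPath

# Dataset root used throughout this README.
WORKSPACE = PurePosixPath("/workspace")


def target_project_dir(source: str, target: str, project: str,
                       workspace: PurePosixPath = WORKSPACE) -> PurePosixPath:
    """Directory holding the tests for one source -> target project."""
    return workspace / "target_projects" / source / target / project


def run_tests_script(source: str, target: str, project: str,
                     workspace: PurePosixPath = WORKSPACE) -> PurePosixPath:
    """Path of the run_tests.sh script shown in the tree above."""
    return target_project_dir(source, target, project, workspace) / "run_tests.sh"


def translated_project_dir(model: str, source: str, target: str, project: str,
                           workspace: PurePosixPath = WORKSPACE) -> PurePosixPath:
    """Output directory for a model's translation.

    Assumption: projects sit one level below <model>/<source>/<target>/.
    """
    return workspace / "translated_projects" / model / source / target / project
```

For example, `run_tests_script("Python", "Java", "project1")` resolves to `/workspace/target_projects/Python/Java/project1/run_tests.sh`.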

🏆 Leaderboard

We welcome submissions to our leaderboard! Submit your results via GitHub Issues.

Rank | Method | Model | Success Rate | Paper/Code
--- | --- | --- | --- | ---
1 | RepoTransAgent | Claude-4 | 32.8% | [This work]
2 | RepoTransAgent | GPT-4.1 | 32.8% | [This work]
3 | RepoTransAgent | DeepSeek | 22.5% | [This work]

📄 Citation

If you use RepoTransBench in your research, please cite our paper:

@article{repotransbench2024,
  title={RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation},
  author={Wang, Yanli and Wang, Yanlin and Wang, Suiquan and Guo, Daya and Chen, Jiachi and Grundy, John and Liu, Xilin and Ma, Yuchi and Mao, Mingzhi and Zhang, Hongyu and Zheng, Zibin},
  journal={arXiv preprint arXiv:2024.xxxxx},
  year={2024}
}

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Ways to Contribute

  • 🐛 Report bugs and issues
  • 💡 Suggest new features or translation pairs
  • 📝 Improve documentation
  • 🧪 Add new evaluation methods
  • 🔄 Submit translation results

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Thanks to all contributors who helped build this benchmark
  • Special thanks to the open-source community for providing repositories
  • Supported by Sun Yat-sen University, Monash University, Huawei Cloud, and Chongqing University

📞 Contact

For questions or collaboration opportunities:


⭐ Star this repository if you find it useful! ⭐
