Multilingual Code Co-Evolution Using Large Language Models

This repo hosts the code and data for the following FSE 2023 paper:

Title: Multilingual Code Co-Evolution Using Large Language Models

Authors: Jiyang Zhang, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

@inproceedings{ZhangETAL23Codeditor,
  author = {Zhang, Jiyang and Nie, Pengyu and Li, Junyi Jessy and Gligoric, Milos},
  title = {Multilingual Code Co-Evolution Using Large Language Models},
  booktitle = {Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year = {2023},
}

News

May 2024: The fine-tuned EditsTranslation models are released on 🤗! 🔥 See cs2java and java2cs.

How to Use

from transformers import T5ForConditionalGeneration, AutoTokenizer

# Load the fine-tuned Java-to-C# edit-translation checkpoint from the Hugging Face Hub.
checkpoint = "EngineeringSoftware/EditsTranlation-java2cs"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# An incomplete Java snippet (missing closing braces and semicolon).
code_input = """class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!")"""

# Tokenize the input and generate the edit sequence followed by the updated code.
input_ids = tokenizer(code_input, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=200)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# output: <INSERT>; } } ;<INSERT_END> class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!") ; } } ;
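
The generated sequence is an edit script (here, an <INSERT> ... <INSERT_END> span) followed by the full updated code. As a minimal post-processing sketch, assuming the code always follows the last <INSERT_END> marker (an assumption based on the example output above, not a documented output format), the plain code can be recovered like this:

# Post-processing sketch (assumption: the code follows the last <INSERT_END> marker).
def strip_edit_markers(generated: str) -> str:
    marker = "<INSERT_END>"
    if marker in generated:
        # Keep only the text after the final edit span.
        return generated.rsplit(marker, 1)[-1].strip()
    return generated.strip()

print(strip_edit_markers(tokenizer.decode(generated_ids[0], skip_special_tokens=True)))
# class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!") ; } } ;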

Introduction

This repo contains the code and artifacts for reproducing the experiments in Multilingual Code Co-Evolution Using Large Language Models. In this work, we introduce Codeditor for co-evolving software implemented in multiple programming languages.

The code includes:

  • scripts for processing the datasets
  • scripts for training and evaluating Codeditor models

The artifacts include:

  • Java to C# raw paired changes
  • Java to C# translation dataset processed for Codeditor models

Data Downloads

All our data is hosted on UTBox via a shared folder.

Code for Processing Fine-tuning Data

We provide a sample script to process the datasets for edit-translation. It requires the raw data files to be placed at raw_data/.

cd python/
python -m deltr.collector.DataProcessor edit_translation_data_process --exp cs2java --src_lang cs --tgt_lang java
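
The same script should handle the reverse direction by swapping the language arguments; a hedged example for java2cs (assuming java2cs is an accepted --exp value, matching the dataset names used elsewhere in this README):

cd python/
python -m deltr.collector.DataProcessor edit_translation_data_process --exp java2cs --src_lang java --tgt_lang cs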

Code for Training and Evaluating Models

Train ML models

cd python/
python -m deltr.coditT5.CodeT5 fit --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml

# Example: python -m deltr.coditT5.CodeT5 fit --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config configs/coditT5.yaml

Results are written to models/${model_name}/${dataset}/, where:

  • model/: stores the trained model.

  • logs/: stores logs during training.
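
For the example training run above, the resulting layout would presumably look like:

models/edit-translation/java2cs/
  model/    # the trained model
  logs/     # training logs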

Run ML models to do inference

Requires the dataset at data/${model_name}/${dataset}/ and the trained model at models/${model_name}/${dataset}/model/.

cd python/
python -m deltr.coditT5.CodeT5 predict --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml

Results are written to models/${model_name}/${dataset}/, where:

  • output.hyp: the predictions.
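
For instance, mirroring the training example above, a prediction run for the java2cs edit-translation model would presumably be (paths assumed to match the training step):

# Example: python -m deltr.coditT5.CodeT5 predict --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config configs/coditT5.yaml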
