Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction.

Published in TOSEM (CCF Rank A)

This repository accompanies our paper "Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction".

https://dl.acm.org/doi/abs/10.1145/3434280

The paper is also available as a PDF.

Requirements

* Python 3.7+
* Java 8+

For Python, the package dependencies are listed in requirements.txt. Note that the environment is based on Anaconda.
For Java, we use Maven to build the project. The core component is JavaParser; add the following dependency to your Maven configuration (pom.xml):

<dependency>
    <groupId>com.github.javaparser</groupId>
    <artifactId>javaparser-symbol-solver-core</artifactId>
    <version>3.16.1</version>
</dependency>

Dataset

We use the dataset provided by DeepCom for constructing code summarization models.
We provide the annotated dataset produced by our work in the dataset folder.

Reproducing

Train the code summarization models

To train the models, we use the code released by CodeNN, Code2Seq, DeepCom, and NNGen. These projects are all open source.

Please note that we redesigned the pre-processing phase of Code2Seq to perform the code summarization experiments. Please refer to this issue.

Comment category prediction

We provide the source code for comment category prediction in ccpm.py. To run it:

1. Use the trained models to generate comments.
2. Set the path to the generated comments of each model.
3. Run the following commands:

python ccpm.py classify --input_file --cross_validation
python ccpm.py evaluate --input_file --output_file
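
Before running the commands above, it can help to confirm that the generated comments of every model are where you expect them. The following is a minimal sketch and is not part of ccpm.py: the `outputs/*.txt` paths and the `missing_outputs` helper are hypothetical names introduced for illustration, so adjust them to your own layout.

```python
from pathlib import Path

# Hypothetical layout: one file of generated comments per model.
# These paths are assumptions for illustration, not the repository's configuration.
GENERATED_COMMENTS = {
    "CodeNN": Path("outputs/codenn.txt"),
    "Code2Seq": Path("outputs/code2seq.txt"),
    "DeepCom": Path("outputs/deepcom.txt"),
    "NNGen": Path("outputs/nngen.txt"),
}


def missing_outputs(paths):
    """Return the sorted names of models whose generated-comment file is absent."""
    return sorted(name for name, path in paths.items() if not Path(path).is_file())


if __name__ == "__main__":
    missing = missing_outputs(GENERATED_COMMENTS)
    if missing:
        print("Missing generated comments for:", ", ".join(missing))
    else:
        print("All generated-comment files found.")
```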

Evaluation

We evaluate BLEU and ROUGE using the nlg-eval package.

To run the evaluation:

nlg-eval --hypothesis=examples/generated_comments.txt --references=examples/reference_comments.txt
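
nlg-eval compares the hypothesis and reference files line by line, so the two files should contain the same number of lines. A minimal sketch of a pre-check, assuming one comment per line in each file; `count_aligned_lines` is a hypothetical helper, not part of this repository or of nlg-eval:

```python
from pathlib import Path


def count_aligned_lines(hypothesis_path, reference_path):
    """Return the number of comment pairs, or raise if the line counts differ."""
    hyp = Path(hypothesis_path).read_text(encoding="utf-8").splitlines()
    ref = Path(reference_path).read_text(encoding="utf-8").splitlines()
    if len(hyp) != len(ref):
        raise ValueError(
            f"{len(hyp)} hypothesis lines vs {len(ref)} reference lines"
        )
    return len(hyp)
```

For example, calling `count_aligned_lines("examples/generated_comments.txt", "examples/reference_comments.txt")` before the command above would surface a mismatch early instead of producing misleading scores.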

Citation

If you use our dataset or you are inspired by our work, please consider citing our paper:

@article{10.1145/3434280,
author = {Chen, Qiuyuan and Xia, Xin and Hu, Han and Lo, David and Li, Shanping},
title = {Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction},
year = {2021},
issue_date = {February 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {30},
number = {2},
issn = {1049-331X},
url = {https://doi.org/10.1145/3434280},
doi = {10.1145/3434280},
month = feb,
articleno = {25},
numpages = {29},
keywords = {comment classification, code comment, Code summarization}
}

Thanks!

Reference

[1] Qiuyuan Chen, Xin Xia, Han Hu, David Lo, and Shanping Li, “Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction,” ACM Trans. Softw. Eng. Methodol. (TOSEM), vol. 30, no. 2, pp. 1–29, 2021, DOI: https://doi.org/10.1145/3434280.
