This repository is the accompanying repository for our paper "Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction"
https://dl.acm.org/doi/abs/10.1145/3434280
The paper is accessed in pdf
* Python3.7+
* Java 8+
-
For Python requirements, we list Python package dependency in
requirements.txt
. Note that the environment is based on Anaconda. -
For Java requirements, we adopt Maven to build the projects. The core component is JavaParser. Just add the following to your maven configuration.
<dependency>
<groupId>com.github.javaparser</groupId>
<artifactId>javaparser-symbol-solver-core</artifactId>
<version>3.16.1</version>
</dependency>
-
We use the dataset provided by DeepCom for constructing code summarization models.
-
We provide the annotated dataset that our work produced in
dataset
folder.
When training models, we use the code provided by CodeNN, Code2Seq, DeepCom, and NNGen.
These projects are all open-sourced.
Please note that we re-design the pre-processing phase of Code2Seq to perform code summarization experiments. Please refer to this issue.
We provide source code of comment category prediction. Please locate ccpm.py
and run as instructions.
- Leverage the trained model to generate comments.
- Set the path of the generated comments of each model.
- Run the commands.
python ccpm.py classify --input_file --cross_validation
python ccpm.py evaluate --input_file --output_file
We evaluate the BLEU and ROUGE using the package nmt-eval.
Running
nlg-eval --hypothesis=examples/generated_comments.txt --references=examples/reference_comments.txt
If you use our dataset or you are inspired by our work, please consider citing our paper:
@article{10.1145/3434280,
author = {Chen, Qiuyuan and Xia, Xin and Hu, Han and Lo, David and Li, Shanping},
title = {Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction},
year = {2021},
issue_date = {February 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {30},
number = {2},
issn = {1049-331X},
url = {https://doi.org/10.1145/3434280},
doi = {10.1145/3434280},
month = feb,
articleno = {25},
numpages = {29},
keywords = {comment classification, code comment, Code summarization}
}
Thanks!
[1] Qiuyuan Chen, Xin Xia, Han Hu, David Lo, and Shanping Li, “Why My Code Summarization Model Does Not Work: Code Comment Improvement with Category Prediction,” ACM Trans. Softw. Eng. Methodol. (TOSEM), vol. 30, no. 2, pp. 1–29, 2021, DOI: https://doi.org/10.1145/3434280.