Code Comment Inconsistency Detection Based on Confidence Learning

MCCL detects inconsistencies between code and comments in three steps.

Before starting, complete the following setup:

Download data from here. Download additional model resources from here. Edit configurations in constants.py to specify data, resource, and output locations.
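As a rough sketch of what the edited settings might look like: DATA_PATH and DETECTION_DIR are names this README refers to in the steps below, while ROOT, RESOURCES_PATH, and every path value here are placeholders assumed for illustration. Check the actual names in your copy of constants.py before changing anything.

```python
import os

# Hypothetical root directory; replace with your own location.
ROOT = os.path.join(os.path.expanduser("~"), "mccl")

# Names referenced in this README.
DATA_PATH = os.path.join(ROOT, "data")           # downloaded data; pruned data is also written here
DETECTION_DIR = os.path.join(ROOT, "detection")  # predicted probabilities and data labels

# Assumed name, for illustration only; constants.py may call this something else.
RESOURCES_PATH = os.path.join(ROOT, "resources")  # additional model resources


def describe_locations():
    """Return a name -> path mapping so the settings can be checked at a glance."""
    return {
        "DATA_PATH": DATA_PATH,
        "DETECTION_DIR": DETECTION_DIR,
        "RESOURCES_PATH": RESOURCES_PATH,
    }


for name, path in describe_locations().items():
    print(name, "=", path)
```

Keeping the data, resource, and output locations under separate directories makes it easier to tell the pruned files generated later apart from the originals.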

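Then run the three steps:

(1) Obtain Out-of-Sample Predicted Probabilities (MC):

Use the HYBRID model to obtain the probabilities. Run the first command, then the same command with --test_mode: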

python run_comment_model.py --attend_code_sequence_states --attend_code_graph_states --features --model_path=detect_attend_code_sequence_states_attend_code_graph_states_features.pkl.gz --model_name=detect_attend_code_sequence_states_attend_code_graph_states_features

python run_comment_model.py --attend_code_sequence_states --attend_code_graph_states --features --model_path=detect_attend_code_sequence_states_attend_code_graph_states_features.pkl.gz --model_name=detect_attend_code_sequence_states_attend_code_graph_states_features --test_mode

Running these two commands in order produces the detection results on the uncleaned dataset, together with the out-of-sample predicted probabilities that the CL component needs. The predicted probabilities and data labels are stored in the DETECTION_DIR set in constants.py.

(2) Clean the Dataset (CL):

As an example, clean the training set with threshold=0.5:

python CL.py --attend_code_sequence_states --attend_code_graph_states --features --model_path=detect_attend_code_sequence_states_attend_code_graph_states_features.pkl.gz --model_name=detect_attend_code_sequence_states_attend_code_graph_states_features --dataset=train --threshold=0.5

Adjust the parameters as needed. The pruned data is written to DATA_PATH.
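To make the role of the threshold concrete, here is a minimal sketch of one common pruning rule. It is not the code in CL.py: the function name prune_by_confidence, the input layout, and the rule itself (drop an example when the out-of-sample probability assigned to its given label falls below the threshold) are assumptions for illustration.

```python
def prune_by_confidence(pred_probs, labels, threshold=0.5):
    """Return the indices of examples to keep.

    pred_probs: list of per-class probability lists, one per example
                (out-of-sample predictions, as produced in step 1).
    labels:     list of given integer labels, one per example.
    threshold:  minimum probability the model must assign to the given label.
    """
    kept = []
    for index, (probs, label) in enumerate(zip(pred_probs, labels)):
        # Confidence of the model in the label the dataset already carries.
        self_confidence = probs[label]
        if self_confidence >= threshold:
            kept.append(index)
    return kept


# Toy data: examples 2 and 3 carry labels the model finds unlikely.
toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
toy_labels = [0, 1, 1, 0]
print(prune_by_confidence(toy_probs, toy_labels, threshold=0.5))
```

Under this rule a higher threshold prunes more aggressively, so it trades dataset size against label quality.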

(3) Predict the Inconsistency (MCCL):

Before running this step, change PARTITIONS and train_examples or valid_examples in data_loader.py to the name of the data file generated in DATA_PATH.

Any of the following models can be used for prediction.

SEQ(C, M_edit) + features

python run_comment_model.py --attend_code_sequence_states --features --model_path=detect_attend_code_sequence_states_features_cl.pkl.gz --model_name=detect_attend_code_sequence_states_features_cl

python run_comment_model.py --attend_code_sequence_states --features --model_path=detect_attend_code_sequence_states_features_cl.pkl.gz --model_name=detect_attend_code_sequence_states_features_cl --test_mode

GRAPH(C, T_edit) + features (The GGNN used for this approach is derived from here.)

python run_comment_model.py --attend_code_graph_states --features --model_path=detect_attend_code_graph_states_features_cl.pkl.gz --model_name=detect_attend_code_graph_states_features_cl

python run_comment_model.py --attend_code_graph_states --features --model_path=detect_attend_code_graph_states_features_cl.pkl.gz --model_name=detect_attend_code_graph_states_features_cl --test_mode

HYBRID(C, M_edit, T_edit) + features

python run_comment_model.py --attend_code_sequence_states --attend_code_graph_states --features --model_path=detect_attend_code_sequence_states_attend_code_graph_states_features_cl.pkl.gz --model_name=detect_attend_code_sequence_states_attend_code_graph_states_features_cl

python run_comment_model.py --attend_code_sequence_states --attend_code_graph_states --features --model_path=detect_attend_code_sequence_states_attend_code_graph_states_features_cl.pkl.gz --model_name=detect_attend_code_sequence_states_attend_code_graph_states_features_cl --test_mode
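The commands in this README follow a regular pattern: the model name is detect_ followed by the enabled flag names, with _cl appended for models trained on the cleaned data, and the model path adds .pkl.gz. The helper below is a hypothetical convenience that assembles the commands from that pattern; build_command is not part of the repository.

```python
def build_command(flags, cleaned=False, test_mode=False):
    """Assemble a run_comment_model.py command as a list of arguments.

    flags:     enabled options without the leading dashes, e.g.
               ["attend_code_sequence_states", "features"].
    cleaned:   append the _cl suffix used for models trained on pruned data.
    test_mode: append --test_mode for the second command of each pair.
    """
    model_name = "detect_" + "_".join(flags) + ("_cl" if cleaned else "")
    args = ["python", "run_comment_model.py"]
    args += ["--" + flag for flag in flags]
    args += ["--model_path=" + model_name + ".pkl.gz", "--model_name=" + model_name]
    if test_mode:
        args.append("--test_mode")
    return args


# Print the pair of commands for the SEQ(C, M_edit) + features model.
seq_flags = ["attend_code_sequence_states", "features"]
for test_mode in (False, True):
    print(" ".join(build_command(seq_flags, cleaned=True, test_mode=test_mode)))
```

Returning a list rather than a string means the result can be passed directly to subprocess.run if the runs are scripted.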

Finally

The steps above also apply to Param and Summary comments: repeat them after modifying comment_type_str in the get_data_splits() and load_cleaned_test_set() functions in data_loader.py.

To evaluate the model's overall performance across all comment types, change Return, Param, or Summary to None.
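The effect of comment_type_str can be pictured as a filter in which None means "no filter". The sketch below is an illustration under assumptions, not the logic in data_loader.py: the function select_by_comment_type and the dictionary layout with a "comment_type" key are hypothetical.

```python
def select_by_comment_type(examples, comment_type_str=None):
    """Keep only examples of one comment type, or all of them when None."""
    if comment_type_str is None:
        # No type selected: evaluate on every comment type together.
        return list(examples)
    return [ex for ex in examples if ex["comment_type"] == comment_type_str]


# Toy examples with a hypothetical layout.
toy_examples = [
    {"id": 1, "comment_type": "Return"},
    {"id": 2, "comment_type": "Param"},
    {"id": 3, "comment_type": "Summary"},
    {"id": 4, "comment_type": "Param"},
]
print(len(select_by_comment_type(toy_examples, "Param")))
print(len(select_by_comment_type(toy_examples, None)))
```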
