CC2Vec: Distributed Representations of Code Changes [pdf]
Questions and discussion are welcome: [email protected]
Please install the necessary libraries before running our tool:
- python==3.6.9
- torch==1.2.0
- tqdm==4.46.1
- nltk==3.4.5
- numpy==1.16.5
- scikit-learn==0.22.1
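As a quick sanity check, the pinned versions above can be compared against what is importable in the current environment. The sketch below is not part of the tool: the helper names (`find_mismatches`, `installed_version`, `report`) are made up for illustration, and it assumes each package exposes a `__version__` attribute. The comparison is an exact string match, so a build suffix such as a CUDA tag would be reported as a mismatch.

```python
import importlib
import sys

# Pinned versions copied from the list above. Keys are import names
# (scikit-learn is imported as "sklearn").
REQUIRED = {
    "torch": "1.2.0",
    "tqdm": "4.46.1",
    "nltk": "3.4.5",
    "numpy": "1.16.5",
    "sklearn": "0.22.1",
}
REQUIRED_PYTHON = (3, 6, 9)


def find_mismatches(required, get_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = get_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_version(name):
    """Look up a module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def report():
    """Print one line per mismatch between the environment and the pinned list."""
    if tuple(sys.version_info[:3]) != REQUIRED_PYTHON:
        print("python: wanted 3.6.9, found %d.%d.%d" % tuple(sys.version_info[:3]))
    for name, (wanted, found) in find_mismatches(REQUIRED, installed_version).items():
        print("%s: wanted %s, found %s" % (name, wanted, found))
```

Calling `report()` prints nothing when the environment matches the list exactly.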
Please follow the link below to download the data and pretrained models of our paper.
After downloading, simply copy the data and model folders to CC2Vec folder.
Our tool has a number of parameters (note that the set of hyperparameters differs from task to task):
- --embedding_dim: Dimension of embedding vectors.
- --filter_sizes: Sizes of filters used by the hierarchical attention layers.
- --num_filters: Number of filters.
- --hidden_layers: Number of hidden layers.
- --dropout_keep_prob: Dropout for training cc2vec.
- --l2_reg_lambda: Regularization rate.
- --learning_rate: Learning rate.
- --batch_size: Batch size.
- --num_epochs: Number of epochs.
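For readers who want to see how flags like these are typically wired up, here is a minimal `argparse` sketch. It is an illustration only, not the parser shipped with the scripts: the default values are placeholders invented for this example, and the comma-separated format for `--filter_sizes` (with the hypothetical helper `parse_filter_sizes`) is an assumption.

```python
import argparse


def parse_filter_sizes(text):
    """Turn a comma-separated string such as "1,2,3" into [1, 2, 3]."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative hyperparameter flags")
    # All defaults are placeholders chosen for this sketch, not the paper's settings.
    parser.add_argument("--embedding_dim", type=int, default=64,
                        help="Dimension of embedding vectors")
    parser.add_argument("--filter_sizes", type=parse_filter_sizes, default=[1, 2, 3],
                        help="Comma-separated filter sizes (assumed format)")
    parser.add_argument("--num_filters", type=int, default=64,
                        help="Number of filters")
    parser.add_argument("--hidden_layers", type=int, default=1,
                        help="Number of hidden layers")
    parser.add_argument("--dropout_keep_prob", type=float, default=0.5,
                        help="Dropout for training")
    parser.add_argument("--l2_reg_lambda", type=float, default=1e-5,
                        help="Regularization rate")
    parser.add_argument("--learning_rate", type=float, default=1e-4,
                        help="Learning rate")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="Number of epochs")
    return parser


if __name__ == "__main__":
    # Parse an explicit list so the sketch behaves the same however it is run.
    args = build_parser().parse_args(["--embedding_dim", "32", "--filter_sizes", "1,2"])
    print(args.embedding_dim, args.filter_sizes, args.batch_size)
```

Any flag that is left out falls back to its placeholder default, so only the values that differ for a given task need to be passed.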
In the first task (log message generation), simply run this command to train our model:
$ python lmg_cc2ftr.py -train -train_data [path of our training data] -dictionary_data [path of our dictionary data]
The command will create a folder named snapshot in which our model is saved. To extract the code change features, please run this command:
$ python lmg_cc2ftr.py -predict -pred_data [path of our data] -dictionary_data [path of our dictionary data] -load_model [path of our model] -name [name of our output file]
To evaluate the first task, please run this command:
$ python lmg_eval.py -train_data [path of our training data] -test_data [path of our testing data] -train_cc2ftr_data [path of our code changes features extracted from training data] -test_cc2ftr_data [path of our code changes features extracted from testing data]
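The three commands for the first task chain together: train, extract features, then evaluate. The sketch below only builds the argument lists so the order and the shared paths are visible. It rests on two assumptions not stated above: that the extraction step is run once for the training data and once for the testing data (because the evaluation takes both feature files), and that the value given to `-name` is the path later passed to `lmg_eval.py`. The function name and all path arguments are hypothetical.

```python
import shlex


def lmg_commands(train_data, test_data, dictionary, model, train_ftr, test_ftr):
    """Build the argument lists for the log message generation workflow."""
    def extract(data, name):
        return ["python", "lmg_cc2ftr.py", "-predict",
                "-pred_data", data,
                "-dictionary_data", dictionary,
                "-load_model", model,
                "-name", name]

    train = ["python", "lmg_cc2ftr.py", "-train",
             "-train_data", train_data,
             "-dictionary_data", dictionary]
    evaluate = ["python", "lmg_eval.py",
                "-train_data", train_data,
                "-test_data", test_data,
                "-train_cc2ftr_data", train_ftr,
                "-test_cc2ftr_data", test_ftr]
    # Assumed order: train once, extract features for both splits, then evaluate.
    return [train, extract(train_data, train_ftr), extract(test_data, test_ftr), evaluate]


def as_shell_line(argv):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Printing `as_shell_line(cmd)` for each entry gives the same commands as above with the bracketed placeholders filled in.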
In the second task (bug fixing patch identification), we need both the training and testing datasets. Please run this command to train our model:
$ python bfp_cc2ftr.py -train -train_data [path of our training data] -test_data [path of our testing data] -dictionary_data [path of our dictionary data]
Similar to the first task, the command will create a folder named snapshot in which our model is saved. To extract the code change features, please run this command:
$ python bfp_cc2ftr.py -predict -predict_data [path of our data] -dictionary_data [path of our dictionary data] -load_model [path of our model] -name [name of our output file]
To train the model for bug fixing patch identification, please run this command:
$ python bfp_PNExtended.py -train -train_data [path of our data] -train_data_cc2ftr [path of our code changes features extracted from training data] -dictionary_data [path of our dictionary data]
To evaluate the model for bug fixing patch identification, please run this command:
$ python bfp_PNExtended.py -predict -pred_data [path of our data] -pred_data_cc2ftr [path of our code changes features extracted from our data] -dictionary_data [path of our dictionary data] -load_model [path of our model]
In the third task (just-in-time defect prediction), each dataset (qt or openstack) comes in two variants: one for training the code change features (file names ending in '.pkl') and one for training the just-in-time defect prediction model (file names ending in '_dextend.pkl').
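Because both variants end in '.pkl', a script that sorts files by suffix has to test for '_dextend.pkl' first. The sketch below shows that ordering. The helper names and the base name used in the usage note are hypothetical; only the two suffixes come from the description above.

```python
FEATURE_SUFFIX = ".pkl"
DEXTEND_SUFFIX = "_dextend.pkl"


def variant_files(base_name):
    """Return the two file names that belong to one dataset split."""
    return {
        "code_change_features": base_name + FEATURE_SUFFIX,
        "defect_prediction": base_name + DEXTEND_SUFFIX,
    }


def variant_of(file_name):
    """Classify a file name by its suffix."""
    # Check the longer suffix first: "_dextend.pkl" also ends with ".pkl".
    if file_name.endswith(DEXTEND_SUFFIX):
        return "defect_prediction"
    if file_name.endswith(FEATURE_SUFFIX):
        return "code_change_features"
    raise ValueError("not a dataset file: %r" % file_name)
```

For a hypothetical base name `qt_train`, `variant_files("qt_train")` yields `qt_train.pkl` for the feature training step and `qt_train_dextend.pkl` for the defect prediction step.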
Please run this command to train the code changes features:
$ python jit_cc2ftr.py -train -train_data [path of our training data] -test_data [path of our testing data] -dictionary_data [path of our dictionary data]
Similar to the second task, the command will create a folder named snapshot in which our model is saved. To extract the code change features, please run this command:
$ python jit_cc2ftr.py -predict -predict_data [path of our data] -dictionary_data [path of our dictionary data] -load_model [path of our model] -name [name of our output file]
To train the model for just-in-time defect prediction, please run this command:
$ python jit_DExtended.py -train -train_data [path of our data] -train_data_cc2ftr [path of our code changes features extracted from training data] -dictionary_data [path of our dictionary data]
To evaluate the model for just-in-time defect prediction, please run this command:
$ python jit_DExtended.py -predict -pred_data [path of our data] -pred_data_cc2ftr [path of our code changes features extracted from our data] -dictionary_data [path of our dictionary data] -load_model [path of our model]
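The just-in-time workflow has more steps than the first task, so it can help to run them from one script that stops at the first failure. The following is a sketch under assumptions, not a script from this repository: the keys of the `paths` dictionary are invented names, the extraction step is assumed to be run once per split, the '_dextend.pkl' variants are assumed to be the inputs of `jit_DExtended.py`, and the value given to `-name` is assumed to be the path later passed as the cc2ftr feature file.

```python
import subprocess


def jit_steps(paths):
    """Build (label, argv) pairs for the just-in-time defect prediction workflow."""
    def extract(split):
        return ["python", "jit_cc2ftr.py", "-predict",
                "-predict_data", paths[split],
                "-dictionary_data", paths["dictionary"],
                "-load_model", paths["cc2ftr_model"],
                "-name", paths[split + "_ftr"]]

    return [
        ("train features", ["python", "jit_cc2ftr.py", "-train",
                            "-train_data", paths["train"],
                            "-test_data", paths["test"],
                            "-dictionary_data", paths["dictionary"]]),
        ("extract train features", extract("train")),
        ("extract test features", extract("test")),
        ("train defect model", ["python", "jit_DExtended.py", "-train",
                                "-train_data", paths["train_dextend"],
                                "-train_data_cc2ftr", paths["train_ftr"],
                                "-dictionary_data", paths["dictionary"]]),
        ("evaluate defect model", ["python", "jit_DExtended.py", "-predict",
                                   "-pred_data", paths["test_dextend"],
                                   "-pred_data_cc2ftr", paths["test_ftr"],
                                   "-dictionary_data", paths["dictionary"],
                                   "-load_model", paths["dextended_model"]]),
    ]


def run_steps(steps, runner=subprocess.check_call):
    """Run the steps in order and return the labels of the completed ones."""
    completed = []
    for label, argv in steps:
        print("[%s] %s" % (label, " ".join(argv)))
        runner(argv)  # check_call raises CalledProcessError on a non-zero exit
        completed.append(label)
    return completed
```

Passing a different `runner` (for example a function that only records its argument) gives a dry run that prints the commands without executing them.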