PBR23M7
Group members: Nikodem Kropielnicki, Bogusława Tlołka, Joanna Wdziękońska
Project based on VulCurator: A Vulnerability-Fixing Commit Detector
Team Policies and Team Expectations Agreement
Progress is tracked in GitHub Projects
You can run the project manually (instructions below) or in Google Colab (by running the notebook commands in sequence). Results are reported as F1 scores: for the message, patch and issue classifiers, use the F1 score shown at epoch 19 of training; for the ensemble, use the F1 score printed at the end of the run.
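For reference, the F1 score is the harmonic mean of precision and recall on the positive (vulnerability-fixing) class. The snippet below is a minimal sketch of that formula only; the function name `f1_score` and the label convention (1 = vulnerability-fixing commit) are assumptions for illustration, and the training scripts may compute the metric differently.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from any of the datasets.
labels = [1, 1, 0, 0, 1]
predictions = [1, 0, 0, 1, 1]
print(f1_score(labels, predictions))
```

A classifier that finds most vulnerability-fixing commits but also flags many ordinary commits gets high recall and low precision, so its F1 stays low; that is why a single F1 value is used to compare the classifiers.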
Note: Some files were too large to upload to GitHub; they are available through the missing files link above. The Google Colab notebook has a section that uploads them to the project.
For the TensorFlow dataset:
To train message classifier:
python message_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_issue_classifier.sav
To fine-tune CodeBERT for the patch classifier:
python vulfixminer_finetune.py --dataset_path tf_vuln_dataset.csv --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_patch_vulfixminer.sav --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/tf_patch_vulfixminer_train_prob.txt --test_prob_path probs/tf_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file tf_dataset.conf
Similarly, for the SAP dataset (some classifiers take about 1.5 hours to train on Google Colab):
To train message classifier:
python message_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_issue_classifier.sav
To fine-tune CodeBERT for the patch classifier:
python vulfixminer_finetune.py --dataset_path sap_patch_dataset.csv --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path sap_patch_dataset.csv --model_path model/sap_patch_vulfixminer.sav --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/sap_patch_vulfixminer_train_prob.txt --test_prob_path probs/sap_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file sap_dataset.conf
For the MSR dataset:
To train message classifier:
python message_classifier.py --dataset_path partycje.json --model_path model/msr_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path partycje.json --model_path model/msr_issue_classifier.sav
To fine-tune CodeBERT for the patch classifier:
python vulfixminer_finetune.py --dataset_path partycje.json --finetune_model_path model/msr_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path partycje.json --model_path model/msr_patch_vulfixminer.sav --finetune_model_path model/msr_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/msr_patch_vulfixminer_train_prob.txt --test_prob_path probs/msr_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file msr_dataset.conf
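The five steps follow the same pattern for every dataset, so the commands above can be generated rather than typed by hand. The sketch below is one possible way to do that; the helper `build_commands` and the `DATASETS` table are hypothetical names introduced here, not part of the repository. It only prints the commands. To execute them, the commented `subprocess.run` line can be enabled, assuming the scripts and dataset files are in the working directory.

```python
import shlex
import subprocess  # only needed if the run line below is enabled


def build_commands(prefix, text_dataset, patch_dataset, config_file):
    """Return the five training commands for one dataset, in run order."""
    model = f"model/{prefix}"
    finetuned = f"{model}_patch_vulfixminer_finetuned_model.sav"
    return [
        ["python", "message_classifier.py", "--dataset_path", text_dataset,
         "--model_path", f"{model}_message_classifier.sav"],
        ["python", "issue_classifier.py", "--dataset_path", text_dataset,
         "--model_path", f"{model}_issue_classifier.sav"],
        ["python", "vulfixminer_finetune.py", "--dataset_path", patch_dataset,
         "--finetune_model_path", finetuned],
        ["python", "vulfixminer.py", "--dataset_path", patch_dataset,
         "--model_path", f"{model}_patch_vulfixminer.sav",
         "--finetune_model_path", finetuned,
         "--train_prob_path", f"probs/{prefix}_patch_vulfixminer_train_prob.txt",
         "--test_prob_path", f"probs/{prefix}_patch_vulfixminer_test_prob.txt"],
        ["python", "variant_ensemble.py", "--config_file", config_file],
    ]


# prefix -> (message/issue dataset, patch dataset, ensemble config),
# copied from the commands listed above.
DATASETS = {
    "tf": ("tf_vuln_dataset.csv", "tf_vuln_dataset.csv", "tf_dataset.conf"),
    "sap": ("sub_enhanced_dataset_th_100.txt", "sap_patch_dataset.csv",
            "sap_dataset.conf"),
    "msr": ("partycje.json", "partycje.json", "msr_dataset.conf"),
}

if __name__ == "__main__":
    for prefix, (text_ds, patch_ds, conf) in DATASETS.items():
        for cmd in build_commands(prefix, text_ds, patch_ds, conf):
            print(shlex.join(cmd))
            # subprocess.run(cmd, check=True)  # enable to actually train
```

Note that only the SAP dataset uses a different file for the patch classifier than for the message and issue classifiers, which is why the helper takes two dataset paths.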