pwr-pbr23/M6

Reproduction of DeepLineDP

The aim of this project was to reproduce the following research paper:

C. Pornprasit; C. Kla Tantithamthavorn, DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction (2023).

We also describe the steps we took throughout the process and, finally, try to improve on the results already achieved.

Project links

Another aim of the project was to document the whole process and to plan and share responsibilities, so we used the following tools (links to the actual projects):

Overleaf
Trello

Authors

Kamila Sproska
Dominik Polak


Reproduction

Our approach towards extending the original repository

The original code for the research paper was split across two repositories. We decided to merge them into one in order to make the reproduction easier.

Preparation for reproduction

Since the models require CUDA to run and not every computer has it available, we decided to do the reproduction on Google Colab.
For this reason, a few steps are required before the reproduction itself.

  1. Download this repository using the Download ZIP option.
    github-download-zip.png

  2. Upload the folder to the root directory of your Google Drive (in this example the folder is called M6).
    google-drive-placement.png

  3. Go to the uploaded folder and find the reproduction.ipynb notebook. Choose the Open with > Google Colaboratory option.
    open-reproduction-script.png

  4. Change the runtime type to GPU: Change runtime type -> GPU -> Save.
    change-runtme-menu.png change-to-GPU.png
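
Once the runtime has been switched, it can be worth confirming that a GPU is actually visible before running the rest of the notebook. The following is only a minimal sketch and not part of the repository: the helper name `gpu_available` is hypothetical. It assumes that PyTorch, if installed, reports CUDA through `torch.cuda.is_available()`, and otherwise falls back to looking for the `nvidia-smi` tool.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if a CUDA-capable GPU appears to be usable (hypothetical helper)."""
    try:
        # Preferred check when PyTorch is installed in the session.
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        pass
    # Fallback: the NVIDIA driver tool is normally present on GPU runtimes.
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    return subprocess.run([smi], capture_output=True).returncode == 0


print("GPU runtime detected:", gpu_available())
```

If this prints `False`, repeat step 4 and reconnect the runtime before continuing.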

Running reproduction script

Overview of the whole process:

overview.png

All of these steps are described in reproduction.ipynb. The most important remarks to keep in mind are:

  • When mounting Google Drive, make sure you follow all the popup instructions and complete the setup correctly. At the end, the setup should look roughly like this:
    google-collab-setup.png
  • Not all cells need to be run each time, but all pip install commands have to be run at the beginning of each session.
  • The notebook is uploaded with its output left in place, so that it is easier to tell whether a cell ran correctly (your outputs should be similar).
  • To check the reproduction with the changes applied, open /content/drive/MyDrive/M6/DeepLineDP/script/preprocess_data.py and set the ignore_imports flag to True.
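
After Google Drive has been mounted, a quick way to confirm that the folder was uploaded to the expected place is to check for the files mentioned above. This is a minimal sketch, not code from the repository: the helper `check_project_layout` is hypothetical, and the default paths are the ones used in this README (folder M6 in the root of the drive).

```python
from pathlib import Path


def check_project_layout(drive_root="/content/drive/MyDrive", folder="M6"):
    """Return the expected project files that are missing (hypothetical helper)."""
    root = Path(drive_root) / folder
    expected = [
        root / "reproduction.ipynb",
        root / "DeepLineDP" / "script" / "preprocess_data.py",
    ]
    return [str(path) for path in expected if not path.is_file()]


missing = check_project_layout()
if missing:
    print("Missing files, check the upload location:", missing)
else:
    print("Project layout looks correct.")
```

An empty result means both the notebook and the preprocessing script are where the remaining steps expect them.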

Results of the reproduction

For all datasets

file-Effort@Top20Recall (↘) file-Recall@Top20LOC (↗) file-IFA (↘)

For activemq

file-Effort@Top20Recall (↘) file-Recall@Top20LOC (↗) file-IFA (↘)
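
This README does not spell out how the three metrics are computed. As a rough illustration only, the sketch below assumes the definitions commonly used for line-level defect prediction: IFA is the number of clean lines inspected before the first defective line is found, Recall@Top20%LOC is the share of defective lines found within the top 20% of lines ranked by predicted risk, and Effort@Top20%Recall is the share of lines that must be inspected to find 20% of the defective lines. The function name `line_ranking_metrics` and its inputs are hypothetical and do not come from the repository.

```python
import math


def line_ranking_metrics(scores, labels):
    """Compute IFA, Recall@Top20%LOC and Effort@Top20%Recall (assumed definitions).

    scores: predicted risk score per line (higher means more suspicious)
    labels: 1 for a defective line, 0 for a clean line
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank lines from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    total_lines = len(ranked)
    total_defects = sum(ranked)
    if total_defects == 0:
        raise ValueError("at least one defective line is required")

    # IFA: clean lines inspected before the first defective one (lower is better).
    ifa = ranked.index(1)

    # Recall@Top20%LOC: defects found in the top 20% of lines (higher is better).
    top_k = total_lines * 20 // 100
    recall_at_top20_loc = sum(ranked[:top_k]) / total_defects

    # Effort@Top20%Recall: lines inspected to reach 20% of defects (lower is better).
    target = math.ceil(total_defects * 20 / 100)
    found = 0
    for inspected, label in enumerate(ranked, start=1):
        found += label
        if found >= target:
            break
    effort_at_top20_recall = inspected / total_lines

    return ifa, recall_at_top20_loc, effort_at_top20_recall
```

The arrows in the column headers match this reading: a lower value is better for Effort@Top20Recall and IFA, a higher value is better for Recall@Top20LOC.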

Improvements

Each type of change is switched on by a dedicated flag. DeepLineDP_model.py and preprocess_data.py contain the following flags:

| Flag | Function |
| --- | --- |
| ignore_imports | Replaces every import line with a comment containing #import |
| replace_exceptions | Replaces all *Exception classes with Exception |
| remove_public_keyword | Removes the keyword public |
| remove_final_keyword | Removes the keyword final |
| normalize_names | Removes leading and trailing whitespace from each line |
| remove_duplication_line | Removes all duplicated lines (lines that have already appeared) |
| add_hidden_layer | Adds one hidden FC layer |

Each flag defaults to False, so to test a certain type of change you need to set the chosen flag(s) to True before running the reproduction script.
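
To make the effect of the preprocessing flags concrete, the sketch below applies them to a list of Java source lines. It is an illustration written from the descriptions in the table, not the code in preprocess_data.py: the function `preprocess_lines` and its regular expressions are assumptions, and add_hidden_layer is left out because it changes the model rather than the data.

```python
import re


def preprocess_lines(lines, ignore_imports=False, replace_exceptions=False,
                     remove_public_keyword=False, remove_final_keyword=False,
                     normalize_names=False, remove_duplication_line=False):
    """Apply the selected transformations to source lines (hypothetical sketch)."""
    result = []
    seen = set()
    for line in lines:
        if ignore_imports and re.match(r"\s*import\s", line):
            line = "#import"  # the whole import line becomes a comment
        if replace_exceptions:
            # e.g. IOException -> Exception
            line = re.sub(r"\b\w+Exception\b", "Exception", line)
        if remove_public_keyword:
            line = re.sub(r"\bpublic\s+", "", line)
        if remove_final_keyword:
            line = re.sub(r"\bfinal\s+", "", line)
        if normalize_names:
            line = line.strip()  # drop leading and trailing whitespace
        if remove_duplication_line:
            if line in seen:
                continue  # skip lines that have already appeared
            seen.add(line)
        result.append(line)
    return result


example = [
    "import java.io.IOException;",
    "public final class A {",
    "  throw new IOException();",
    "  throw new IOException();",
    "}",
]
print(preprocess_lines(example, ignore_imports=True, remove_public_keyword=True))
```

With every flag left at False the input is returned unchanged, which corresponds to the original, unmodified reproduction.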

Exceptions replaced

Original Exceptions replaced

Imports replaced with comment

Original Imports replaced with comment

Public keyword removed

Original Public keyword removed

Final keyword removed

Original Final keyword removed

Hidden layer added

Original Hidden layer added

Duplicate lines removed

Original Duplicate lines removed

Results for additional metrics

Results for data without preprocessing