The aim of this project was to reproduce the following research paper:
C. Pornprasit; C. Kla Tantithamthavorn, DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction (2023).
We also describe the steps we took throughout the process and, finally, try to improve on the results already achieved.
Another aim of the project was to document the whole process and to plan and share responsibilities. For this we used the following tools (links to the actual projects):
The original material for the research paper is split across two repositories:
- supplementary materials (scripts for training models): the original from awsm-research/DeepLineDP was copied into the DeepLineDP folder.
- dataset: the original from awsm-research/line-level-defect-prediction was copied into the DeepLineDP/datasets folder.
We decided to merge the two repositories to make the reproduction easier.
Since the models require CUDA to run and not every computer has a CUDA-capable GPU, we decided to do the reproduction on Google Colab.
For this reason, a few steps are required before the reproduction itself.
- Upload the folder to the top level of your Google Drive (in this example the folder is called M6).
- Go to the uploaded folder and find the reproduction.ipynb notebook. Choose Open with > Google Colaboratory.
- Change the runtime type to GPU: Change runtime type -> GPU -> Save.
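The setup steps above can be sanity-checked from the first notebook cell. The following is a minimal sketch and not part of reproduction.ipynb: the helper names `project_dir`, `gpu_runtime_available` and `mount_drive_if_on_colab` are hypothetical, the Drive path assumes the folder was uploaded as M6 to the top level of Google Drive, and looking for `nvidia-smi` is only a heuristic for detecting a GPU runtime.

```python
import shutil
from pathlib import Path

# Assumption: Drive is mounted at the default Colab location and the
# project folder was uploaded to its top level.
DRIVE_ROOT = Path("/content/drive/MyDrive")


def project_dir(folder_name: str = "M6", drive_root: Path = DRIVE_ROOT) -> Path:
    """Return the expected location of the uploaded project folder."""
    return drive_root / folder_name


def gpu_runtime_available() -> bool:
    """Heuristic check: GPU runtimes usually expose nvidia-smi on PATH."""
    return shutil.which("nvidia-smi") is not None


def mount_drive_if_on_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running on Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


if __name__ == "__main__":
    print("Drive mounted:", mount_drive_if_on_colab())
    print("GPU runtime:", gpu_runtime_available())
    print("Project folder found:", project_dir().is_dir())
```

If the last two values are not both `True` on Colab, the runtime type or the upload location is probably not set as described in the list above.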
Overview of the whole process:
All of these steps are described in reproduction.ipynb; the most important remarks to keep in mind are:
- When mounting Google Drive, make sure you follow all the popup instructions and complete the setup correctly.
At the end the setup should look somewhat like this:
- Not every cell needs to be run each time, but all `pip install` commands have to be run at the beginning of each session.
- The notebook is uploaded with its output kept, so that it is easier to check whether a cell ran correctly (outputs should be similar).
- To check the reproduction with the applied changes, open /content/drive/MyDrive/M6/DeepLineDP/script/preprocess_data.py and change the flag `ignore_imports` to `True`.
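Instead of editing the file by hand, the flag can also be switched from a notebook cell. This is a minimal sketch that assumes the flag is written as a plain module-level assignment such as `ignore_imports = False`; the helpers `set_flag` and `set_flag_in_file` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def set_flag(source: str, flag: str, value: bool) -> str:
    """Return `source` with the assignment `flag = True/False` set to `value`.

    Assumes the flag sits at the start of its own line. Raises ValueError
    when no such assignment is found, so typos in the flag name are noticed.
    """
    pattern = re.compile(
        rf"^({re.escape(flag)}\s*=\s*)(True|False)\b", re.MULTILINE
    )
    updated, count = pattern.subn(rf"\g<1>{value}", source, count=1)
    if count == 0:
        raise ValueError(f"flag {flag!r} not found")
    return updated


def set_flag_in_file(path: Path, flag: str, value: bool) -> None:
    """Rewrite the file at `path` in place with the flag changed."""
    path.write_text(set_flag(path.read_text(), flag, value))


# Example call on Colab (path taken from the remark above):
# set_flag_in_file(
#     Path("/content/drive/MyDrive/M6/DeepLineDP/script/preprocess_data.py"),
#     "ignore_imports",
#     True,
# )
```

Raising an error when the flag is missing is a deliberate choice here: a silent no-op would make it look as if a change had been tested when the original configuration was actually run.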
file-Effort@Top20Recall (↘) | file-Recall@Top20LOC (↗) | file-IFA (↘) |
---|---|---|
![]() | ![]() | ![]() |
file-Effort@Top20Recall (↘) | file-Recall@Top20LOC (↗) | file-IFA (↘) |
---|---|---|
![]() | ![]() | ![]() |
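The three metrics named in the tables above can be computed from a list of lines ranked by predicted risk. The following is a minimal sketch that assumes the usual definitions from the line-level defect prediction literature: Recall@Top20LOC is the share of defective lines found in the top 20% of ranked lines, Effort@Top20Recall is the share of lines that must be inspected to find 20% of the defective lines, and IFA is the number of clean lines inspected before the first defective one. All function names are hypothetical, and the evaluation scripts in the repository may differ in detail.

```python
from typing import List, Sequence


def rank_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the defect labels (1 = defective) ordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def recall_at_top20_loc(ranked: Sequence[int]) -> float:
    """Share of all defective lines that fall into the top 20% of lines."""
    total = sum(ranked)
    if total == 0:
        return 0.0
    top = ranked[: len(ranked) // 5]  # integer 20% cut-off, rounded down
    return sum(top) / total


def effort_at_top20_recall(ranked: Sequence[int]) -> float:
    """Share of lines inspected until 20% of the defective lines are found."""
    total = sum(ranked)
    if total == 0:
        return 0.0
    target = (total + 4) // 5  # ceil(0.2 * total) without float rounding
    found = 0
    for inspected, label in enumerate(ranked, start=1):
        found += label
        if found >= target:
            return inspected / len(ranked)
    return 1.0


def ifa(ranked: Sequence[int]) -> int:
    """Number of clean lines inspected before the first defective line."""
    for position, label in enumerate(ranked):
        if label:
            return position
    return len(ranked)
```

Lower values are better for Effort@Top20Recall and IFA, higher values are better for Recall@Top20LOC, which matches the arrows in the table headers.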
To enable a specific type of change, set the corresponding flag. DeepLineDP_model.py and preprocess_data.py contain the following flags:
Flag | Function |
---|---|
ignore_imports | Replaces every import line with a comment containing #import |
replace_exceptions | Replaces all *Exception classes with Exception |
remove_public_keyword | Removes the public keyword |
remove_final_keyword | Removes the final keyword |
normalize_names | Removes whitespace before and after each line |
remove_duplication_line | Removes all duplicated lines (lines that have already appeared) |
add_hidden_layer | Adds one hidden FC layer |
Each flag defaults to `False`, so to test a certain type of change you need to set the chosen flag(s) to `True` before running the reproduction script.
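The preprocessing flags from the table above can be illustrated on a few lines of Java source. This is a minimal sketch based only on the table descriptions, not the code from preprocess_data.py: the `Flags` dataclass and `preprocess_lines` function are hypothetical, and it assumes that an ignored import line is replaced by the bare comment `#import`. The `add_hidden_layer` flag is left out because it changes the model rather than the data.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Flags:
    """Hypothetical container mirroring the preprocessing flags; all off by default."""
    ignore_imports: bool = False
    replace_exceptions: bool = False
    remove_public_keyword: bool = False
    remove_final_keyword: bool = False
    normalize_names: bool = False
    remove_duplication_line: bool = False


def preprocess_lines(lines: List[str], flags: Flags) -> List[str]:
    """Apply the enabled transformations to each source line in order."""
    result: List[str] = []
    seen = set()
    for line in lines:
        if flags.ignore_imports and line.lstrip().startswith("import "):
            line = "#import"  # assumption: the whole line becomes this comment
        if flags.replace_exceptions:
            # e.g. IOException -> Exception
            line = re.sub(r"\b\w+Exception\b", "Exception", line)
        if flags.remove_public_keyword:
            line = re.sub(r"\bpublic\b\s*", "", line)
        if flags.remove_final_keyword:
            line = re.sub(r"\bfinal\b\s*", "", line)
        if flags.normalize_names:
            line = line.strip()
        if flags.remove_duplication_line:
            if line in seen:
                continue  # drop lines that have already appeared
            seen.add(line)
        result.append(line)
    return result
```

For example, `preprocess_lines(["public final class A {"], Flags(remove_public_keyword=True, remove_final_keyword=True))` returns `["class A {"]`, while the default `Flags()` leaves the input unchanged.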
Direction | Original | Exceptions replaced |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |
Direction | Original | Imports replaced with comment |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |
Direction | Original | Public removed |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |
Direction | Original | Final removed |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |
Hidden layer added
Direction | Original | Hidden layer added |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |
Direction | Original | Duplicate lines removed |
---|---|---|
↘ | ![]() | ![]() |
↗ | ![]() | ![]() |
↘ | ![]() | ![]() |