Repository for the PhD assignment of course DD3431 at KTH.
This PhD project applies the fundamentals taught in the DD3431 Machine Learning course to the author's PhD topic.
In this case, the project will be directly related to smart contract security.
Using the smartbugs-curated dataset and the recently published sb-heists exploits for that dataset, this experiment compares the performance of two models trained with different label configurations of the same dataset for vulnerable line identification.
Mix Model (MM):
- Dataset: smartbugs-curated
- Labels: smartbugs-curated
Pure Model (PM):
- Dataset: smartbugs-curated
- Labels: smartbugs-curated x sb-heists (exploits)
Each model will aim to identify vulnerable lines of code.
MM: trains using the labels from the smartbugs-curated dataset.
PM: trains on lines of code that are labeled as vulnerable only if they are reported as vulnerable in the smartbugs-curated dataset and the contract is listed as exploitable by sb-heists.
Both MM and PM will be tested on each other's dataset to assess how a model performs when trained only on true positive vulnerabilities.
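The two labeling rules above can be sketched in Python as follows. This is an illustration only, not the code in this repository: `LineRecord`, `mm_label`, `pm_label`, and the example contract names are hypothetical, and it assumes that under PM a reported line from a contract not listed by sb-heists is treated as non-vulnerable rather than dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineRecord:
    """One line of a contract (hypothetical structure for illustration)."""
    contract: str              # contract file name
    line_no: int               # line number inside the contract
    reported_vulnerable: bool  # label taken from smartbugs-curated


def mm_label(record: LineRecord) -> int:
    """MM label: use the smartbugs-curated annotation as-is."""
    return int(record.reported_vulnerable)


def pm_label(record: LineRecord, exploitable_contracts: set[str]) -> int:
    """PM label: vulnerable only if reported by smartbugs-curated AND
    the contract is listed as exploitable by sb-heists.
    Assumption: lines failing either condition get label 0."""
    return int(record.reported_vulnerable
               and record.contract in exploitable_contracts)


# Made-up example data, not taken from either dataset.
records = [
    LineRecord("contract_a.sol", 12, True),
    LineRecord("contract_a.sol", 13, False),
    LineRecord("contract_b.sol", 7, True),
]
exploitable = {"contract_a.sol"}  # contracts with an sb-heists exploit

mm_labels = [mm_label(r) for r in records]
pm_labels = [pm_label(r, exploitable) for r in records]
```

In this toy example the reported line in `contract_b.sol` is positive for MM but negative for PM, because no exploit is listed for that contract.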
Accuracy
Dataset | MM Accuracy | MM Precision | MM Recall | PM Accuracy | PM Precision | PM Recall
---|---|---|---|---|---|---
Raw smartbugs-curated | 0.887 | 0.84 | 0.954 | 0.426 | 0.44 | 0.659
Exploitable smartbugs-curated | 0.42 | 0.518 | 0.518 | 0.844 | 0.833 | 0.925
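For reference, the three metrics in the table follow the standard binary-classification definitions. The sketch below shows how they could be computed from per-line labels; `classification_metrics` is a hypothetical helper written for this README, not necessarily how the numbers above were produced, and the example labels are made up.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = vulnerable line)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 2 true positives, 1 true negative, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Recall is the share of truly vulnerable lines that the model flags, while precision is the share of flagged lines that are truly vulnerable.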
- Clone the repository:
git clone https://github.com/yourusername/DD3431-PhD-Task.git
- Change to the repository directory:
cd DD3431-PhD-Task
- Install the required dependencies using Poetry:
poetry install
Thanks to @vivi365 for her key advice on ML for vulnerability detection.