
Scripts for a simulation on the effect of dataset size on neural network performance for active learning applied to systematic reviewing

govertv/asreview-study-nn-sample-size

Code repository for: 'The effect of dataset size on neural network performance within systematic reviewing'

DOI

This repository contains the code accompanying a study into the effect of dataset size on neural network performance within systematic reviewing. The code can be used to reproduce the simulation study reported in that work. In the simulation, the systematic review process implemented in ASReview is applied to dataset samples of different sizes, using a neural network classifier. The results were generated using ASReview v0.17.

Installation

Running this simulation study requires Python 3.6+. After installing Python, ASReview can be installed using

pip install asreview

Gensim is also required to run the simulation. It can be installed with

pip install --upgrade gensim
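
Because the results were generated with ASReview v0.17, it can help to confirm which versions are actually installed before starting a multi-day run. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the function name `installed_versions` is illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two packages named in the installation steps above.
    for name, ver in installed_versions(["asreview", "gensim"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the reported ASReview version differs from v0.17, the simulation may not behave exactly as it did for the published results.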

Data

Three different systematic review datasets were used to perform the simulation study:

  • Nudging - Systematic review study performed by Nagtegaal et al. on nudging healthcare professionals towards evidence-based medicine: Dataset - Paper
  • Software - Systematic review study performed by Hall et al. on software fault detection: Dataset - Paper
  • Depression - Systematic review study performed by Brouwer et al. on depressive relapse: Dataset - Paper

Smaller datasets were sampled from the original datasets; the samples used in the simulation study are included in this repository. The full datasets are not included here but can be obtained from the links above.

How to use

Data preprocessing:

This section can be skipped if the dataset samples included in this repository are used.

The original datasets should be placed in the data folder, with the files named 'Brouwer_2019.csv', 'Hall_2012.csv' and 'Nagtegaal_2019.csv'.
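
Before running the notebook, a quick check that the three files are in place can save a failed run. This is a minimal sketch, assuming the script is run from the repository root so that `data` resolves to the data folder; `missing_datasets` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path

# File names as given in the instructions above.
EXPECTED_FILES = ("Brouwer_2019.csv", "Hall_2012.csv", "Nagtegaal_2019.csv")


def missing_datasets(data_dir="data"):
    """Return the expected dataset file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from data folder:", ", ".join(missing))
    else:
        print("All three original datasets are in place.")
```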

Then run the data_generation notebook in the scripts folder to generate the samples from the original datasets.
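
The notebook is the authoritative procedure for producing the samples. Purely as an illustration of what drawing a smaller dataset can look like, the sketch below samples a fixed number of records while keeping the share of relevant records roughly constant. The stratification, the `label_included` column name, the `"1"` label value and the seed are all assumptions made for this example; they do not describe how the included samples were generated.

```python
import random


def sample_records(records, size, label_key="label_included", seed=42):
    """Draw `size` records, keeping the proportion of relevant records roughly equal.

    `label_key` is an assumed column name; a record counts as relevant when its
    label equals "1" (as a string or integer).
    """
    if size > len(records):
        raise ValueError("sample size exceeds dataset size")
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    relevant = [r for r in records if str(r[label_key]) == "1"]
    irrelevant = [r for r in records if str(r[label_key]) != "1"]

    n_relevant = 0
    if relevant:
        # Keep at least one relevant record so there is something to find.
        n_relevant = max(1, round(size * len(relevant) / len(records)))
        n_relevant = min(n_relevant, len(relevant), size)
    n_irrelevant = size - n_relevant

    sample = rng.sample(relevant, n_relevant) + rng.sample(irrelevant, n_irrelevant)
    rng.shuffle(sample)  # avoid all relevant records sitting at the front
    return sample
```

Records are plain dictionaries here, such as the rows produced by `csv.DictReader`, so the sketch needs nothing beyond the standard library.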

Running the simulation

The commands needed to run the simulation are all included in the jobs.sh file; running this file performs the full simulation. Warning: running the full simulation can take multiple days. The simulation can be safely interrupted with a keyboard interrupt and resumed by running jobs.sh again.
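
How jobs.sh resumes is not described here. One common pattern for resumable batch runs is to skip any job whose output file already exists; the sketch below illustrates that pattern only and is not a description of what jobs.sh does. The job list format (a command paired with an output file name) and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pending_jobs(jobs, output_dir):
    """Return the (command, output_name) pairs whose output file does not exist yet."""
    out = Path(output_dir)
    return [(cmd, name) for cmd, name in jobs if not (out / name).exists()]


def run_pending(jobs, output_dir):
    """Run each unfinished job in order; finished ones are skipped on a re-run."""
    for cmd, _name in pending_jobs(jobs, output_dir):
        # check=True stops at the first failing command instead of continuing.
        subprocess.run(cmd, check=True)
```

A caveat with this pattern: a job interrupted while writing may leave a partial output file that is later mistaken for a finished one, so such a scheme is only safe if outputs are written atomically or partial files are cleaned up.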

Simulation outcomes

The metrics used to evaluate the simulation outcome are written by the shell script to the tables folder (contained in output). Plots for visual analysis can be generated by running the results notebook in the scripts folder.

License

The scripts in this repository are MIT licensed.

Contact

[email protected]
