Reproducing "Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches"

In this series of notebooks, we reproduce a result published in:

Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches [1]

Reproducing “Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches”

In Reproducing_Original_Result, we reproduce the results obtained using the VGG19 model and achieve an accuracy of 92% on the test set. However, as noted in [2], a significant demographic inconsistency exists: normal and pneumonia chest X-ray images are from pediatric patients, while COVID-19 chest X-ray images are from adults. This allows the model to achieve high accuracy by learning features that are not clinically relevant.
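One way to spot this kind of confound before training is to cross-tabulate class labels against patient age group. The following is a minimal sketch, not code from the notebooks: the `records` list, its `label` and `age_group` fields, and the helper names are all hypothetical, and the values are illustrative only.

```python
from collections import Counter

# Hypothetical per-image metadata; field names and values are illustrative only.
records = [
    {"label": "normal", "age_group": "pediatric"},
    {"label": "normal", "age_group": "pediatric"},
    {"label": "pneumonia", "age_group": "pediatric"},
    {"label": "pneumonia", "age_group": "pediatric"},
    {"label": "covid19", "age_group": "adult"},
    {"label": "covid19", "age_group": "adult"},
]


def crosstab(rows):
    """Count images for every (label, age_group) pair."""
    return Counter((r["label"], r["age_group"]) for r in rows)


def confounded_labels(rows):
    """Return labels whose age groups are shared with no other label.

    For such a label, age group alone is enough to identify the class.
    """
    groups_by_label = {}
    for r in rows:
        groups_by_label.setdefault(r["label"], set()).add(r["age_group"])

    flagged = []
    for label, groups in groups_by_label.items():
        other_groups = set()
        for other_label, other in groups_by_label.items():
            if other_label != label:
                other_groups |= other
        if groups.isdisjoint(other_groups):
            flagged.append(label)
    return sorted(flagged)


for (label, age_group), count in sorted(crosstab(records).items()):
    print(f"{label:10s} {age_group:10s} {count}")
print("Labels separable by age group alone:", confounded_labels(records))
```

With the illustrative records above, only the COVID-19 class is flagged, which mirrors the inconsistency described in [2].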

Exploring ConvNet Activations

In Exploring_ConvNet_Activations, we explore how a model can learn illegitimate features using a small dataset of wolf and husky images. The model achieves 90% accuracy, but we show that this performance is due to data leakage: all wolf images have snow backgrounds, while all husky images have grass backgrounds. The model can therefore make predictions simply by distinguishing white (snow) from green (grass) backgrounds. To confirm this, we test the model on a new dataset where the backgrounds are swapped (huskies on snow, wolves on grass). The model's accuracy drops to 0%, confirming that it was relying on background cues for classification. We provide Grad-CAM heatmaps to visualize pixel attributions, which further illustrate the model's focus on the background rather than on the animal. We then train a new model on a dataset where both wolf and husky images have white backgrounds and achieve an accuracy of 70%. This shows that the accuracy obtained earlier was overly optimistic because of data leakage.
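The logic of the background-swap test can be sketched with a toy example. This is not the notebook's ConvNet: `predict_from_background` is a hypothetical stand-in that looks only at the mean background colour, and the colour values and threshold are assumptions chosen for illustration.

```python
# Toy illustration of shortcut learning; each "image" is reduced to the
# mean RGB colour of its background, with channel values in [0, 1].
SNOW = (0.95, 0.95, 0.95)   # near-white background (assumed value)
GRASS = (0.20, 0.70, 0.20)  # green background (assumed value)


def predict_from_background(background):
    """Hypothetical shortcut classifier: bright background means 'wolf'."""
    r, g, b = background
    brightness = (r + g + b) / 3
    return "wolf" if brightness > 0.6 else "husky"


def accuracy(samples):
    """Fraction of (background, label) pairs classified correctly."""
    correct = sum(predict_from_background(bg) == label for bg, label in samples)
    return correct / len(samples)


# Same background/label pairing as the training data: wolves on snow, huskies on grass.
original_test = [(SNOW, "wolf")] * 5 + [(GRASS, "husky")] * 5
# Swapped pairing: wolves on grass, huskies on snow.
swapped_test = [(GRASS, "wolf")] * 5 + [(SNOW, "husky")] * 5

print("original backgrounds:", accuracy(original_test))
print("swapped backgrounds: ", accuracy(swapped_test))
```

A classifier that has learned only the background scores perfectly on the first set and gets every example wrong on the second. This is the same pattern as the drop to 0% reported above, although the real model reaches 90% rather than 100% on the original test set.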

Reproducing “Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches”

In Correcting_Original_Result, we reproduce the results obtained using the VGG19 model, but with a key change: we use datasets containing adult chest X-ray images. This time, the model achieves an accuracy of 51%, a drop of about 41 percentage points from the 92% obtained earlier. This confirms that the metrics reported in the paper were overly optimistic because of data leakage: the model had learned illegitimate features.
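The size of the drop can be stated as an absolute or a relative change, and the two differ. A short sketch of the arithmetic, using only the two accuracies reported in this README (92% and 51%); the variable names are illustrative:

```python
original_accuracy = 0.92   # Reproducing_Original_Result
corrected_accuracy = 0.51  # Correcting_Original_Result

# Absolute change, in percentage points.
absolute_drop_points = (original_accuracy - corrected_accuracy) * 100

# Relative change, as a percentage of the original accuracy.
relative_drop_percent = (original_accuracy - corrected_accuracy) / original_accuracy * 100

print(f"absolute drop: {absolute_drop_points:.0f} percentage points")
print(f"relative drop: {relative_drop_percent:.1f}%")
```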

Running the project

Google Colab

Click on the "Open in Colab" buttons above to run the notebooks in Google Colab.

Local Machine

Clone the repository:

```
$ git clone https://github.com/shaivimalik/medicine_preprocessing-on-entire-dataset.git
$ cd medicine_preprocessing-on-entire-dataset
```

Install the required dependencies:
```
$ pip install -r requirements.txt
```
Launch Jupyter Notebook:
```
$ jupyter notebook
```

Chameleon

You can run these notebooks on Chameleon using the Chameleon Jupyter environment.

Acknowledgements

This project was part of the 2024 Summer of Reproducibility organized by the UC Santa Cruz Open Source Program Office.

Contributor: Shaivi Malik
Mentors: Fraida Fund, Mohamed Saeed

References

[1]: Rahaman, Md Mamunur et al. “Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches.” Journal of X-ray science and technology vol. 28,5 (2020): 821-839. doi:10.3233/XST-200715

[2]: Roberts, M., Driggs, D., Thorpe, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3, 199–217 (2021). https://doi.org/10.1038/s42256-021-00307-0
