CM2003_COVID19CT_Project

Repository containing our code and results for the COVID-19 CT classification project

COVID-CT challenge

The challenge is to classify patients as COVID-19 positive or negative from CT scans.

Data Presentation:

The dataset is composed of COVID and NonCOVID images. Some images were extracted from published articles and others come directly from CT scans. The images extracted from articles have a poorer resolution, but radiologists have confirmed that they remain useful for diagnosis. The data is available at: https://github.com/UCSD-AI4H/COVID-CT

The image files are in Images-processed/CT_COVID.zip and Images-processed/CT_NonCOVID.zip

The train/validation/test splits are provided by the challenge authors as .txt files in Data-split/COVID and Data-split/NonCOVID:

  • trainCT_COVID, testCT_COVID, valCT_COVID,
  • trainCT_NonCOVID, testCT_NonCOVID, valCT_NonCOVID

There are 425 images in the train set and 203 in the validation set.

How do we load the data?

  • Parse the .txt split files to build lists of image paths
  • Load the images with cv2 and normalize them
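The two steps above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the real code reads images with cv2, while here a synthetic split file and a tiny numpy array stand in so the example is self-contained (the file names are placeholders, not real dataset entries).

```python
import tempfile
from pathlib import Path

import numpy as np

def read_split_file(txt_path):
    """Parse one of the Data-split .txt files into a list of image names.
    We assume one file name per non-empty line; the real challenge files
    may carry extra columns."""
    lines = Path(txt_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def normalize(image):
    """Scale an 8-bit grayscale image into [0, 1] before training."""
    return image.astype(np.float32) / 255.0

# Demonstration with a synthetic split file (names are placeholders).
split_file = Path(tempfile.mkdtemp()) / "trainCT_COVID.txt"
split_file.write_text("covid_001.png\ncovid_002.png\n")
paths = read_split_file(split_file)
img = normalize(np.array([[0, 255], [128, 64]], dtype=np.uint8))
```

In the real pipeline, each entry of `paths` would be joined with the unzipped image folder and passed to `cv2.imread` before normalization.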

Models and results

The article proposed two models, DenseNet169 and ResNet50. We implemented both to compare their performance. We also tried smaller models with fewer layers or with fewer filters in the convolutional layers. The initial hyperparameters are summarized in the following table and are reused in the subsequent experiments.

Epochs | Batch size | Filter number | Learning rate | Loss function        | Metrics
100    | 16         | 32            | 10^-5         | binary cross-entropy | binary accuracy
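To make the loss and metric from the table concrete, here is a minimal numpy sketch of what binary cross-entropy and binary accuracy compute. This is an explicit reimplementation for illustration, not the project's code (which would use the Keras built-ins of the same names).

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)], clipped for numerical safety.
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Fraction of predictions on the correct side of the 0.5 threshold.
    return float(np.mean((y_pred >= threshold) == (y_true == 1)))

# Toy labels and sigmoid outputs (illustrative values only).
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])
loss = binary_cross_entropy(y_true, y_pred)
acc = binary_accuracy(y_true, y_pred)
```

All four toy predictions fall on the correct side of 0.5, so the accuracy is 1.0 while the loss is still nonzero, which is why the two numbers in the learning curves move independently.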

The results show that all the models fit the training set well, but performance on the validation set is much weaker. The models seem to gain nothing from further training as far as validation is concerned: they likely learn features that are too specific to the training images and then struggle to generalize to the validation and test sets.

(Figures: DenseNet169 learning curves, loss and accuracy)

(Figures: ResNet50 learning curves, loss and accuracy)

We followed these steps to try to improve the results:

  • Apply data augmentation to the training set, since the dataset is small.
  • Use smaller models such as VGG16 or a 5-layer AlexNet-style model that we had used in the labs for classification tasks.
  • Swap the training set with the test or validation set to check whether validation on the original training set behaves better.
  • Visualize the activation maps to see what the models learn from the images.
  • Use standardization of gray levels instead of normalization.

The corresponding codes are available in the Tests with original data folder.

1. Data augmentation

(Figures: data augmentation learning curves on the DenseNet model)
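A typical augmentation pass can be sketched with numpy alone. The transforms and parameters below (horizontal flip, a shift of up to 4 pixels) are illustrative choices, not the settings used in the project, which would normally rely on a Keras `ImageDataGenerator`.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """One random augmentation: possible horizontal flip plus a small
    translation. Parameters are hypothetical, not the project's values."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # random horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))  # small shift
    return out

rng = np.random.default_rng(0)
image = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy 8x8 "scan"
batch = np.stack([augment(image, rng) for _ in range(4)])
```

Because flips and wrap-around shifts are permutations of the pixels, every augmented copy keeps exactly the same gray values while presenting them in new spatial arrangements, which is what gives the model more variety without new data.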

2. Smaller models: VGG16 and AlexNet

(Figures: VGG16 learning curves)

(Figures: AlexNet learning curves)

3. Switch the training and the test sets

(Figures: learning curves with the training and test sets switched)

4. Activation maps

(Figures: activation map visualizations)
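What an activation map is can be illustrated with a single hand-written filter: the map is the (ReLU-rectified) response of a convolutional filter at every image position. The filter and image below are toy examples, not taken from the trained models.

```python
import numpy as np

def activation_map(image, kernel):
    """Valid-mode 2D cross-correlation followed by ReLU -- the quantity
    visualized when inspecting what a convolutional filter responds to."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)   # ReLU keeps only positive responses

# A horizontal-gradient filter applied to an image with one bright column.
image = np.zeros((5, 5), dtype=np.float32)
image[:, 2] = 1.0
kernel = np.array([[-1, 0, 1]], dtype=np.float32)
amap = activation_map(image, kernel)
```

The map lights up only where the filter's pattern (here, a left-to-right intensity increase) occurs, which is exactly what the visualizations above show for the learned filters.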

5. Data standardization

(Figures: learning curves with data standardization)
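The difference between the two preprocessing schemes is small in code but matters for optimization: normalization rescales gray levels into [0, 1], while standardization shifts each image to zero mean and unit variance. A minimal numpy sketch (the per-image strategy is an assumption; per-dataset statistics are another common choice):

```python
import numpy as np

def normalize(image):
    # Rescale 8-bit gray levels into [0, 1].
    return image.astype(np.float32) / 255.0

def standardize(image):
    # Zero mean, unit variance per image; the epsilon avoids division
    # by zero on constant images.
    x = image.astype(np.float32)
    return (x - x.mean()) / (x.std() + 1e-7)

image = np.array([[0, 128], [255, 64]], dtype=np.uint8)
z = standardize(image)
```

With zero-centered inputs the gradients early in training are better scaled, which is consistent with the faster convergence reported below.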

Discussion

Model        | DenseNet169 | ResNet50 | DenseNet augmented | VGG16 | AlexNet | AlexNet switched | AlexNet standardized
accuracy     | 95%         | 95%      | 70%                | 65%   | 70%     | 70%              | 70%
val_accuracy | ~60%        | ~65%     | 50%                | 55%   | 52%     | 55%              | 52%

As the previous figures and the table show, the results were not as good as expected. We tried several approaches to improve them, but the validation accuracy remained low.

Improvements

After the first project presentation, we tested another approach to obtain a proper learning behavior from our model. The problem lay in the data split, so we shuffled all the data and built two new sets, a training set and a validation set, completely different from the original ones. We used an AlexNet-style model because the larger models overfitted and learned features that were too specific. Here are the results:

(Figures: AlexNet learning curves with the new sets)
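The reshuffling step described above can be sketched as pooling both classes, shuffling, and cutting a fresh split. The validation fraction and seed below are illustrative choices, not values taken from the project.

```python
import numpy as np

def reshuffle_split(covid_paths, noncovid_paths, val_fraction=0.25, seed=0):
    """Pool all images, shuffle, and cut a fresh train/validation split,
    ignoring the original challenge split. Labels: 1 = COVID, 0 = NonCOVID."""
    paths = [(p, 1) for p in covid_paths] + [(p, 0) for p in noncovid_paths]
    rng = np.random.default_rng(seed)
    rng.shuffle(paths)  # in-place shuffle of the pooled (path, label) list
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]   # (train, validation)

# Placeholder file names for illustration only.
covid = [f"covid_{i}.png" for i in range(6)]
noncovid = [f"noncovid_{i}.png" for i in range(6)]
train, val = reshuffle_split(covid, noncovid)
```

Because both sets are now drawn from the same shuffled pool, the train and validation distributions match, which is the likely reason the learning curves finally behave.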

The corresponding code is available in the Tests with switched data folder. The model is now learning, and the validation accuracy is roughly 1. We also used standardization to make training faster. Here are the results:

(Figures: AlexNet learning curves with standardization)

The model reaches its maximum accuracy faster. To go further, we could try other standardization methods and other models. For now, the model shows the right learning behavior and good accuracy.
