CM2003_COVID19CT_Project

Repository containing our code and results for the COVID-19 CT classification project

COVID-CT challenge

The challenge is to classify patients as COVID-19 positive or negative from CT scans.

Data Presentation:

The dataset is composed of COVID and NonCOVID images. Some images were extracted from published articles and others come directly from CT scans. The images extracted from articles have a poorer resolution, but radiologists have confirmed that they remain useful for diagnosis. The data is available at: https://github.com/UCSD-AI4H/COVID-CT

The image files are in Images-processed/CT_COVID.zip and Images-processed/CT_NonCOVID.zip

The train/validation/test splits are provided by the challenge authors as .txt files in Data-split/COVID and Data-split/NonCOVID:

  • trainCT_COVID, testCT_COVID, valCT_COVID,
  • trainCT_NonCOVID, testCT_NonCOVID, valCT_NonCOVID

There are 425 images in the train set and 203 in the validation set.

How do we load the data?

  • Parse the .txt split files to build lists of image paths
  • Load the images with cv2 and normalize them
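The two steps above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the real code reads images with cv2, while here a synthetic split file and a tiny numpy array stand in so the example is self-contained (the file names are placeholders, not real dataset entries).

```python
import tempfile
from pathlib import Path

import numpy as np

def read_split_file(txt_path):
    """Parse one of the Data-split .txt files into a list of image names.
    We assume one file name per non-empty line; the real challenge files
    may carry extra columns."""
    lines = Path(txt_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def normalize(image):
    """Scale an 8-bit grayscale image into [0, 1] before training."""
    return image.astype(np.float32) / 255.0

# Demonstration with a synthetic split file (names are placeholders).
split_file = Path(tempfile.mkdtemp()) / "trainCT_COVID.txt"
split_file.write_text("covid_001.png\ncovid_002.png\n")
paths = read_split_file(split_file)
img = normalize(np.array([[0, 255], [128, 64]], dtype=np.uint8))
```

In the real pipeline, each entry of `paths` would be joined with the unzipped image folder and passed to `cv2.imread` before normalization.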

Models and results

The article proposed two models, DenseNet169 and ResNet50. We implemented both to compare their performance. We also tried smaller models with fewer layers or with fewer filters in the convolutional layers. The initial hyperparameters are summarized in the following table and are reused in the subsequent experiments.

Epochs | Batch size | Filter number | Learning rate | Loss function        | Metrics
100    | 16         | 32            | 10^-5         | binary cross-entropy | binary accuracy
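To make the loss and metric from the table concrete, here is a minimal numpy sketch of what binary cross-entropy and binary accuracy compute. This is an explicit reimplementation for illustration, not the project's code (which would use the Keras built-ins of the same names).

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)], clipped for numerical safety.
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Fraction of predictions on the correct side of the 0.5 threshold.
    return float(np.mean((y_pred >= threshold) == (y_true == 1)))

# Toy labels and sigmoid outputs (illustrative values only).
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])
loss = binary_cross_entropy(y_true, y_pred)
acc = binary_accuracy(y_true, y_pred)
```

All four toy predictions fall on the correct side of 0.5, so the accuracy is 1.0 while the loss is still nonzero, which is why the two numbers in the learning curves move independently.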

The results show that all the models fit the training set well, but performance on the validation set is much weaker. The models seem to gain nothing from further training as far as validation is concerned: they likely learn features that are too specific to the training images and then struggle to generalize to the validation and test sets.

(Figures: DenseNet169 learning curves, loss and accuracy)

(Figures: ResNet50 learning curves, loss and accuracy)

We followed these steps to try to improve the results:

  • Apply data augmentation to the training set, since the dataset is small.
  • Use smaller models such as VGG16 or a 5-layer AlexNet-style model that we had used in the labs for classification tasks.
  • Swap the training set with the test or validation set to check whether validation on the original training set behaves better.
  • Visualize the activation maps to see what the models learn from the images.
  • Use standardization of gray levels instead of normalization.

The corresponding codes are available in the Tests with original data folder.

1. Data augmentation

(Figures: data augmentation learning curves on the DenseNet model)
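A typical augmentation pass can be sketched with numpy alone. The transforms and parameters below (horizontal flip, a shift of up to 4 pixels) are illustrative choices, not the settings used in the project, which would normally rely on a Keras `ImageDataGenerator`.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """One random augmentation: possible horizontal flip plus a small
    translation. Parameters are hypothetical, not the project's values."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # random horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))  # small shift
    return out

rng = np.random.default_rng(0)
image = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy 8x8 "scan"
batch = np.stack([augment(image, rng) for _ in range(4)])
```

Because flips and wrap-around shifts are permutations of the pixels, every augmented copy keeps exactly the same gray values while presenting them in new spatial arrangements, which is what gives the model more variety without new data.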

2. Smaller models: VGG16 and AlexNet

(Figures: VGG16 learning curves)

(Figures: AlexNet learning curves)

3. Switch the training and the test sets

(Figures: learning curves with the training and test sets switched)

4. Activation maps

(Figures: activation map visualizations)
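What an activation map is can be illustrated with a single hand-written filter: the map is the (ReLU-rectified) response of a convolutional filter at every image position. The filter and image below are toy examples, not taken from the trained models.

```python
import numpy as np

def activation_map(image, kernel):
    """Valid-mode 2D cross-correlation followed by ReLU -- the quantity
    visualized when inspecting what a convolutional filter responds to."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)   # ReLU keeps only positive responses

# A horizontal-gradient filter applied to an image with one bright column.
image = np.zeros((5, 5), dtype=np.float32)
image[:, 2] = 1.0
kernel = np.array([[-1, 0, 1]], dtype=np.float32)
amap = activation_map(image, kernel)
```

The map lights up only where the filter's pattern (here, a left-to-right intensity increase) occurs, which is exactly what the visualizations above show for the learned filters.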

5. Data standardization

(Figures: learning curves with data standardization)
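The difference between the two preprocessing schemes is small in code but matters for optimization: normalization rescales gray levels into [0, 1], while standardization shifts each image to zero mean and unit variance. A minimal numpy sketch (the per-image strategy is an assumption; per-dataset statistics are another common choice):

```python
import numpy as np

def normalize(image):
    # Rescale 8-bit gray levels into [0, 1].
    return image.astype(np.float32) / 255.0

def standardize(image):
    # Zero mean, unit variance per image; the epsilon avoids division
    # by zero on constant images.
    x = image.astype(np.float32)
    return (x - x.mean()) / (x.std() + 1e-7)

image = np.array([[0, 128], [255, 64]], dtype=np.uint8)
z = standardize(image)
```

With zero-centered inputs the gradients early in training are better scaled, which is consistent with the faster convergence reported below.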

Discussion

Model        | DenseNet169 | ResNet50 | DenseNet augmented | VGG16 | AlexNet | AlexNet switched | AlexNet standardized
accuracy     | 95%         | 95%      | 70%                | 65%   | 70%     | 70%              | 70%
val_accuracy | ~60%        | ~65%     | 50%                | 55%   | 52%     | 55%              | 52%

As the previous figures and the table show, the results were not as good as expected. We tried several approaches to improve them, but the validation accuracy remained low.

Improvements

After the first project presentation, we tested another approach to obtain a proper learning behavior from our model. The problem lay in the data split, so we shuffled all the data and built two new sets, a training set and a validation set, completely different from the original ones. We used an AlexNet-style model because the larger models overfitted and learned features that were too specific. Here are the results:

(Figures: AlexNet learning curves with the new sets)
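The reshuffling step described above can be sketched as pooling both classes, shuffling, and cutting a fresh split. The validation fraction and seed below are illustrative choices, not values taken from the project.

```python
import numpy as np

def reshuffle_split(covid_paths, noncovid_paths, val_fraction=0.25, seed=0):
    """Pool all images, shuffle, and cut a fresh train/validation split,
    ignoring the original challenge split. Labels: 1 = COVID, 0 = NonCOVID."""
    paths = [(p, 1) for p in covid_paths] + [(p, 0) for p in noncovid_paths]
    rng = np.random.default_rng(seed)
    rng.shuffle(paths)  # in-place shuffle of the pooled (path, label) list
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]   # (train, validation)

# Placeholder file names for illustration only.
covid = [f"covid_{i}.png" for i in range(6)]
noncovid = [f"noncovid_{i}.png" for i in range(6)]
train, val = reshuffle_split(covid, noncovid)
```

Because both sets are now drawn from the same shuffled pool, the train and validation distributions match, which is the likely reason the learning curves finally behave.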

The corresponding code is available in the Tests with switched data folder. The model is now learning, and the validation accuracy is roughly 1. We also used standardization to make training faster. Here are the results:

(Figures: AlexNet learning curves with standardization)

The model reaches its maximum accuracy faster. To go further, we could try other standardization methods and other models. For now, the model shows the right learning behavior and good accuracy.
