Repository containing our code and results for the COVID-19 CT classification project
The challenge is to classify COVID-19 patients from CT scans.
The dataset is composed of COVID and NonCOVID images. Some images are extracted from published articles and some come directly from CT scans. The images extracted from articles have a lower resolution, but radiologists have confirmed that they are still useful. The data is available at: https://github.com/UCSD-AI4H/COVID-CT
The image files are in Images-processed/CT_COVID.zip and Images-processed/CT_NonCOVID.zip.
In addition, the train/validation/test splits are provided by the challenge authors as .txt files in Data-split/COVID and Data-split/NonCOVID:
- trainCT_COVID, testCT_COVID, valCT_COVID,
- trainCT_NonCOVID, testCT_NonCOVID, valCT_NonCOVID
There are 425 images in the train set and 203 in the validation set.
- Read the .txt split files to build lists of image paths
- Load the images with cv2 and normalize them
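A minimal sketch of these two steps, assuming OpenCV and NumPy (the image size, the helper names and the .txt extension are our own choices):

```python
import os
import cv2
import numpy as np

def read_split(txt_path, image_dir):
    """Read one Data-split .txt file and return the full image paths."""
    with open(txt_path) as f:
        names = [line.strip() for line in f if line.strip()]
    return [os.path.join(image_dir, name) for name in names]

def load_images(paths, size=(224, 224)):
    """Load images with cv2, resize them and normalize gray levels to [0, 1]."""
    images = []
    for path in paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, size)
        images.append(img.astype("float32") / 255.0)
    return np.stack(images)

# Example usage (paths follow the repository layout described above)
covid_train_paths = read_split("Data-split/COVID/trainCT_COVID.txt",
                               "Images-processed/CT_COVID")
x_covid_train = load_images(covid_train_paths)
```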
The article proposes two models, DenseNet169 and ResNet50. We implemented both to compare their performance. We also tried smaller models with fewer layers or with fewer filters in the convolutional layers. The initial hyperparameters are summarized in the following table and are reused in the later experiments.
Epochs | Batch size | Filter number | Learning rate | Loss function | Metrics |
---|---|---|---|---|---|
100 | 16 | 32 | 10^-5 | binary cross entropy | binary accuracy |
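For reference, here is how the table above could translate into a Keras setup. The use of tf.keras, the Adam optimizer and the 224×224 RGB input (gray-scale images stacked to three channels) are assumptions; only the values come from the table, and the filter number applies to our smaller custom models, not to DenseNet169:

```python
import tensorflow as tf

# DenseNet169 backbone with a sigmoid head for the binary COVID / NonCOVID task
base = tf.keras.applications.DenseNet169(include_top=False, weights=None,
                                         input_shape=(224, 224, 3),
                                         pooling="avg")
model = tf.keras.Sequential([base,
                             tf.keras.layers.Dense(1, activation="sigmoid")])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])

# model.fit(x_train, y_train, epochs=100, batch_size=16,
#           validation_data=(x_val, y_val))
```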
The results show that all the models fit the training set well, but the results on the validation set are less conclusive. The models appear to overfit: they learn features that are too specific to the training images and then struggle to generalize to the validation and test sets.
We followed several steps to try to improve the results:
- Use data augmentation on the training set, because there are not enough images in the dataset (a sketch of a possible augmentation pipeline is shown below).
- Use smaller models such as VGG16 or an AlexNet-style model with 5 layers that we had used before in the labs for classification tasks.
- Switch the training set with the test set or the validation set to check whether the poor validation comes from the split itself.
- Visualize the activation maps to see what the models learn from the images.
- Use standardization of the gray levels instead of normalization.
The corresponding code is available in the Tests with original data folder.
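As mentioned in the first item of the list above, here is a possible augmentation pipeline, assuming Keras' ImageDataGenerator (the exact transformations and ranges are assumptions):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random geometric transformations applied on the fly to the training images
augmenter = ImageDataGenerator(rotation_range=15,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True)

# x_train: (n_images, height, width, channels), y_train: binary labels
# train_generator = augmenter.flow(x_train, y_train, batch_size=16)
# model.fit(train_generator, epochs=100, validation_data=(x_val, y_val))
```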
*Figure: data augmentation on the DenseNet model*
Model | DenseNet169 | ResNet50 | DenseNet augmented | VGG | AlexNet | AlexNet switched | AlexNet standardized |
---|---|---|---|---|---|---|---|
Training accuracy | 95% | 95% | 70% | 65% | 70% | 70% | 70% |
Validation accuracy | ~60% | ~65% | 50% | 55% | 52% | 55% | 52% |
As the previous results and the table above show, the results were not as good as expected. We tried several ways to improve them, but the validation accuracy stayed low.
After the first project presentation, we decided to test another method so that our model would actually learn. The problem came from the data split, so we shuffled all the data and built two new sets, one training set and one validation set, completely different from the previous ones (see the sketch below). We used an AlexNet model because the larger models learned features that were too specific. Here are the results:
*Figure: AlexNet learning curves with the new sets*
The corresponding code is available in the Tests with switched data folder.
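A sketch of the re-split described above: pool all COVID and NonCOVID images, shuffle them and cut new training and validation sets. The 80/20 ratio, the use of scikit-learn and the x_covid / x_noncovid arrays (loaded as in the first sketch) are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Pool both classes and label them (1 = COVID, 0 = NonCOVID)
x_all = np.concatenate([x_covid, x_noncovid])
y_all = np.concatenate([np.ones(len(x_covid)), np.zeros(len(x_noncovid))])

# Shuffle everything and cut two completely new sets
x_train, x_val, y_train, y_val = train_test_split(
    x_all, y_all, test_size=0.2, shuffle=True, stratify=y_all, random_state=0)
```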
The model is now learning and the validation accuracy is roughly 1. We also used standardization to make training converge faster. Here are the results:
*Figure: AlexNet learning curves with standardization*
The model reaches its maximum accuracy faster. To go further, we could try other standardization methods and other models. For now, the model shows the right learning behavior and good accuracy.
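For reference, a sketch of the standardization step, assuming the statistics are computed on the training images and reused for the validation set (variable names follow the earlier sketches):

```python
# Standardization: centre and scale instead of mapping gray levels to [0, 1].
# x_train and x_val are NumPy arrays built as in the previous sketches.
mean = x_train.mean()
std = x_train.std()

x_train_std = (x_train - mean) / std
x_val_std = (x_val - mean) / std
```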