Classifying plant seedlings with a convolutional neural network (CNN), built with the Keras library.

Pzugatti/Plant-Seedlings-Classification

Plant-Seedlings-Classification

The aim of this project is to use a deep learning model to classify plant seedlings with a supervised learning technique.

The dataset is available on Kaggle: https://www.kaggle.com/c/plant-seedlings-classification/data

Dataset examination

The dataset contains 12 species, shown below:

xtrain_plant

The dataset is split into two groups: training data and testing data.

The graph below shows the total number of images for each species:

totalnum_species

no_species

According to the graph above, Loose-Silky-Bent has the most images, while Maize and Common Wheat have the fewest.

Pre-processing the dataset

The training and testing images were pre-processed with OpenCV to extract the plant seedling and remove background noise. The filter works on HSV values: each image is converted to HSV, only the pixels inside the green HSV range are retained, and the result is converted back to RGB. Only the green regions remain and every other colour is removed. A pre-processed image is shown below:

xtrain_image_processing

The training and testing images were then normalized by dividing by 255.0, which limits the pixel values to the range 0 to 1, and the labels were one-hot encoded.
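A minimal NumPy sketch of these two steps is shown here. The helper names `normalize_images` and `one_hot` and the toy inputs are assumptions for illustration; only the division by 255.0 and the 12 classes come from the text above.

```python
import numpy as np


def normalize_images(images):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return images.astype(np.float32) / 255.0


def one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


# Toy inputs: one 2x2 grayscale "image" and three integer labels.
images = np.array([[[0, 255], [128, 64]]], dtype=np.uint8)
labels = np.array([0, 11, 3])

x = normalize_images(images)
y = one_hot(labels, num_classes=12)  # 12 species in the dataset
```

Each row of `y` contains a single 1.0 in the column of its class, which is the label format a softmax output layer with categorical cross-entropy expects.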

Convolutional neural network (CNN)

A CNN is a good choice for image data. The architecture was designed from personal experience and knowledge and, most importantly, with help from the machine learning community and forums. A lot of time was spent tuning the model hyper-parameters to achieve higher accuracy and lower loss during training, so that the model has a better chance of predicting unseen data correctly. Of course, there are plenty of other powerful CNN architectures available, such as AlexNet and ResNet, and those networks may also be suitable for this dataset.

Model visualization:

model

Training the model with the training and validation datasets

psr_confusion_matrix

The validation dataset is taken from the training dataset: 10% of the training data is held out as validation data and the remaining 90% is used for training.
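One way to produce such a 90/10 split is sketched here with NumPy. The text does not say whether the original split was shuffled or stratified, so the random shuffle, the fixed seed, and the helper name `split_train_validation` are all assumptions.

```python
import numpy as np


def split_train_validation(x, y, validation_fraction=0.1, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(round(len(x) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]


# Toy data: 100 samples whose feature and label both equal the sample index.
x_all = np.arange(100).reshape(100, 1)
y_all = np.arange(100)

x_train, y_train, x_val, y_val = split_train_validation(x_all, y_all)
```

With classes as unbalanced as the species counts above, a stratified split (equal class proportions in both parts) may be worth considering instead of a purely random one.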

According to the confusion matrix, Sugar Beet and Black-Grass are clearly confused with each other after the model is fitted on the training and validation data: 10 samples of Sugar Beet were misclassified as Black-Grass, and 7 samples of Black-Grass were misclassified as Sugar Beet. This suggests that the two classes have similar image features that confuse the CNN model. Possible remedies are collecting more data, applying alternative image processing techniques, using more data augmentation, or modifying or replacing the current CNN.
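To show how such pairs can be read off a confusion matrix, here is a small NumPy sketch. The functions `confusion_matrix` and `top_confusions` are written for this example (they are not the project's code), and the toy labels are made up, not the project's results.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix


def top_confusions(matrix, k=1):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off_diagonal = matrix.copy()
    np.fill_diagonal(off_diagonal, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diagonal, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diagonal.shape)
    return [(int(r), int(c), int(off_diagonal[r, c])) for r, c in zip(rows, cols)]


# Made-up labels for three classes, for illustration only.
true_labels = np.array([0, 0, 0, 1, 1, 2])
predicted_labels = np.array([1, 1, 0, 0, 1, 2])

matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
worst = top_confusions(matrix, k=1)
```

Rows are true classes and columns are predicted classes, so a large value at row i, column j means class i is often mistaken for class j.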

loss_acc_curve

cnn_result

The graph above shows the loss and accuracy on the training and validation data while the model is fitted. The x-axis is the epoch; as the number of epochs grows, the loss decreases and the accuracy increases. At the final epoch the validation accuracy is higher than the training accuracy, which suggests that the model is not overfitting.
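The same check can be done numerically from the training history. The sketch below assumes a dictionary shaped like the `history` attribute that Keras returns from `fit`, with the keys `acc` and `val_acc` (newer Keras versions name them `accuracy` and `val_accuracy`). The numbers, the `tolerance` value and the helper names are invented for illustration and are not the project's results.

```python
def final_gap(history, train_key="acc", val_key="val_acc"):
    """Return validation accuracy minus training accuracy at the last epoch."""
    return history[val_key][-1] - history[train_key][-1]


def looks_overfit(history, tolerance=0.05, train_key="acc", val_key="val_acc"):
    """Flag a run whose training accuracy beats validation by more than tolerance."""
    return -final_gap(history, train_key, val_key) > tolerance


# Invented per-epoch accuracies, for illustration only.
history = {"acc": [0.40, 0.70, 0.85], "val_acc": [0.45, 0.72, 0.88]}

gap = final_gap(history)
overfit = looks_overfit(history)
```

A positive `gap` matches the situation described above. A single final-epoch comparison is only a rough signal, so it is worth looking at the whole curve as well.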

Predicting unseen data (testing dataset)

kaggle_result

The picture above is from my Kaggle competition result. The trained model predicted the unseen data and the result shows 0.91939 (about 92%) accuracy. The remaining 8% (100% - 92%) is probably made up of Sugar Beet and Black-Grass images, plus a small number of other plants, that were misclassified.

Summary

After training, the model still cannot reliably tell Sugar Beet and Black-Grass apart, and this directly lowers its accuracy on unseen data. Possible remedies are collecting more data, applying alternative image processing techniques, using more data augmentation, or modifying or replacing the current CNN. This is left as future work.

In my opinion

This Kaggle competition was challenging, at least for me, because I only started studying deep learning a few months ago. I googled numerous blogs, forums, documentation pages and more to build an intuitive understanding of how deep learning works. Even so, there is so much knowledge and so many techniques that I sometimes feel overwhelmed. Still, it was a great experience to learn how to build a simple CNN and to gain some significant skills. Deep learning is quite hard to master, but it can solve real-world problems, which I find exciting and motivating!

Working environment

Google Colab

  • Keras 2.1.6
  • Python 3
  • OpenCV 3.4.3
  • sklearn 0.19.2
