Plant-Seedlings-Classification

The aim of this project is to use a deep learning model to classify plant seedlings using supervised learning.

The data-set is available on Kaggle (https://www.kaggle.com/c/plant-seedlings-classification/data).

Data-set examination

There are 12 species in the data-set, which are shown below:

The data-set is split into two groups: training data and testing data.

The graph below shows the total number of images for each species:

According to the graph above, Loose-Silky-Bent has the highest number of images, and Maize and Common Wheat have the lowest.
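The per-species counts behind a graph like this can be gathered with a few lines of standard-library Python. This is only a sketch: it assumes the training images sit in one sub-folder per species and are PNG files, and the function name `count_images_per_species` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_images_per_species(train_dir):
    """Return a Counter mapping species folder name -> number of PNG files in it."""
    counts = Counter()
    for species_dir in Path(train_dir).iterdir():
        if species_dir.is_dir():
            # Assumption: every image is a .png file directly inside the species folder.
            counts[species_dir.name] = sum(
                1 for p in species_dir.iterdir() if p.suffix.lower() == ".png"
            )
    return counts
```

Calling `count_images_per_species("train").most_common()` (with `"train"` standing in for wherever the data was unpacked) lists the species from most to fewest images, which makes class imbalance easy to spot.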

Pre-processing dataset

The training and testing images were processed with OpenCV to extract the plant seedling and remove the background noise. The filter works on HSV values: only the pixels that fall within a green HSV range are retained, and the result is converted back to RGB format, so only the green colour remains and all other colours are removed. A pre-processed image is shown below:

The training and testing images were then normalized by dividing by 255.0 so that the pixel values lie between 0 and 1, and the labels were one-hot encoded.
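The two steps can be sketched with plain NumPy. The helper names `normalize_images` and `one_hot_encode` are hypothetical, and the labels are assumed to be species names given as strings; Keras and scikit-learn also offer utilities for one-hot encoding, which may well be what was used in practice.

```python
import numpy as np


def normalize_images(images):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return np.asarray(images, dtype=np.float32) / 255.0


def one_hot_encode(labels):
    """Return (one_hot_matrix, class_names); columns follow the sorted class names."""
    class_names, indices = np.unique(np.asarray(labels), return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class index i.
    one_hot = np.eye(len(class_names), dtype=np.float32)[indices]
    return one_hot, class_names
```

Keeping `class_names` around matters later: the column index of the largest predicted probability has to be mapped back to a species name.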

Convolutional neural network (CNN)

A CNN is a good choice when dealing with image data. I designed the CNN architecture based on personal experience and knowledge and, most importantly, help from the machine learning community and forums. I spent a lot of time tuning the model hyper-parameters to achieve higher accuracy and lower loss during training, so that the model has a better chance of predicting unseen data correctly. Of course, there are plenty of other powerful CNNs available, such as AlexNet and ResNet, and those networks may also be suitable for this data-set.

Model visualization:

Training the model with validation and training data-set

The validation data-set is taken from the training data-set. The full training data-set is split so that 10% is held out as the validation data-set and the remaining 90% is used for training.

According to the confusion matrix, Sugar Beet and Black-Grass are clearly misclassified after the CNN model is fitted on the training and validation data-sets. 10 samples of Sugar Beet are misclassified as Black-Grass, and 7 samples of Black-Grass are misclassified as Sugar Beet. This suggests that the images of these two plants have similar features that confuse the CNN model. Possible solutions are collecting more data, applying alternative image processing techniques, using more data augmentation, or modifying or replacing the current CNN.

The graph above shows the loss and accuracy of the training and validation data while the model is being fitted. The x-axis is the epoch; the loss decreases and the accuracy increases as the number of epochs grows. At the final epoch, the validation accuracy is greater than the training accuracy, which suggests that the model is not overfitting.
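A 90/10 split of this kind can be sketched as follows. The function `split_train_val` and its `seed` parameter are illustrative assumptions, not the project's code; a library helper such as scikit-learn's `train_test_split` would do the same job.

```python
import numpy as np


def split_train_val(x, y, val_fraction=0.1, seed=0):
    """Shuffle the samples, then hold out `val_fraction` of them for validation."""
    x, y = np.asarray(x), np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # random order of sample indices
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]
```

Shuffling before splitting matters when the images are loaded species by species, because otherwise the held-out 10% could contain only one or two classes.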
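To show how such misclassified pairs can be read off programmatically, here is a small NumPy sketch. It assumes the labels are integer class indices (one-hot labels and predicted probabilities would first go through `argmax`), and the helper names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Build a count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def most_confused_pairs(cm, top=3):
    """Return the largest off-diagonal entries as (true_class, predicted_class, count)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:top]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [
        (int(r), int(c), int(off_diag[r, c]))
        for r, c in zip(rows, cols)
        if off_diag[r, c] > 0
    ]
```

Applied to the validation predictions, the first entries of `most_confused_pairs` would be the class pairs worth inspecting by eye, such as the Sugar Beet and Black-Grass pair discussed above.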

Predict unseen data-set (testing data-set)

The picture above is taken from my Kaggle competition result. The trained model predicted the unseen data and the result shows a score of 0.91939 (about 92% accuracy). The remaining 8% (100% - 92%) is probably made up of misclassified Sugar Beet and Black-Grass images plus a small number of other plants.

Summary

The misclassification of Sugar Beet and Black-Grass after training means the model cannot reliably tell the two apart, which directly affects the accuracy when predicting unseen data. Possible solutions are collecting more data, applying alternative image processing techniques, using more data augmentation, or modifying or replacing the current CNN. This will be future work.

In my opinion

This Kaggle competition is challenging, at least for me, because I only started studying deep learning a few months ago. I googled numerous blogs, forums, documentation and more to build an intuitive understanding of how deep learning works. Still, there is so much knowledge and so many techniques that I almost feel overwhelmed. Anyway, it was a great experience to learn how to build a simple CNN and gain significant skills. Deep learning networks are quite hard to master, but they can solve real-world problems, which makes me feel excited and motivated!

Working environment

Google Colab

Keras 2.1.6
Python 3
OpenCV 3.4.3
sklearn 0.19.2



Pzugatti/Plant-Seedlings-Classification
