Natural compounds such as biological colorants (biocolorants) have long been employed for the purpose of dying textile and represent a crucial ingredient in the mass market textile industry. However, industry-wide standards for commercialization of biocolorants are still lacking. Thus, it is beneficial to establish a database including compositionally diverse biocolorants. Moreover, it could be used as a tool to identify and authenticate biocolorant in textile end-products, to ensure their quality and safety, thereby supporting the growth of the biocolorant industry. Efficiently managing the databases and analyzing bio-dyed products data typically requires experts to organize and refine data collected from bio-based dye and pigment production. In this process, it not only requires researchers to have a certain understanding of botanical taxonomy but also knowledge about biology and chemistry.
As one part of the BioColour consortium project, our goal in this research is to take advantage of unsupervised learning for cluster analysis, to discover possible clusters of bio-dyed textile in the absence of ground truth labels or other knowledge of expert domain. This work aims to apply different approaches for unsupervised learning. Specifically, we use agglomerative clustering, Fuzzy C-means, OPTICS as well as a well-known artificial neural network (ANN), namely self-organizing maps (SOM), resulting in an investigation that combines data visualization and cluster analysis. In summary, we apply AI techniques to discover hidden clusters emerging among products colored using biocolorant, here specifically bio-dyed textile samples, and show the potential of clustering techniques in this application domain. (2020.1-2021.8)
Thesis Link: https://erepo.uef.fi/handle/123456789/26048
Software | Version |
---|---|
Python | 3.7.0 |
Numpy | 1.17.2 |
pandas | 1.0.4 |
matplotlib | 3.2.1 |
scikit-learn | 0.21.3 |
colormath | 3.0.0 |
In particular, we use its implementation of the evaluation measures:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
- as well as for the agglomerative hierarchical clustering algorithm:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
- and the OPTICS algorithm:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html
For the other algorithms, we use separate existing implementations, respectively
- for the Fuzzy C-means algorithms: Madson Luiz Dantas Dias. (2019). fuzzy-c-means: An implementation of Fuzzy C-means clustering algorithm., Zenodo, doi = 10.5281/zenodo.3066222
https://github.com/omadson/fuzzy-c-means
- for the two-dimensional Self-Organizing Maps: Giuseppe Vettigli. (2018). Mini-Som: minimalistic and NumPy-based implementation of the Self Organizing Map.
https://github.com/JustGlowing/minisom
- for the Growing Hierarchical Self-Organizing Map: Civitelli E., Teotini F. (2018). An implementation of Growing Hierarchical SOM algorithm.
https://github.com/enry12/growing_hierarchical_som
The distribution of bio-dyed samples in 2D space after PCA and corresponding clustering results by different unsupervised learning methods.
One example in cluster analysis by 2D SOM in size of 140 neurons:
One example in cluster analysis by GHSOM in size of 20 neurons: