Extending TPOT to unsupervised clustering #195

matalab · 2016-07-12T12:35:23Z

Hi, I'm excited about your TPOT tool and how it infers hyperparameters for binary classifiers. I was wondering whether you have any plans to extend TPOT to unsupervised machine learning, i.e. clustering?

Context of the issue

Setting hyperparameters for the various clustering algorithms in scikit-learn can be tricky, just as it is for supervised learning algorithms. It would be great if clustering hyperparameters could be inferred automatically by TPOT in the same way as is done for classification algorithms.
I presume the silhouette coefficient would be an adequate scoring method, because it measures both the compactness and the separation of clusters. It is already available in scikit-learn (see here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html)
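
To make the proposal concrete, here is a minimal sketch of choosing one clustering hyperparameter by silhouette score. It is not TPOT code: it is a plain loop over candidate values of `n_clusters` for scikit-learn's `KMeans`, and the synthetic dataset, the helper name `best_k_by_silhouette`, and the candidate range are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data for illustration only; real data would replace this.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)


def best_k_by_silhouette(X, candidate_ks):
    """Hypothetical helper: fit KMeans for each k and score it without labels."""
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        # silhouette_score needs only the data and the predicted labels,
        # and requires at least 2 distinct clusters.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_silhouette(X, range(2, 8))
print("best n_clusters:", best_k)
for k, s in scores.items():
    print(k, round(s, 3))
```

The silhouette coefficient lies in [-1, 1], with higher values indicating tighter, better-separated clusters, so the search simply maximizes it. An automated tool would presumably search over algorithms and preprocessing steps as well, not just a single parameter.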

rhiever · 2016-07-12T12:37:52Z

I like this idea. I'd like to explore it after we have regression integrated into TPOT.

We should explore additional metrics for scoring unsupervised results as well.

matalab · 2016-07-12T12:45:46Z

On the page http://scikit-learn.org/stable/modules/clustering.html, section 2.3.9 (Clustering performance evaluation) lists several clustering performance measures. Besides the silhouette coefficient (2.3.9.4. Silhouette Coefficient), there are also:
2.3.9.1. Adjusted Rand index
2.3.9.2. Mutual Information based scores
2.3.9.3. Homogeneity, completeness and V-measure
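
One caveat when comparing the measures listed above: in scikit-learn, the adjusted Rand index, the mutual-information-based scores, and homogeneity/completeness/V-measure all take ground-truth labels as their first argument, whereas the silhouette coefficient needs only the data and the predicted labels. The sketch below shows the difference in call signatures; the synthetic dataset and the choice of `KMeans` are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_completeness_v_measure,
    silhouette_score,
)

# Synthetic data with known labels (illustrative only).
X, y_true = make_blobs(n_samples=200, centers=3, random_state=0)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# External measures: compare predicted clusters against true labels.
ari = adjusted_rand_score(y_true, y_pred)
ami = adjusted_mutual_info_score(y_true, y_pred)
homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    y_true, y_pred
)

# Internal measure: uses only the data and the predicted clusters.
sil = silhouette_score(X, y_pred)

print("ARI:", ari)
print("AMI:", ami)
print("homogeneity / completeness / V:", homogeneity, completeness, v_measure)
print("silhouette:", sil)
```

In a fully unsupervised setting, where no `y_true` exists, only the internal measure can serve as an optimization target; the external ones are useful mainly for benchmarking on labelled datasets.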

nlyf · 2016-12-28T09:04:44Z

I'd be glad to have this feature.
Are you really planning on sorting it out?

rhiever · 2017-01-03T16:21:02Z

We plan to add it eventually, but we have many more high-priority issues to resolve before we touch this one. We are happy for you to start working on this issue and send over a PR if you're interested. Please let us know.

HamedMP · 2018-04-10T07:49:19Z

Any updates on this?

mfeurer · 2018-04-10T07:59:47Z

As one of the maintainers of Auto-sklearn, I've also been asked this several times. However, there are two papers arguing against such a feature:

I'd be really interested to learn how you overcome these problems.

Bec-k · 2023-02-22T10:25:58Z

Any update on this?

rhiever added enhancement need contributor labels Jul 12, 2016

rhiever mentioned this issue Jul 13, 2016

Extended Multi-label Classification Support #196

Open

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed

