Extending TPOT to unsupervised clustering #195

Open
matalab opened this issue Jul 12, 2016 · 7 comments

Comments

@matalab

matalab commented Jul 12, 2016

Hi, I'm excited about your TPOT tool and how it infers hyperparameters for binary classifiers. I was wondering whether you have any plans to extend TPOT to unsupervised machine learning, i.e. clustering?

Context of the issue

Setting hyperparameters for the various clustering algorithms in scikit-learn can be tricky, just as it is for supervised learning algorithms. It would be great if TPOT could infer clustering hyperparameters automatically, in the same way it does for classification algorithms.
I presume the silhouette coefficient would be an adequate scoring method, because it measures both the compactness and the separation of clusters. It is already available in scikit-learn (see here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html)
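
To make the proposal concrete, here is a minimal sketch of choosing one clustering hyperparameter by silhouette score. It is not TPOT code: it uses a plain loop over candidate values rather than TPOT's search, and the helper name `best_k_by_silhouette`, the choice of `KMeans`, and the synthetic blob data are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, candidate_ks, random_state=0):
    """Hypothetical helper: fit KMeans for each k and keep the best silhouette.

    Returns (best_k, scores), where scores maps each k to its silhouette score.
    """
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # The silhouette score needs only the data and the predicted labels,
        # so no ground truth is required.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data with three well-separated blobs (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X, range(2, 7))
print(best_k, scores)
```

On clean data like this the silhouette score peaks at the true number of blobs; on real data the peak is often much less clear, which is part of why the choice of scoring metric matters for this feature.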

@rhiever
Contributor

rhiever commented Jul 12, 2016

I like this idea. I'd like to explore it after we have regression integrated into TPOT.

We should explore additional metrics for scoring unsupervised results as well.

@matalab
Author

matalab commented Jul 12, 2016

On the page http://scikit-learn.org/stable/modules/clustering.html, section 2.3.9 (Clustering performance evaluation) lists several clustering performance measures. Besides the silhouette coefficient (2.3.9.4. Silhouette Coefficient), there are also:
2.3.9.1. Adjusted Rand index
2.3.9.2. Mutual Information based scores
2.3.9.3. Homogeneity, completeness and V-measure
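
One practical difference between these measures is worth noting: in scikit-learn, the adjusted Rand index, the mutual-information scores, and homogeneity/completeness/V-measure all compare predicted labels against ground-truth labels, whereas the silhouette coefficient needs only the data and the predicted labels. The sketch below shows that difference in the function signatures; the synthetic data and the `external`/`internal` dictionary names are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_completeness_v_measure,
    silhouette_score,
)

# Synthetic data; y_true is available only because the data is generated.
X, y_true = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Measures that require ground-truth labels (y_true).
homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    y_true, labels
)
external = {
    "adjusted_rand": adjusted_rand_score(y_true, labels),
    "adjusted_mutual_info": adjusted_mutual_info_score(y_true, labels),
    "homogeneity": homogeneity,
    "completeness": completeness,
    "v_measure": v_measure,
}

# Measure that works from the data and predicted labels alone.
internal = {"silhouette": silhouette_score(X, labels)}

print(external)
print(internal)
```

For a fully unsupervised setting, where no `y_true` exists, only the second kind of measure can serve as an optimization target.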

@nlyf

nlyf commented Dec 28, 2016

I'd be glad to have this feature.
Are you really planning on sorting it out?

@rhiever
Contributor

rhiever commented Jan 3, 2017

We plan to add it eventually, but we have many more high-priority issues to resolve before we touch this one. We are happy for you to start working on this issue and send over a PR if you're interested. Please let us know.

@HamedMP

HamedMP commented Apr 10, 2018

Any updates on this?

@mfeurer

mfeurer commented Apr 10, 2018

As one of the maintainers of Auto-sklearn, I have also been asked this several times. However, there are two papers arguing against such a feature:

I'd be really interested to learn how you overcome these problems.

@Bec-k

Bec-k commented Feb 22, 2023

Any update on this?

6 participants