Extending TPOT to unsupervised clustering #195
Comments
I like this idea. I'd like to explore it after we have regression integrated into TPOT. We should explore additional metrics for scoring unsupervised results as well. |
Section 2.3.9 (Clustering performance evaluation) of http://scikit-learn.org/stable/modules/clustering.html lists several clustering performance measures. Besides the silhouette coefficient (2.3.9.4. Silhouette Coefficient), there are also: |
I'd be glad to have this feature. |
We plan to add it eventually, but we have many more high-priority issues to resolve before we touch this one. We are happy for you to start working on this issue and send over a PR if you're interested. Please let us know. |
Any updates on this? |
As one of the maintainers of Auto-sklearn, I've also been asked this several times. However, there are two papers arguing against such a feature:
I'd be really interested to learn how you overcome these problems. |
Any update on this? |
Hi, I'm excited about your TPOT tool and how it infers hyperparameters for binary classifiers. I was wondering whether you have any plans to extend TPOT to unsupervised machine learning, i.e. clustering?
Context of the issue
Setting hyperparameters for the various clustering algorithms in scikit-learn can be tricky, just as it is for supervised learning algorithms. It would be great if clustering hyperparameters could be inferred automatically by TPOT in the same way as it is done for classification algorithms.
I presume the silhouette coefficient would be an adequate scoring method, because it measures both the compactness and the separation of clusters. It is already available in scikit-learn (see here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html)
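To make the idea concrete, here is a minimal sketch of scoring clustering hyperparameters with the silhouette coefficient. It is not TPOT code: it is a plain loop over candidate values of `n_clusters` for `KMeans`, and the helper name `best_k_by_silhouette`, the synthetic data, and the candidate range are all assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, candidate_ks, random_state=0):
    """Hypothetical helper: fit KMeans for each k and keep the k with the
    highest mean silhouette coefficient (higher means tighter, better
    separated clusters)."""
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # silhouette_score needs at least 2 distinct labels, so k starts at 2.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data (an assumption for the demo): three tight, well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.5,
    random_state=0,
)

best_k, scores = best_k_by_silhouette(X, range(2, 7))
print("best k:", best_k)
for k, score in sorted(scores.items()):
    print(f"k={k}: silhouette={score:.3f}")
```

An automated search would replace the loop with its own optimizer and would also have to vary the algorithm and preprocessing steps, but the scoring step could look much like the `silhouette_score` call above, since it needs no ground-truth labels.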