Feature Request: K-Prototypes or similar implementation for clustering mixed data #118

plutonium-239 · 2021-04-17T14:26:47Z

I would love to have an optimised (or at least CUDA) implementation of the K-Prototypes algorithm (the package I currently use is kmodes). A lot of data science deals with categorical data, and it would be great not to have to use TargetEncoders or, worse, pd.get_dummies() on categorical columns with many categories.
Right now, my workaround is to apply a TargetEncoder to the categorical variables and then use the kmeans/knn in this package. This feels a little 'fix'-ey: numerical data is continuous and its values are ordered, whereas the categories of a categorical variable need not have any such relation (greater than/less than) to each other.
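
To make the request concrete, here is a minimal NumPy sketch of the mixed dissimilarity that K-Prototypes is usually described with: squared Euclidean distance on the numerical columns plus a weighted count of mismatches on the categorical columns. The function name `kprototypes_distance` and the default `gamma` are hypothetical and chosen for illustration; this is not the API of kmodes or of this package.

```python
import numpy as np

def kprototypes_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Dissimilarity between one sample and one prototype (illustrative only).

    x_num / proto_num: numerical features of the sample and the prototype.
    x_cat / proto_cat: categorical features, compared only for equality.
    gamma: assumed weight balancing the categorical term against the numerical one.
    """
    x_num = np.asarray(x_num, dtype=float)
    proto_num = np.asarray(proto_num, dtype=float)
    # Numerical part: squared Euclidean distance.
    numeric_part = np.sum((x_num - proto_num) ** 2)
    # Categorical part: number of attributes whose categories differ.
    categorical_part = np.sum(np.asarray(x_cat) != np.asarray(proto_cat))
    return float(numeric_part + gamma * categorical_part)

d = kprototypes_distance([1.0, 2.0], ["red", "suv"],
                         [0.0, 2.0], ["blue", "suv"], gamma=0.5)
```

Because the categorical term only checks equality, no ordering between categories is implied, which is the property that target encoding loses.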
