Code for the manuscript: "A semi-supervised sparse K-Means algorithm"

avouros/Code-PCSKM

This repository contains the code used to produce the results of the manuscript "A semi-supervised sparse K-Means algorithm" (arXiv version).

Code-PCSKM

exeSimus.m: Runs the whole analysis and stores the results inside the ./GenRes/results folder. This file contains the following options:

  • DETERM: 0/1 start without or with a random seed.

  • JMPCKM_OVERLOAD: 0/1 use overloaded or non-overloaded MPCK-Means. The WekaUT library is used for the MPCK-Means algorithm. See Bilenko, M., et al. (2004).

  • CONSTR_PERC: 0/1 use a flat number of constraints or percentages based on size.

  • LOG: (0) no log file and no display, (1) log file only, (2) display only, (else) both display and log file.

  • constraints_type: type of constraints to use; 0/1 flags activate must-link (ML) and/or cannot-link (CL) constraints. When both are 1, an equal number of constraints per type is selected; when either is -1, random constraints are picked from all the available constraints.

  • constraints_number: flat number or percentage of constraints to use (see CONSTR_PERC).

  • citer: number of iterations per set of constraints.

  • sstep: step size for the sparsity parameter; values from 1.1 to sqrt(dimensions) are tested in increments of sstep.

  • maxIter: maximum number of iterations allowed for the algorithm to reach convergence.

  • kfolds: number of folds k for k-fold validation.
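
As a rough illustration of how some of these options could interact, the Python sketch below builds the grid of sparsity values implied by sstep, converts constraints_number into a count, and samples ML/CL constraints according to constraints_type. It is not the repository's MATLAB code: the function names, the use of ground-truth labels to derive constraints, the reading of "size" as the number of data points, and the even split when both types are active are all assumptions made for this example.

```python
import math
import random
from itertools import combinations


def sparsity_grid(n_dims, sstep):
    """Sparsity values 1.1, 1.1 + sstep, ... that do not exceed sqrt(n_dims)."""
    upper = math.sqrt(n_dims)
    # Small tolerance so an end point that lands exactly on sqrt(n_dims) is kept.
    n_steps = math.floor((upper - 1.1) / sstep + 1e-9)
    return [round(1.1 + i * sstep, 10) for i in range(n_steps + 1)]


def n_constraints(constraints_number, n_points, constr_perc):
    """Flat count, or a percentage of the data set size.

    Assumption: "size" means the number of data points.
    """
    if constr_perc:
        return int(round(constraints_number / 100.0 * n_points))
    return int(constraints_number)


def sample_constraints(labels, n, use_ml, use_cl, seed=None):
    """Sample n pairwise constraints as (i, j, kind) with kind "ML" or "CL".

    Assumption: constraints are derived from known class labels, so a pair
    with equal labels is a must-link and any other pair is a cannot-link.
    """
    rng = random.Random(seed)
    must, cannot = [], []
    for i, j in combinations(range(len(labels)), 2):
        (must if labels[i] == labels[j] else cannot).append((i, j))

    # Either flag set to -1: draw at random from all available constraints.
    if use_ml == -1 or use_cl == -1:
        pool = [(i, j, "ML") for i, j in must] + [(i, j, "CL") for i, j in cannot]
        return rng.sample(pool, min(n, len(pool)))

    if use_ml == 1 and use_cl == 1:
        # Both types active: the same number of constraints per type.
        per_type = n // 2
        chosen_ml = rng.sample(must, min(per_type, len(must)))
        chosen_cl = rng.sample(cannot, min(per_type, len(cannot)))
    else:
        chosen_ml = rng.sample(must, min(n, len(must))) if use_ml == 1 else []
        chosen_cl = rng.sample(cannot, min(n, len(cannot))) if use_cl == 1 else []

    return [(i, j, "ML") for i, j in chosen_ml] + [(i, j, "CL") for i, j in chosen_cl]
```

For a 16-dimensional data set, `sparsity_grid(16, 0.5)` would yield six candidate values between 1.1 and 4, each of which would then be evaluated with the sampled constraints.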

CVstatsPer.m: Generates statistics about the data sets, such as the percentage of constraints used during the k-fold validation.
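
One way such a statistic could be computed is sketched below in Python. This is a guess at the idea rather than a description of CVstatsPer.m: it assumes that a constraint counts as "used" in a fold when neither of its two points is held out, and the function names and the round-robin fold assignment are hypothetical.

```python
def kfold_assignment(n_points, k):
    """Assign point i to fold i % k (a simple deterministic split, assumed here)."""
    return [i % k for i in range(n_points)]


def used_constraint_percentages(constraints, folds, k):
    """Per held-out fold, the percentage of constraints whose two points both remain.

    constraints: list of (i, j) index pairs.
    folds: fold index of every point, as returned by kfold_assignment.
    """
    percentages = []
    for held_out in range(k):
        kept = sum(
            1
            for i, j in constraints
            if folds[i] != held_out and folds[j] != held_out
        )
        share = 100.0 * kept / len(constraints) if constraints else 0.0
        percentages.append(share)
    return percentages
```

Under this assumption, constraints that span a held-out point are dropped, so the usable percentage generally shrinks as the number of constraints touching each fold grows.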

Citations for software and code that we have used in this project

Density K-Means++:

Nidheesh, N., KA Abdul Nazeer, and P. M. Ameer. "An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data." Computers in biology and medicine 91 (2017): 213-221.

MATLAB code was based on the R implementation of the algorithm; code: dkmpp_0.1.0

MPCK-Means:

Bilenko, Mikhail, Sugato Basu, and Raymond J. Mooney. "Integrating constraints and metric learning in semi-supervised clustering." Proceedings of the twenty-first international conference on Machine learning. 2004.

WekaUT was modified to read initial centroids from text files and to write results to text files.

Sparse clustering:

Witten, Daniela M., and Robert Tibshirani. "A framework for feature selection in clustering." Journal of the American Statistical Association 105.490 (2010): 713-726.

Brodinová, Šárka, et al. "Robust and sparse k-means clustering for high-dimensional data." Advances in Data Analysis and Classification (2017): 1-28.

MATLAB code was based on the R implementation of the algorithm; packages: sparcl and wrsk
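
For orientation, the feature-weight update at the core of the sparse clustering framework of Witten and Tibshirani (2010) can be sketched in a few lines of Python. The sketch is illustrative only and is not taken from this repository or from sparcl: the function names are hypothetical, `bcss` stands for the per-feature between-cluster sum of squares, and the sparsity bound `s` is assumed to be at least 1 with a unique largest `bcss` value.

```python
import math


def soft_threshold(values, delta):
    """Shrink non-negative values towards zero by delta."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalise(values):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_weights(bcss, s, n_iter=60):
    """Maximise sum_j w_j * bcss_j subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Assumes s >= 1 and a unique largest bcss value.
    """
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalise(a)
    if sum(w) <= s:
        return w  # The L1 bound is inactive, so no thresholding is needed.

    # Binary search for the threshold at which the L1 norm meets the bound.
    lo, hi = 0.0, max(a)
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        w = l2_normalise(soft_threshold(a, mid))
        if sum(w) > s:
            lo = mid
        else:
            hi = mid
    return l2_normalise(soft_threshold(a, hi))
```

Because a non-negative vector with unit L2 norm has an L1 norm between 1 and sqrt(dimensions), the bound `s` only has an effect inside that interval, which is consistent with the range of sparsity values scanned via sstep.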
