This repository contains the code used to produce the results of the manuscript: A semi-supervised sparse K-Means algorithm (arXiv version).
exeSimus.m: Runs the whole analysis and stores the results inside the ./GenRes/results folder. This file contains the following options:
- DETERM: 0/1, start without or with a random seed.
- JMPCKM_OVERLOAD: 0/1, use overloaded or non-overloaded MPCK-Means. The WekaUT library is used for the MPCK-Means algorithm. See Bilenko, M., et al. (2004).
- CONSTR_PERC: 0/1, use a flat number of constraints or percentages based on size.
- LOG: (0) no log file and no display, (1) log file only, (2) display only, (else) both display and log file.
- constraints_type: type of constraints to use; 0/1 to activate must-link (ML) and/or cannot-link (CL) constraints. When both are 1, an equal number of constraints per type is selected; when either is -1, constraints are picked at random from all the available constraints.
- constraints_number: flat number or percentage of constraints to use.
- citer: number of iterations per constraints.
- sstep: step of the sparsity parameter grid; the values tested range from 1.1 to sqrt(dimensions) with step sstep.
- maxIter: maximum number of iterations for the algorithm to reach convergence.
- kfolds: value of k for k-fold validation.
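The sparsity grid described under sstep can be sketched as follows. This is only an illustration of the range "1.1 to sqrt(dimensions) with step sstep"; the variable names dimensions and sGrid and the numeric values are assumptions, not taken from exeSimus.m.

```matlab
% Hypothetical values for illustration; not taken from exeSimus.m.
sstep      = 0.5;   % step between candidate sparsity values
dimensions = 16;    % assumed number of features in the data set

% Candidate sparsity parameter values: 1.1, 1.1 + sstep, ...,
% stopping at the last value that does not exceed sqrt(dimensions).
sGrid = 1.1:sstep:sqrt(dimensions);

disp(sGrid);        % print the candidate values to be tested
```

A smaller sstep gives a finer grid and therefore more candidate values to evaluate.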
CVstatsPer.m: Generates statistics about the data sets, such as the percentage of constraints used during the k-fold validation.
Density K-Means++:
MATLAB code was based on the R implementation of the algorithm; code: dkmpp_0.1.0
MPCK-Means:
WekaUT was modified to read initial centroids from text files and to write results to text files.
Sparse clustering:
MATLAB code was based on the R implementation of the algorithm; packages: sparcl and wrsk