Code-PCSKM

This repository contains the code used to produce the results of the manuscript: A semi-supervised sparse K-Means algorithm (arxiv version).

exeSimus.m: Runs the whole analysis and stores the results inside the ./GenRes/results folder. This file contains the following options:

DETERM: 0/1 start without or with a random seed.
JMPCKM_OVERLOAD: 0/1 use overloaded or non-overloaded MPCK-Means. The WekaUT library is used for the MPCK-Means algorithm. See Bilenko, M., et al. (2004).
CONSTR_PERC: 0/1 use a flat number of constraints or percentages based on size.
LOG: (0) no log file and no display, (1) log file only, (2) display only, (else) both display and log file.
constraints_type: type of constraints to use; 0/1 to activate ML (must-link) and/or CL (cannot-link) constraints. When both are 1, an equal number of constraints per type is selected; when either is -1, random constraints are picked from all the available constraints.
constraints_number: flat or percentage of constraints to use.
citer: number of iterations per constraint setting.
sstep: step size for the sparsity parameter; values are tested from 1.1 to sqrt(dimensions) in increments of sstep.
maxIter: iterations for algorithm to reach convergence.
kfolds: selection of k for k-fold validation.
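
As a rough illustration of how the sstep option shapes the sparsity grid, the following minimal sketch builds the candidate values from 1.1 up to sqrt(dimensions). The variable names nDims and sGrid and the numeric values are assumptions made for this example; they are not taken from exeSimus.m, which may construct the grid differently.

```matlab
% Hypothetical values for illustration only; not read from exeSimus.m.
nDims = 16;    % number of features (dimensions) in the data set (assumed)
sstep = 0.5;   % step between candidate sparsity values (assumed)

% Candidate sparsity values, starting at 1.1 and never exceeding sqrt(nDims).
sGrid = 1.1:sstep:sqrt(nDims);

disp(sGrid);
```

Note that with the colon operator the upper bound sqrt(dimensions) is only included when it falls exactly on the grid, so a smaller sstep gives a finer search at the cost of more runs.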

CVstatsPer.m: Generates statistics about the data sets such as percentage of used constraints during the k-fold validation.

Citations for software and code that we have used in this project

Density K-Means++:

MATLAB code was based on the R implementation of the algorithm; code: dkmpp_0.1.0

MPCK-Means:

Modified WekaUT in order to read initial centroids from text files and write results to text files.

Sparse clustering:

MATLAB code was based on the R implementation of the algorithm; packages: sparcl and wrsk
