-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
executable file
·84 lines (60 loc) · 2.39 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/tglkmeans)](https://CRAN.R-project.org/package=tglkmeans)
[![Codecov test coverage](https://codecov.io/gh/tanaylab/tglkmeans/branch/master/graph/badge.svg)](https://app.codecov.io/gh/tanaylab/tglkmeans?branch=master)
[![R-CMD-check](https://github.com/tanaylab/tglkmeans/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tanaylab/tglkmeans/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
# tglkmeans - efficient implementation of kmeans++ algorithm
This package provides R binding to a cpp implementation of the [kmeans++ algorithm](<https://en.wikipedia.org/wiki/K-means%2B%2B>).
## Installation
You can install the released version of **tglkmeans** using the following command:
```{r, eval=FALSE}
install.packages("tglkmeans")
```
Or install the development version using:
```{r, eval=FALSE}
if (!require("remotes")) install.packages("remotes")
remotes::install_github("tanaylab/tglkmeans")
```
## Basic usage
```{r}
library(tglkmeans)
```
Create 5 clusters normally distributed around 1 to 5, with sd of 0.3:
```{r}
data <- rbind(
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2)
)
colnames(data) <- c("x", "y")
head(data)
```
Cluster using kmeans++:
```{r}
km <- TGL_kmeans(data, k = 5, id_column = FALSE)
km
```
Plot the results:
```{r, clustering, fig.show='hold', fig.asp = 1}
plot(data, col = km$cluster)
points(km$centers, pch = 8, cex = 2)
```
## Vignette
Please refer to the package vignettes for usage and workflow, or look at the [usage](https://tanaylab.github.io/tglkmeans/articles/usage.html) section in the site.
```{r, eval=FALSE}
browseVignettes("usage")
```
## A note regarding random number generation
From version 0.4.0 onward, the package uses R random number generation functions instead of the C++11 random number generation functions. Note that this may result in different results from previous versions. To get the same results as previous versions, set the `use_cpp_random` argument to `TRUE` in the `TGL_kmeans` function.