Skip to content

flashxio/knorPy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
Disa Mhembere
Jun 8, 2019
f4098d3 · Jun 8, 2019
Jul 13, 2017
Jun 5, 2019
Jun 6, 2019
Jul 17, 2017
Jul 17, 2017
Jun 5, 2019
May 15, 2019
Jun 5, 2019
Jun 8, 2019
Jul 13, 2017
Jun 4, 2019
Jun 6, 2019
Jul 21, 2017

Repository files navigation

# knor for Python

[![Build
Status](https://travis-ci.org/flashxio/knorPy.svg?branch=master)](https://travis-ci.org/flashxio/knorPy)

These are Python 2.7+ and Python 3 bindings for the standalone
machine in-memory portion of the clustering NUMA Optimized Routines. See the
full C++ library for details: https://github.com/flashxio/knor

## Supported OSes

The `knor` python package has been tested on the following OSes:
- Mac OSX Sierra, Mojave
- Ubuntu LTS 14.04, 16.04, 18.04
- Debian Linux

## Python Dependencies

- Numpy: `pip install -U numpy`
- Setuptools: `pip install -U setuptools`
- pybind11: `pip install -U pybind11`

## Mac Installation

```
pip install knor
```

## Linux installation

### Best Performance configuration

For the best performance make sure the `numa` system package is installed via

```
apt-get install -y libnuma-dbg libnuma-dev libnuma1
```

Then simply install the package via pip:

```
pip install knor
```

## Installation Errors

1.

```
  File “/Library/Python/2.7/site-packages/pip/utils/__init__.py”, line X, in
  call_subprocess
      % (command_desc, proc.returncode, cwd))
      InstallationError: Command “python setup.py egg_info” failed with error
      code 1 in /private/tmp/pip-build-vaASFl/knor/
```

### Solution

Update your `python` version to at least `2.7.10`

2.

```
    In file included from knor/cknor/libkcommon/clusters.cpp:23:
    knor/cknor/libkcommon/util.hpp:29:10: fatal error: ‘random’ file not found
    #include <random>
             ^
    1 error generated.
    error: command ‘/usr/bin/clang’ failed with exit status 1
```
### Solution
This usually occurs on Mac when your Xcode and Xcode command line tools are out of
date. Update then to at least Version 8

3.

```
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
```

### Solution
Install the `gcc` compiler via `apt-get install build-essential`

4.

```
 fatal error: Python.h: No such file or directory compilation terminated.
```

### Solution
Install the development headers for the version on Python you intend to install knor e.g

```
apt-get install python-dev  # for python2.x installs
apt-get install python3-dev  # for python3.x installs
```

5.
```
fatal error: 'pybind11/pybind11.h' file not found
#include <pybind11/pybind11.h>
         ^~~~~~~~~~~~~~~~~~~~~
ImportError: No module named pybind11
```

### Solution

This occurs on Mac and is best solved by utilizing virtual environments as
follows:

```
sudo pip install virtualenv
virtualenv -p <python-version> <desired-path>
source <desired-path>/bin/activate
```

Where `python-version` is either `python2.7` or `python3`.

Then install `pybind11` by `pip install pybind11` then attempt to install `knor`

6.
ImportError: No module named 'setuptools'
```

### Solution

Use a recent version of `setuptools` from `pip`

```
sudo apt remove python-setuptools python3-setuptool
pip install -U setuptools
pip3 install -U setuptools
```


## Documentation

```
from knor import *
help(Kmeans)
help(SKmeans)
help(KmeansPP)
help(FuzzyCMeans)
help(Kmedoids)
help(Xmeans)
help(Hmeans)
help(Gmeans)
```

## Example

```
import knor
import numpy as np
data = np.random.random((100, 10))
km = knor.Kmeans(k=5)
ret = km.fit(data)
print(ret)
```

## The `cluster_t` return object

The `cluster_t` return object has the following attributes:

- `k`: The number of clusters requested
- `nrow`: The number of rows/samples in the dataset
- `ncol`: Then number of columns/features in the dataset
- `sizes`: The number of samples in each cluster/centroid
- `iters`: The number of iterations performed
- `centroids`: A `list` where each row is a cluster center
- `clusters`: A `list` index for which cluster each sample falls into