
bellingcat/geoclustering


Latest commit f287cb8 · Jul 1, 2022

History

21 Commits
Jul 1, 2022
Jul 1, 2022
Jul 1, 2022
Jul 1, 2022
Jun 30, 2022
Jun 30, 2022
Jun 30, 2022
Jul 1, 2022
Jul 1, 2022
Jul 1, 2022
Jul 1, 2022

Repository files navigation

geoclustering

📍 Command-line tool for clustering geolocations.

Features

  • Uses DBSCAN or OPTICS to perform clustering.
  • Outputs clustering results as json, txt and geojson.
  • Creates a kepler.gl visualization of clusters.

Clustering Method

A cluster is formed when at least a minimum number of points (--size) are each within a maximum distance (--distance) of at least one other point in the cluster.
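Read literally, this rule describes the connected components of a graph in which two points are linked when they are at most `--distance` km apart, keeping only components with at least `--size` points. The plain-Python sketch below implements that literal reading. The function names are hypothetical, not part of geoclustering, and the approach is a single-linkage approximation rather than DBSCAN or OPTICS, which additionally distinguish "core" points that have enough neighbours within the distance.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the value the tool uses is an assumption


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_points(points: Sequence[Tuple[float, float]],
                   distance_km: float, size: int) -> List[List[int]]:
    """Hypothetical helper: group points linked by chains of neighbours that are
    each within distance_km, then keep only groups with at least `size` points."""
    n = len(points)
    seen = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search collects every point reachable from `start`.
        seen[start] = True
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and haversine_km(points[i], points[j]) <= distance_km:
                    seen[j] = True
                    stack.append(j)
        if len(component) >= size:
            clusters.append(sorted(component))
    return clusters


# Points 0 and 2 are about 1.1 km apart, but both are about 0.56 km from point 1,
# so with distance_km=1.0 all three end up in one cluster through point 1.
pts = [(0.0, 0.0), (0.0, 0.005), (0.0, 0.01), (10.0, 10.0)]
clusters = cluster_points(pts, distance_km=1.0, size=3)
```

The pairwise loop is O(n²), which is acceptable for small inputs; larger datasets would call for a spatial index such as a ball tree.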

Install

Clone the repository:

git clone https://github.com/bellingcat/geoclustering
cd geoclustering

Install keplergl build dependencies:

# macOS
brew install proj gdal

Install project with pip:

pip install .

Usage

Usage: geoclustering [OPTIONS] FILENAME

Options:
  -d, --distance FLOAT            (in km) Max. distance between two points in
                                  a cluster.  [required]
  -s, --size INTEGER              Min. number of points in a cluster.
                                  [required]
  -o, --output PATH               Output directory for results. Default:
                                  ./output
  -a, --algorithm [dbscan|optics]
                                  Clustering algorithm to be used. `optics`
                                  produces tighter clusters but is slower.
                                  Default: dbscan
  --help                          Show this message and exit.
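`--distance` is given in kilometres. If the clustering is delegated to scikit-learn's DBSCAN or OPTICS with `metric="haversine"` (an assumption; this README does not name the library), the coordinates must be passed in radians and the distance converted to an angle on the unit sphere. The helpers below are hypothetical and only sketch that conversion.

```python
import math
from typing import List, Sequence, Tuple

# Mean Earth radius in km; whether the tool uses this exact value is an assumption.
EARTH_RADIUS_KM = 6371.0088


def km_to_radians(distance_km: float) -> float:
    """Convert a ground distance in km to a central angle in radians."""
    return distance_km / EARTH_RADIUS_KM


def latlon_to_radians(points: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Convert (lat, lon) pairs in degrees to [lat, lon] pairs in radians,
    the order scikit-learn's haversine metric expects."""
    return [[math.radians(lat), math.radians(lon)] for lat, lon in points]


# Hypothetical use with scikit-learn (not executed here):
#   from sklearn.cluster import DBSCAN
#   labels = DBSCAN(eps=km_to_radians(distance), min_samples=size,
#                   metric="haversine", algorithm="ball_tree").fit_predict(
#       latlon_to_radians(coords))
```

Note that scikit-learn's `min_samples` counts the point itself, so mapping `--size` directly onto it is also an assumption.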

Input

Input is supplied as a .csv file. The only required columns are lat and lon; all other columns are passed through to the output.

id,name,lat,lon
1,Bonnibelle Mathwen,40.1324085,64.4911086
...
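A minimal sketch of reading such a file with Python's `csv` module. It assumes, hypothetically, that lat and lon are parsed as floats and every other column is kept unchanged (as a string) so it can be passed through to the output; `read_points` is not part of geoclustering.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_points(f: TextIO) -> List[Dict[str, object]]:
    """Read CSV rows, require lat/lon columns, and keep all other columns as-is."""
    reader = csv.DictReader(f)
    missing = {"lat", "lon"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(sorted(missing))}")
    points: List[Dict[str, object]] = []
    for row in reader:
        point: Dict[str, object] = dict(row)  # every column is carried through
        point["lat"] = float(row["lat"])
        point["lon"] = float(row["lon"])
        points.append(point)
    return points


sample = io.StringIO("id,name,lat,lon\n1,Bonnibelle Mathwen,40.1324085,64.4911086\n")
points = read_points(sample)
```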

Output

If at least one cluster is found, the tool writes JSON, GeoJSON, text and kepler.gl HTML files to the output directory.

JSON

Encodes an array of clusters, each containing an array of points.

[
  {
    "cluster_id": 0,
    "points": [
      {
        "id": 9,
        "name": "Rosanna Foggo",
        "lat": -6.2074293,
        "lon": 106.8915948
      }
    ]
  }
]
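A minimal sketch of producing this structure with Python's `json` module. It assumes, hypothetically, that points are dicts of the input columns and that clusters are given as lists of point indices, numbered from 0 in order; `clusters_to_json` is an illustrative name, not the tool's code.

```python
import json
from typing import Dict, List, Sequence


def clusters_to_json(points: Sequence[Dict[str, object]],
                     clusters: Sequence[List[int]]) -> str:
    """Serialize clusters as an array of {"cluster_id", "points"} objects."""
    payload = [
        {"cluster_id": cluster_id, "points": [points[i] for i in members]}
        for cluster_id, members in enumerate(clusters)
    ]
    return json.dumps(payload, indent=2)


points = [{"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}]
print(clusters_to_json(points, [[0]]))
```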

GeoJSON

Encodes a single FeatureCollection containing all points as Feature objects. Coordinates follow the GeoJSON order of [longitude, latitude].

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          106.891595,
          -6.207429
        ]
      },
      "properties": {
        "id": 9,
        "name": "Rosanna Foggo",
        "cluster_id": 0
      }
    }
  ]
}
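A sketch of building this FeatureCollection from point dicts and clusters given as lists of point indices (a hypothetical input format). Every non-coordinate column becomes a property alongside `cluster_id`. The sample output above appears to round coordinates to six decimal places, so the sketch does the same; that rounding is an assumption about the tool's behaviour.

```python
from typing import Any, Dict, List, Sequence


def clusters_to_geojson(points: Sequence[Dict[str, Any]],
                        clusters: Sequence[List[int]]) -> Dict[str, Any]:
    """Hypothetical helper: one Point Feature per clustered point."""
    features: List[Dict[str, Any]] = []
    for cluster_id, members in enumerate(clusters):
        for i in members:
            p = points[i]
            # All input columns except the coordinates become properties.
            properties = {k: v for k, v in p.items() if k not in ("lat", "lon")}
            properties["cluster_id"] = cluster_id
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
                    "coordinates": [round(p["lon"], 6), round(p["lat"], 6)],
                },
                "properties": properties,
            })
    return {"type": "FeatureCollection", "features": features}


points = [{"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}]
collection = clusters_to_geojson(points, [[0]])
```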

txt

Encodes each cluster as a block of lines, with blocks separated by a blank line. Each block starts with a "Cluster N" header, and each following line describes one point.

Cluster 0
id 9, name Rosanna Foggo, lat -6.2074293, lon 106.8915948

// ...
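A sketch of rendering that layout, assuming (hypothetically) that points are dicts of the input columns and clusters are lists of point indices numbered from 0. Each point line lists every column as `name value`, joined by commas, matching the sample above.

```python
from typing import Dict, List, Sequence


def clusters_to_text(points: Sequence[Dict[str, object]],
                     clusters: Sequence[List[int]]) -> str:
    """Hypothetical helper: one "Cluster N" block per cluster, blank line between blocks."""
    blocks = []
    for cluster_id, members in enumerate(clusters):
        lines = [f"Cluster {cluster_id}"]
        for i in members:
            lines.append(", ".join(f"{key} {value}" for key, value in points[i].items()))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


points = [{"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}]
print(clusters_to_text(points, [[0]]), end="")
```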

kepler.gl

An HTML file that visualizes the clusters on an interactive kepler.gl map.

(Image: kepler.gl instance)