Kandinsky - analysis of color in photographic images through clustering and other algorithms.

shauryashaurya/kandinsky

Kandinsky

Clustering and Quantization
Using photographs as visual input

Extracting the most significant colors from the photograph using K-Means, Photo © Shaurya Agarwal

Significant colors in a photograph.
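
As a rough illustration of the idea in the caption above, here is a minimal K-Means sketch in plain Python. It assumes the photograph has already been read into a list of (R, G, B) tuples; the loading step is left out, and the names `nearest_center` and `significant_colors` are made up for this example rather than taken from the notebooks.

```python
import random

def nearest_center(pixel, centers):
    """Index of the center closest to pixel (squared Euclidean distance in RGB)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centers[i])),
    )

def significant_colors(pixels, k, iterations=20, seed=0):
    """Return [(center, pixel_count), ...] sorted from most to least common."""
    rng = random.Random(seed)
    # Assumption: start from k distinct colors so no two centers coincide.
    # This needs at least k distinct colors in the input.
    centers = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        # Assignment step: each pixel joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest_center(p, centers)].append(p)
        # Update step: each center moves to the mean color of its group.
        new_centers = [
            tuple(sum(channel) / len(g) for channel in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged, nothing moved
        centers = new_centers
    # Count the pixels that end up with each final center.
    counts = [0] * k
    for p in pixels:
        counts[nearest_center(p, centers)] += 1
    return sorted(zip(centers, counts), key=lambda pair: pair[1], reverse=True)

# Toy "image": 60 red, 30 blue and 10 green pixels.
pixels = [(255, 0, 0)] * 60 + [(0, 0, 255)] * 30 + [(0, 255, 0)] * 10
palette = significant_colors(pixels, k=3)
# palette[0] is the red center with 60 pixels
```

On a real photograph the pixel list is large, so subsampling the pixels before clustering is a common shortcut.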


Scope

This started as a simple exploration of the simplest clustering algorithm in use, but broader coverage of clustering algorithms looks valuable. Kandinsky aims to cover:

I. Basic building blocks

  1. Similarity/Distance Measures:

    • Euclidean Distance (Cartesian)
    • Manhattan Distance
    • Cosine Distance
    • Mahalanobis Distance
    • Domain-specific Distances
  2. Data Preprocessing:

    • Feature Scaling and Normalization
    • Dimensionality Reduction (e.g., PCA, t-SNE)
  3. Cluster Evaluation:

    • Internal Measures (Cohesion, Separation)
      • Silhouette Coefficient
      • Davies-Bouldin Index
    • External Measures (vs. Ground Truth)
      • Purity, Rand Index, Adjusted Rand Index
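
A few of the building blocks listed above can be sketched in plain Python. This is only an illustration under simple assumptions: points are tuples of numbers, there are at least two clusters, and the function names are hypothetical. The silhouette function follows the usual definition, (b - a) / max(a, b) averaged over all points.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette coefficient; expects at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the closest other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that cut across the gap
```

Here `good` comes out close to 1 and `bad` is negative, which is the behaviour an internal measure should show for cohesive versus mixed-up clusters.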

II. Clustering Algorithms

  1. Partitioning-Based

    • K-Means (hard assignments)
    • K-Medoids (more robust to outliers)
    • Fuzzy C-Means (soft assignments)
  2. Hierarchical

    • Agglomerative (Bottom-up)
      • Various linkage methods (single, complete, average)
    • Divisive (Top-down)
  3. Density-Based

    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise, discovers clusters of varying shapes)
    • OPTICS (Ordering Points To Identify the Clustering Structure, extension of DBSCAN, provides reachability plot)
    • HDBSCAN (Improved density clustering, handles varying densities)
  4. Distribution-Based

    • Gaussian Mixture Models (GMM) (assumes data follows mixtures of Gaussian distributions)
  5. Grid-Based

    • STING (Statistical Information Grid-based Clustering)
    • CLIQUE (Clustering In QUEst)
  6. Neural Network-Based

    • Autoencoders (Variational, Denoising, etc.)
      • Learn latent representations for clustering
    • Self-Organizing Maps (SOMs)
      • Preserve neighborhood relationships in a grid-like space
    • Deep Embedded Clustering (DEC)
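
To make the "hard" versus "soft" assignment distinction from the partitioning-based entries concrete, here is a small sketch. It assumes the cluster centers are already known and only shows the assignment step: K-Means picks one center per point, while Fuzzy C-Means gives each point a membership in every cluster using the standard formula with fuzzifier m. The function names are hypothetical.

```python
import math

def hard_assignment(point, centers):
    """K-Means style: the index of the single nearest center."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-Means style: one membership per center, summing to 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
label = hard_assignment(point, centers)          # 0
memberships = fuzzy_memberships(point, centers)  # roughly [0.9, 0.1]
```

A larger m spreads the memberships more evenly, and as m approaches 1 they approach the hard assignment.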

III. Additional Stuff to tackle when I get time and braincycles to spare...

  • Clustering High-Dimensional Data: Image data often results in high-dimensional feature vectors, so techniques for dimensionality reduction become crucial. Distance measures such as Euclidean become less discriminative as the number of dimensions grows. Also consider features with very different ranges: with age and salary, age may only span 0 to 100 while salary may range from 0 to 1 million, so salary dominates the distance (hint: for this example, prefer Manhattan distance over Euclidean, or scale the features first).
  • Clustering Large-Scale Data: With many images, scalable clustering algorithms (e.g., sampling, incremental, or mini-batch variants of standard methods) are essential.
  • Spectral Clustering (Flexible approach, particularly effective on non-convex cluster shapes)
  • Graph-Based Clustering
  • Hybrid Approaches (Combining traditional algorithms with neural networks)
  • Affinity Propagation (Finds clusters based on message-passing between data points)
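
The age and salary example from the high-dimensional bullet can be made concrete with a few made-up rows. The sketch below assumes min-max scaling as the remedy (one option among several) and shows that the nearest neighbour of the first person changes once both features share the same 0 to 1 range. The data and function names are invented for illustration.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the range 0..1 (constant columns become 0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]

def nearest_neighbour(rows, index):
    """Index of the row closest (Euclidean) to rows[index], excluding itself."""
    return min(
        (i for i in range(len(rows)) if i != index),
        key=lambda i: math.dist(rows[index], rows[i]),
    )

# Made-up (age, salary) rows.
people = [(25, 50_000), (60, 51_000), (27, 56_000), (45, 120_000)]

raw_match = nearest_neighbour(people, 0)                    # 1: salary gap dominates
scaled_match = nearest_neighbour(min_max_scale(people), 0)  # 2: age now counts too
```

On the raw numbers the 25-year-old is matched with the 60-year-old because their salaries differ by only 1,000. After scaling, the 27-year-old with a slightly higher salary is the closer match.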

...so yeah! there's a bunch of work needed!


Notebooks [WIP]

  • 00 Prep the Pictures
  • 01 K-Means
  • 015 Color Models

Eight Down Toofaan Mail

Kandinsky helped with the cinematography of our feature film Eight Down Toofaan Mail.



Talks

References

Font
