doc variation of info

lbollar · Aug 10, 2014 · c72d19c
1 parent 9d25d68
Showing 7 changed files with 38 additions and 5 deletions.
diff --git a/doc/source/affprop.rst b/doc/source/affprop.rst
@@ -1,7 +1,7 @@
 Affinity Propagation
 ======================
 
-*Affinity propagation* is a clustering algorithm based on *message passing* between data points. Similar to *K-medoids*, it finds a subset of points as *exemplars* based on (dis)similarities, and assigns each point in the given data set to the closest exemplar.  
+`Affinity propagation <http://en.wikipedia.org/wiki/Affinity_propagation>`_ is a clustering algorithm based on *message passing* between data points. Similar to *K-medoids*, it finds a subset of points as *exemplars* based on (dis)similarities, and assigns each point in the given data set to the closest exemplar.  
 
 This package implements the affinity propagation algorithm based on the following paper:
 

diff --git a/doc/source/dbscan.rst b/doc/source/dbscan.rst
@@ -1,7 +1,7 @@
 DBSCAN
 =========
 
-*Density-based Spatial Clustering of Applications with Noise (DBSCAN)* is a data clustering algorithm that finds clusters through density-based expansion of seed points. The algorithm is proposed by:
+`Density-based Spatial Clustering of Applications with Noise (DBSCAN) <http://en.wikipedia.org/wiki/DBSCAN>`_ is a data clustering algorithm that finds clusters through density-based expansion of seed points. The algorithm is proposed by:
 
     Martin Ester, Hans-peter Kriegel, Jörg S, and Xiaowei Xu
     *A density-based algorithm for discovering clusters in large spatial databases with noise.* 

diff --git a/doc/source/kmeans.rst b/doc/source/kmeans.rst
@@ -1,7 +1,7 @@
 K-means
 ==========
 
-*K-means* is a classic method for clustering or vector quantization. The K-means algorithms produces a fixed number of clusters, each associated with a *center* (also known as a *prototype*), and each sample belongs to a cluster with the nearest center. 
+`K-means <http://en.wikipedia.org/wiki/K_means>`_ is a classic method for clustering or vector quantization. The K-means algorithms produces a fixed number of clusters, each associated with a *center* (also known as a *prototype*), and each sample belongs to a cluster with the nearest center. 
 
 From a mathematical standpoint, K-means is an coordinate descent algorithm to solve the following optimization problem:
 

diff --git a/doc/source/kmedoids.rst b/doc/source/kmedoids.rst
@@ -1,7 +1,7 @@
 K-medoids
 ===========
 
-*K-medoids* is a clustering algorithm that seeks a subset of points out of a given set such that the total costs or distances between each point to the closest point in the chosen subset is minimal. This chosen subset of points are called *medoids*.
+`K-medoids <http://en.wikipedia.org/wiki/K-medoids>`_ is a clustering algorithm that seeks a subset of points out of a given set such that the total costs or distances between each point to the closest point in the chosen subset is minimal. This chosen subset of points are called *medoids*.
 
 This package implements a K-means style algorithm instead of PAM, which is considered to be much more efficient and reliable. Particularly, the algorithm is implemented by the ``kmedoids`` function.
 

diff --git a/doc/source/silhouette.rst b/doc/source/silhouette.rst
@@ -1,7 +1,7 @@
 Silhouettes
 =============
 
-*Silhouettes* is a method for validating clusters of data. Particularly, it provides a quantitative way to measure how well each item lies within its cluster as opposed to others. The *Silhouette* value of a data point is defined as:
+`Silhouettes <http://en.wikipedia.org/wiki/Silhouette_(clustering)>`_ is a method for validating clusters of data. Particularly, it provides a quantitative way to measure how well each item lies within its cluster as opposed to others. The *Silhouette* value of a data point is defined as:
 
 .. math::
 

diff --git a/doc/source/validate.rst b/doc/source/validate.rst
@@ -6,3 +6,4 @@ This package provides a variety of ways to validate or evaluate clustering resul
 .. toctree:: 
 
 	silhouette.rst
+	varinfo.rst
diff --git a/doc/source/varinfo.rst b/doc/source/varinfo.rst
@@ -0,0 +1,32 @@
+Variation of Information
+==========================
+
+`Variation of information <http://en.wikipedia.org/wiki/Variation_of_information>`_ (also known as *shared information distance*) is a measure of the distance between two clusterings. It is derived from mutual information but, unlike mutual information, it is a true metric: it satisfies symmetry and the triangle inequality.
+
+**References:**
+
+    Meila, Marina (2003). 
+    *Comparing Clusterings by the Variation of Information.* 
+    Learning Theory and Kernel Machines: 173–187. 
+
+This package provides the ``varinfo`` function that implements this metric:
+
+.. function:: varinfo(k1, a1, k2, a2)
+
+    Compute the variation of information between two assignments. 
+
+    :param k1: The number of clusters in the first clustering.
+    :param a1: The assignment vector for the first clustering.
+    :param k2: The number of clusters in the second clustering.
+    :param a2: The assignment vector for the second clustering.
+
+    :return: The variation of information between the two clusterings.
+
+.. function:: varinfo(R, k0, a0)
+
+    This method takes ``R``, an instance of ``ClusteringResult``, and computes the variation of information between the clustering in ``R`` and the one given by ``(k0, a0)``, where ``k0`` is the number of clusters in the other clustering and ``a0`` is its assignment vector.
+
+.. function:: varinfo(R1, R2)
+
+    This method takes ``R1`` and ``R2`` (both instances of ``ClusteringResult``) and computes the variation of information between them.
+
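Note on the new ``varinfo`` page: the quantity it documents can be illustrated with a small standalone sketch. The Python below is not the package's implementation. It assumes the standard definition VI(A, B) = H(A|B) + H(B|A) with natural logarithms, and the function name ``variation_of_information`` is hypothetical. Unlike ``varinfo(k1, a1, k2, a2)``, it infers the clusters from the labels, so it takes no cluster counts.

```python
from collections import Counter
from math import log


def variation_of_information(a1, a2):
    """Variation of information between two assignment vectors.

    Hypothetical helper for illustration only. Uses
    VI(A, B) = H(A|B) + H(B|A) with natural logarithms (an assumption).
    """
    if len(a1) != len(a2):
        raise ValueError("assignment vectors must have the same length")
    n = len(a1)
    if n == 0:
        raise ValueError("assignment vectors must be non-empty")

    size1 = Counter(a1)            # cluster sizes in the first clustering
    size2 = Counter(a2)            # cluster sizes in the second clustering
    joint = Counter(zip(a1, a2))   # sizes of pairwise cluster intersections

    vi = 0.0
    for (i, j), nij in joint.items():
        # Each term is -p_ij * (log(p_ij / p_i) + log(p_ij / p_j)).
        vi -= (nij / n) * (log(nij / size1[i]) + log(nij / size2[j]))
    return vi


if __name__ == "__main__":
    print(variation_of_information([1, 1, 2, 2], [1, 2, 1, 2]))
```

Two clusterings that differ only by a relabeling of clusters give a value of zero, and swapping the two arguments leaves the value unchanged, which matches the symmetry property stated on the new page.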