From 3c429ebd31f78adbab4e4283d0d797c004d0a2fc Mon Sep 17 00:00:00 2001
From: KevinMusgrave
Date: Wed, 30 Mar 2022 04:24:49 -0400
Subject: [PATCH] Updated the docs

---
 docs/accuracy_calculation.md |  2 +-
 docs/distances.md            | 39 +++++++++++++++++++++++++++++++++++++++
 docs/inference_models.md     |  3 ++-
 docs/losses.md               |  5 ++---
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/docs/accuracy_calculation.md b/docs/accuracy_calculation.md
index 2bd2a995..525b8cfd 100644
--- a/docs/accuracy_calculation.md
+++ b/docs/accuracy_calculation.md
@@ -24,7 +24,7 @@ AccuracyCalculator(include=(),
 * ```None```. This means k will be set to the total number of reference embeddings.
 * An integer greater than 0. This means k will be set to the input integer.
 * ```"max_bin_count"```. This means k will be set to ```max(bincount(reference_labels)) - self_count``` where ```self_count == 1``` if the query and reference embeddings come from the same source.
-* **label_comparison_fn**: A function that compares two torch arrays of labels and returns a boolean array. The default is ```torch.eq```. If a custom function is used, then you must exclude clustering based metrics ("NMI" and "AMI"). The following is an example of a custom function for two-dimensional labels. It returns ```True``` if the 0th column matches, and the 1st column does **not** match:
+* **label_comparison_fn**: A function that compares two torch tensors of labels and returns a boolean tensor. The default is ```torch.eq```. If a custom function is used, then you must exclude clustering-based metrics ("NMI" and "AMI"). The example below shows a custom function for two-dimensional labels. It returns ```True``` if the 0th column matches, and the 1st column does **not** match.
 * **device**: The device to move input tensors to. If ```None```, will default to GPUs if available.
 * **knn_func**: A callable that takes in 4 arguments (```query, k, reference, embeddings_come_from_same_source```) and returns ```distances, indices```. Default is ```pytorch_metric_learning.utils.inference.FaissKNN```.
 * **kmeans_func**: A callable that takes in 2 arguments (```x, nmb_clusters```) and returns a 1-d tensor of cluster assignments. Default is ```pytorch_metric_learning.utils.inference.FaissKMeans```.
diff --git a/docs/distances.md b/docs/distances.md
index bb41474f..24562015 100644
--- a/docs/distances.md
+++ b/docs/distances.md
@@ -67,6 +67,45 @@ def pairwise_distance(self, query_emb, ref_emb):
 ```
 
+## BatchedDistance
+
+Computes distance matrices iteratively, passing each matrix into ```iter_fn```.
+
+```python
+distances.BatchedDistance(distance, iter_fn=None, batch_size=32)
+```
+
+**Parameters**:
+
+* **distance**: The wrapped distance function.
+* **iter_fn**: This function will be called at every iteration. It receives ```(mat, s, e)``` as input, where ```mat``` is the current distance matrix, and ```s```, ```e``` are the start and end indices of the query embeddings used to construct ```mat```.
+* **batch_size**: The number of query embeddings processed per iteration. Each distance matrix will be of size ```(batch_size, len(ref_emb))```.
+
+**Example usage**:
+```python
+import torch
+
+from pytorch_metric_learning.distances import BatchedDistance, CosineSimilarity
+
+def fn(mat, s, e):
+    print(f"At query indices {s}:{e}")
+
+distance = BatchedDistance(CosineSimilarity(), fn)
+
+# Works like a regular distance function, except nothing is returned.
+# So any persistent changes need to be done in the supplied iter_fn.
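+
+# Hypothetical inputs, just for illustration. Any 2D float tensors
+# of shape (num_embeddings, embedding_dim) will work:
+embeddings = torch.randn(512, 128)
+ref_emb = torch.randn(256, 128)
+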
+# query vs query
+distance(embeddings)
+# query vs ref
+distance(embeddings, ref_emb)
+```
+
 ## CosineSimilarity
 ```python
 distances.CosineSimilarity(**kwargs)
 ```
diff --git a/docs/inference_models.md b/docs/inference_models.md
index a74afa0b..9e9c1e3a 100644
--- a/docs/inference_models.md
+++ b/docs/inference_models.md
@@ -119,12 +119,13 @@ Uses a [distance function](distances.md) to determine similarity between datapoi
 
 ```python
 from pytorch_metric_learning.utils.inference import CustomKNN
-CustomKNN(distance)
+CustomKNN(distance, batch_size=None)
 ```
 
 **Parameters**:
 
 * **distance**: A [distance function](distances.md)
+* **batch_size**: If specified, k-nn will be computed incrementally. For example, with 50000 query and 50000 reference embeddings and ```batch_size=32```, CustomKNN will iterate through the query embeddings 32 at a time, using distance matrices of size (32, 50000). The result is identical to the ```batch_size=None``` setting, but it saves memory because the full (50000, 50000) distance matrix is never computed all at once.
 
 Example:
 ```python
diff --git a/docs/losses.md b/docs/losses.md
index 60bc2e94..b9934839 100644
--- a/docs/losses.md
+++ b/docs/losses.md
@@ -169,15 +169,14 @@ Unlike many other losses, the instance of this class can only be called as the f
 
 ```python
 from pytorch_metric_learning import losses
-loss_func = losses.SomeLoss()
+loss_func = losses.CentroidTripletLoss()
 
 embeddings = torch.randn(8, 32)
 labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
 
 loss = loss_func(embeddings, labels)
 ```
 
-and does not allow for use of `ref_embs`, `ref_labels`. Furthermore, the labels can't imply classes with just one
-embedding in it (e.g. if there was only one label with value `1` in the above example). Refer to a [previous issue](https://github.com/KevinMusgrave/pytorch-metric-learning/issues/451) about this topic.
+and does not allow for use of `ref_emb`, `ref_labels`. Furthermore, there must be at least 2 embeddings associated with each label. Refer to [this issue](https://github.com/KevinMusgrave/pytorch-metric-learning/issues/451) for details.
 
 **Parameters**: