Commit 9905727

Merge pull request #1218 from qdrant/qdrant-1.12
[blog] Qdrant 1.12 Release
2 parents 14b3880 + eb35c21 commit 9905727

7 files changed

+266
-0
lines changed

@@ -0,0 +1,266 @@
---
title: "Qdrant 1.12 - Distance Matrix, Facet Counting & On-Disk Indexing"
draft: false
short_description: "On-Disk Text & Geo Index. Distance Matrix API. Facet API for Cardinality."
description: "Uncover insights with the Distance Matrix API, dynamically filter via Facet API, and offload additional payload to disk."
preview_image: /blog/qdrant-1.12.x/social_preview.png
social_preview_image: /blog/qdrant-1.12.x/social_preview.png
date: 2024-10-08T00:00:00-08:00
author: David Myriel
featured: true
tags:
  - vector search
  - distance matrix
  - dimensionality reduction
  - data exploration
  - data visualization
  - faceting
  - facet api
---
[**Qdrant 1.12.0 is out!**](https://github.com/qdrant/qdrant/releases/tag/v1.12.0) Let's look at major new features and a few minor additions:

**Distance Matrix API:** Efficiently calculate pairwise distances between vectors.<br>
**GUI Data Exploration:** Visually navigate your dataset and analyze vector relationships.<br>
**Faceting API:** Dynamically aggregate and count unique values in specific fields.<br>

**Text Index on disk:** Reduce memory usage by storing text indexing data on disk.<br>
**Geo Index on disk:** Offload indexed geographic data to disk for memory efficiency.

## Distance Matrix API for Data Insights
![distance-matrix-api](/blog/qdrant-1.12.x/distance-matrix-api.png)

> **Qdrant** is a similarity search engine. Our mission is to give you the tools to **discover and understand connections** between vast amounts of semantically relevant data.

The **Distance Matrix API** is here to lay the groundwork for such tools.

In data exploration, tasks like [**clustering**](https://en.wikipedia.org/wiki/DBSCAN) and [**dimensionality reduction**](https://en.wikipedia.org/wiki/Dimensionality_reduction) rely on calculating distances between data points.

**Use Case:** A retail company with 10,000 customers wants to segment them by purchasing behavior. Each customer is stored as a vector in Qdrant, but without a dedicated API, clustering would need 10,000 separate batch requests, making the process inefficient and costly.

You can use this API to compute a **sparse matrix of distances** that is optimized for large datasets. Then, you can filter through the retrieved data to find the exact vector relationships that matter.

In terms of endpoints, we offer two different formats to show results:
- **Pairs** are simple, intuitive and ideal for graph representation.
- **Offsets** are more complex, but also native when defining CSR sparse matrices.

### Output - Pairs

Use the `pairs` endpoint to compare a random sample of 10 points from your dataset:

```http
POST /collections/{collection_name}/points/search/matrix/pairs
{
  "sample": 10,
  "limit": 2
}
```
The `sample` parameter sets how many points are randomly selected for comparison (10 here). The `limit` parameter sets how many nearest neighbors are returned for each sampled point.

Qdrant will list a sparse matrix of distances **between the closest pairs**:

```json
{
  "result": {
    "pairs": [
      {"a": 1, "b": 3, "score": 1.4063001},
      {"a": 1, "b": 4, "score": 1.2531},
      {"a": 2, "b": 1, "score": 1.1550001},
      {"a": 2, "b": 8, "score": 1.1359},
      {"a": 3, "b": 1, "score": 1.4063001},
      {"a": 3, "b": 4, "score": 1.2218001},
      {"a": 4, "b": 1, "score": 1.2531},
      {"a": 4, "b": 3, "score": 1.2218001},
      {"a": 5, "b": 3, "score": 0.70239997},
      {"a": 5, "b": 1, "score": 0.6146},
      {"a": 6, "b": 3, "score": 0.6353},
      {"a": 6, "b": 4, "score": 0.5093},
      {"a": 7, "b": 3, "score": 1.0990001},
      {"a": 7, "b": 1, "score": 1.0349001},
      {"a": 8, "b": 2, "score": 1.1359},
      {"a": 8, "b": 3, "score": 1.0553}
    ]
  }
}
```
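
Since the `pairs` format is described as a good fit for graph representation, here is a minimal Python sketch of that idea. It groups a few entries copied from the sample response into an adjacency list. The helper name `build_adjacency` is hypothetical and not part of any Qdrant client.

```python
from collections import defaultdict

# A few entries copied from the sample `pairs` response.
pairs = [
    {"a": 1, "b": 3, "score": 1.4063001},
    {"a": 1, "b": 4, "score": 1.2531},
    {"a": 2, "b": 1, "score": 1.1550001},
    {"a": 2, "b": 8, "score": 1.1359},
    {"a": 3, "b": 1, "score": 1.4063001},
    {"a": 3, "b": 4, "score": 1.2218001},
]


def build_adjacency(pairs):
    """Group neighbors by source point: {a: [(b, score), ...]}."""
    graph = defaultdict(list)
    for pair in pairs:
        graph[pair["a"]].append((pair["b"], pair["score"]))
    return dict(graph)


adjacency = build_adjacency(pairs)
for point_id, neighbors in adjacency.items():
    print(point_id, neighbors)
```

Each key is a sampled point ID and each value lists its returned neighbors with their scores, which is the shape most graph libraries accept as an edge list.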

### Output - Offsets

The `offsets` endpoint offers another format for showing the distances between points:

```http
POST /collections/{collection_name}/points/search/matrix/offsets
{
  "sample": 10,
  "limit": 2
}
```

Qdrant will return a compact representation of the distances between points in the **form of row and column offsets**.

Two arrays, `offsets_row` and `offsets_col`, represent the positions of non-zero distance values in the matrix. Each entry in these arrays corresponds to a pair of points with a calculated distance.

```json
{
  "result": {
    "offsets_row": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
    "offsets_col": [2, 3, 0, 7, 0, 3, 0, 2, 2, 0, 2, 3, 2, 0, 1, 2],
    "scores": [
      1.4063001, 1.2531, 1.1550001, 1.1359, 1.4063001,
      1.2218001, 1.2531, 1.2218001, 0.70239997, 0.6146, 0.6353,
      0.5093, 1.0990001, 1.0349001, 1.1359, 1.0553
    ],
    "ids": [1, 2, 3, 4, 5, 6, 7, 8]
  }
}
```
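
To make the relationship between the two formats concrete, the following Python sketch decodes the sample `offsets` response. It assumes, based on the sample data, that each row and column offset is a position in the `ids` array. The helper names `offsets_to_pairs` and `offsets_to_dense` are hypothetical.

```python
# Values copied from the sample `offsets` response.
result = {
    "offsets_row": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
    "offsets_col": [2, 3, 0, 7, 0, 3, 0, 2, 2, 0, 2, 3, 2, 0, 1, 2],
    "scores": [
        1.4063001, 1.2531, 1.1550001, 1.1359, 1.4063001,
        1.2218001, 1.2531, 1.2218001, 0.70239997, 0.6146, 0.6353,
        0.5093, 1.0990001, 1.0349001, 1.1359, 1.0553,
    ],
    "ids": [1, 2, 3, 4, 5, 6, 7, 8],
}


def offsets_to_pairs(result):
    """Translate (row, col, score) triples back into point-ID pairs."""
    ids = result["ids"]
    triples = zip(result["offsets_row"], result["offsets_col"], result["scores"])
    return [{"a": ids[row], "b": ids[col], "score": score} for row, col, score in triples]


def offsets_to_dense(result):
    """Fill an n x n matrix; cells without a returned score stay 0.0."""
    n = len(result["ids"])
    matrix = [[0.0] * n for _ in range(n)]
    triples = zip(result["offsets_row"], result["offsets_col"], result["scores"])
    for row, col, score in triples:
        matrix[row][col] = score
    return matrix


pairs = offsets_to_pairs(result)
dense = offsets_to_dense(result)
print(pairs[0])
print(dense[0])
```

If SciPy is available, the same three arrays can be passed to `scipy.sparse.csr_matrix((scores, (offsets_row, offsets_col)), shape=(n, n))` instead of building a dense matrix by hand.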

*To learn more about the distance matrix, read [**The Distance Matrix documentation**](/documentation/concepts/explore/#distance-matrix).*

## Distance Matrix API in the Graph UI

We are adding more visualization options to the [**Graph Exploration Tool**](/blog/qdrant-1.11.x/#web-ui-graph-exploration-tool), introduced in v1.11.

You can now leverage the **Distance Matrix API** from within this tool for a **clearer picture** of your data and its relationships.

**Example:** You can retrieve 900 `sample` points, with a `limit` of 5 connections per vector and a `tree` visualization:

```json
{
  "limit": 5,
  "sample": 900,
  "tree": true
}
```
The new graphing method is cleaner and reveals **relationships and outliers:**

![distance-matrix](/blog/qdrant-1.12.x/distance-matrix.png)

*To learn more about the Web UI Dashboard, read the [**Interfaces documentation**](/documentation/interfaces/web-ui/).*

## Facet API for Metadata Cardinality

![facet-api](/blog/qdrant-1.12.x/facet-api.png)

In modern applications like e-commerce, users often rely on [**filters**](/articles/vector-search-filtering/), such as **brand** or **color**, to refine search results. The **Facet API** is designed to help users understand the distribution of values in a dataset.

The `facet` endpoint can efficiently count and aggregate values for a specific [**payload field**](/documentation/concepts/payload/) in your dataset.

You can use it to retrieve unique values for a field, along with the number of points that contain each value. This functionality is similar to `GROUP BY` with `COUNT(*)` in SQL databases.

> **Note:** Facet counting can only be applied to fields that support `match` conditions, such as fields with a keyword index.

### Configuration

Here’s a sample query using the REST API to facet on the `size` field, filtered by products where the `color` is red:

```http
POST /collections/{collection_name}/facet
{
  "key": "size",
  "filter": {
    "must": {
      "key": "color",
      "match": { "value": "red" }
    }
  }
}
```
This returns counts for each unique value in the `size` field, filtered by `color` = `red`:

```json
{
  "response": {
    "hits": [
      {"value": "L", "count": 19},
      {"value": "S", "count": 10},
      {"value": "M", "count": 5},
      {"value": "XL", "count": 1},
      {"value": "XXL", "count": 1}
    ]
  },
  "time": 0.0001
}
```
The results are sorted by count in descending order and only values with non-zero counts are returned.
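
To illustrate the `GROUP BY` with `COUNT(*)` analogy, here is a small in-memory Python sketch of what a facet query computes. It is not how Qdrant implements faceting. The `points` list and the `facet` helper are hypothetical stand-ins for stored payloads and the endpoint.

```python
from collections import Counter

# Hypothetical payloads; in Qdrant these would be attached to stored points.
points = [
    {"color": "red", "size": "L"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "L"},
]


def facet(points, key, must=None):
    """Count values of `key` among points whose payload matches every `must` pair."""
    must = must or {}
    counts = Counter(
        payload[key]
        for payload in points
        if key in payload and all(payload.get(k) == v for k, v in must.items())
    )
    # most_common() sorts by count, descending; values never seen are absent.
    return [{"value": value, "count": count} for value, count in counts.most_common()]


hits = facet(points, key="size", must={"color": "red"})
print(hits)
```

As with the endpoint, values that never occur under the filter simply do not appear in the output, and the hits come back ordered from most to least frequent.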

### Configuration - Precise Facet

By default, facet counts are approximate, which keeps the query fast. If you need a precise count, you can enable the `exact` parameter:

```http
POST /collections/{collection_name}/facet
{
  "key": "size",
  "exact": true
}
```
This parameter lets you trade performance for precision, depending on the needs of your application.

*To learn more about faceting, read the [**Facet API documentation**](/documentation/concepts/payload/#facet-counts).*

## Text Index on Disk Support
![text-index-disk](/blog/qdrant-1.12.x/text-index-disk.png)

[**Qdrant text indexing**](/documentation/concepts/indexing/#full-text-index) tokenizes text into smaller units (tokens) based on chosen settings (e.g., tokenizer type, token length). These tokens are stored in an inverted index for fast text searches.

> With `on_disk` text indexing, the inverted index is stored on disk, reducing memory usage.

### Configuration
Just like with other indexes, simply add `on_disk: true` when creating the index:

```http
PUT /collections/{collection_name}/index
{
  "field_name": "review_text",
  "field_schema": {
    "type": "text",
    "tokenizer": "word",
    "min_token_len": 2,
    "max_token_len": 20,
    "lowercase": true,
    "on_disk": true
  }
}
```
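
For readers unfamiliar with inverted indexes, the Python sketch below shows the general idea behind the settings in this request. It is a simplified illustration, not Qdrant's tokenizer or storage format: the regular expression only approximates a word tokenizer, and `tokenize`, `build_inverted_index`, and the sample reviews are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text, min_token_len=2, max_token_len=20, lowercase=True):
    """Split text into word tokens and drop those outside the length bounds."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if min_token_len <= len(t) <= max_token_len]


def build_inverted_index(docs):
    """Map each token to the set of point IDs whose text contains it."""
    index = defaultdict(set)
    for point_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(point_id)
    return index


# Hypothetical `review_text` payloads keyed by point ID.
reviews = {1: "Great fit, a great price", 2: "Poor fit"}
index = build_inverted_index(reviews)
print(sorted(index["fit"]))
print(sorted(index["great"]))
```

A text match then becomes a lookup of each query token in the index, which is why the structure can be read from disk on demand instead of being held fully in memory.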

*To learn more about indexes, read the [**Indexing documentation**](/documentation/concepts/indexing/).*

## Geo Index on Disk Support

For [**large-scale geographic datasets**](/documentation/concepts/payload/#geo) where storing all indexes in memory is impractical, **geo indexing** allows efficient filtering of points based on geographic coordinates.

With `on_disk` geo indexing, the index is written to disk instead of residing in memory, making it possible to handle large datasets without exhausting system memory.

> This can be crucial when dealing with millions of geo points that don’t require real-time access.

### Configuration

To enable this feature, modify the index schema for the geographic field by setting the `on_disk: true` flag.

```http
PUT /collections/{collection_name}/index
{
  "field_name": "location",
  "field_schema": {
    "type": "geo",
    "on_disk": true
  }
}
```
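
The text and geo index requests share the same shape, so they can be built with one helper. The Python sketch below uses only the standard library and constructs the requests without sending them. It assumes a local Qdrant instance on the default REST port 6333 and a hypothetical collection named `my_collection`; the helper `on_disk_index_request` is also hypothetical.

```python
import json
from urllib.request import Request

# Assumptions: local instance on the default REST port, hypothetical collection name.
BASE_URL = "http://localhost:6333"
COLLECTION = "my_collection"


def on_disk_index_request(field_name, field_type, **schema_options):
    """Build (but do not send) the PUT request that creates an on-disk payload index."""
    body = {
        "field_name": field_name,
        "field_schema": {"type": field_type, **schema_options, "on_disk": True},
    }
    return Request(
        url=f"{BASE_URL}/collections/{COLLECTION}/index",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


geo_request = on_disk_index_request("location", "geo")
text_request = on_disk_index_request(
    "review_text", "text",
    tokenizer="word", min_token_len=2, max_token_len=20, lowercase=True,
)
print(geo_request.get_method(), geo_request.full_url)
print(geo_request.data.decode("utf-8"))
```

To actually create the index, pass either request object to `urllib.request.urlopen` against a running instance.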

### Performance Considerations

- **Cold Query Latency:** On-disk indexes require I/O to load index segments, introducing slight latency on first access. Subsequent queries will benefit from disk caching.
- **Hot vs. Cold Indexes:** Frequently queried fields should stay in memory for faster performance, while on-disk indexes are better suited to large, infrequently queried fields.
- **Memory vs. Disk Trade-offs:** Users can manage memory by deciding which fields to store on disk.

![geo-index-disk](/blog/qdrant-1.12.x/geo-index-disk.png)

> To learn how to get the best performance from Qdrant, read the [**Optimization Guide**](/documentation/guides/optimize/).

## Just the Beginning

The easiest way to reach that **Hello World** moment is to [**try vector search in a live cluster**](/documentation/quickstart-cloud/). Our **interactive tutorial** will show you how to create a cluster, add data and try some filtering clauses.

**All of the new features from version 1.12 can be tested in the Web UI:**

![qdrant-filtering-tutorial](/articles_data/vector-search-filtering/qdrant-filtering-tutorial.png)