The indexing of each specific dimension in VectorChord requires preprocessing, which can increase the latency of the first query. This can be mitigated by modifying GUC `vchordrq.prewarm_dim` to perform the preprocessing ahead of time during PostgreSQL cluster startup.
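A minimal sketch of that configuration, based on the `ALTER SYSTEM` form shown in the parameter reference. The dimension list `64,128` is only an illustrative value; substitute the dimensions of your own vector columns.

```SQL
-- Precompute the projection matrix for 64- and 128-dimensional vectors
-- at cluster startup (illustrative dimension list).
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128';

-- The setting is read at startup, so restart PostgreSQL for it to take effect.
```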
-
-```SQL
--- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
--- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
src/vectorchord/usage/indexing.md
Lines changed: 0 additions & 1 deletion
@@ -67,7 +67,6 @@ When dealing with large datasets (> $10^6$ vectors), please follow these guideli
 - Example:
   - `residual_quantization = false` means that residual quantization is not used.
   - `residual_quantization = true` means that residual quantization is used.
-
 - Note: set `residual_quantization` to `true` if your model generates embeddings where the metric is Euclidean distance. This option only works for L2 distance. Using it with other distance metrics will result in an error in building.
src/vectorchord/usage/search.md
Lines changed: 35 additions & 9 deletions
@@ -37,7 +37,6 @@ SELECT 1 FROM items WHERE category_id = 1 ORDER BY embedding <#> '[0.5,0.5,0.5]'
 - If `lists = []`, then probes must be an empty list.
 - If `lists = [11, 22]`, then probes can be 2,4 or 4,8, but it must not be an empty list, `3`, `7,8,9`, or `5,5,5,5`.

-
 #### `vchordrq.epsilon`

 - Description: Even after pruning, the number of retrieved vectors remains substantial. The index employs the RaBitQ algorithm to quantize vectors into bit vectors, which require just $\frac{1}{32}$ the memory of single-precision floating-point vectors. With minimal floating-point operations, most computations are integer-based, leading to faster processing. Unlike typical quantization algorithms, RaBitQ not only estimates distances but also their lower bounds. The index computes the lower bound for each vector and dynamically adjusts the number of vectors needing recalculated distances, based on the query count, thus balancing performance and accuracy. The GUC parameter `vchordrq.epsilon` controls the conservativeness of the lower bounds of distances. The higher the value, the higher the accuracy, but the worse the performance. The default value provides unnecessarily high accuracy for most indexes, so you can try lowering this parameter to achieve better performance.
@@ -51,11 +50,38 @@ SELECT 1 FROM items WHERE category_id = 1 ORDER BY embedding <#> '[0.5,0.5,0.5]'
 You can refer to [performance tuning](../usage/performance-tuning#query-performance) for more information about tuning the query performance.

-#### `vchordrq.prewarm_dim`
-
-- Description: The `vchordrq.prewarm_dim` GUC parameter is used to precompute the RaBitQ projection matrix for the specified dimensions. This can help to reduce the latency of the first query after the PostgreSQL cluster is started.
-- Type: list of integers
-- Default: `64,128,256,384,512,768,1024,1536`
-- Example:
-  - `ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128'` means that the projection matrix will be precomputed for dimensions 64 and 128.
-- Note: This setting requires a database restart to take effect.
+#### `vchordrq.prefilter`
+
+- Description: The `vchordrq.prefilter` GUC parameter enables condition evaluation before distance computation. For example, in the query `SELECT * FROM items WHERE id % 2 = 0 ORDER BY embedding <-> '[3,1,2]' LIMIT 5`, the index normally computes all useful `embedding <-> '[3,1,2]'` distances first and then passes the rows to PostgreSQL, which filters out rows where `id % 2 != 0`. This parameter allows the index to pre-evaluate the condition and discard non-matching rows before computing their distances, improving query efficiency.
+- Type: boolean
+- Default: `false`
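A short sketch of how the prefilter parameter might be used with the query from its description. It assumes an `items` table with a `vchordrq` index on `embedding`, and that the parameter can be set per session with `SET`; neither is confirmed by this change.

```SQL
-- Enable prefiltering for the current session (boolean GUC, default false).
SET vchordrq.prefilter = on;

-- With prefiltering on, the index can evaluate `id % 2 = 0` first and
-- skip distance computation for rows that do not match.
SELECT * FROM items WHERE id % 2 = 0 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```

Whether this helps depends on how selective and how cheap the condition is, so it is worth comparing query latency with the parameter on and off.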
+
+#### `vchordrq.search_rerank`
+
+- Description: This GUC parameter controls the I/O prefetching strategy for reading bit vectors in vector search, which can significantly impact search performance on disk-based vectors.
+  - `read_buffer` indicates a preference for `ReadBuffer`.
+  - `prefetch_buffer` indicates a preference for both `PrefetchBuffer` and `ReadBuffer`.
+  - `read_stream` indicates a preference for `read_stream`.
+
+#### `vchordrq.io_rerank`
+
+- Description: This GUC parameter controls the I/O prefetching strategy for reading full precision vectors in vector search, which can significantly impact search performance on disk-based vectors.