[prod] Lower your cost on opensearch using binary vectors. #3464

Merged
merged 14 commits into from
Nov 25, 2024
@@ -0,0 +1,155 @@
---
layout: post
title: "Optimize your OpenSearch costs using binary vectors"
authors:
- heemin
- junqiu
- vamshin
- dylantong
- kolchfa
date: 2024-11-25
categories:
- technical-posts
meta_keywords: vector search, binary vectors in OpenSearch, k-NN plugin, difference between FP32 and binary vectors, Binary vector challenges, HNSW algorithm
meta_description: Explore how binary vectors in OpenSearch revolutionize large-scale vector search, offering significant cost savings and performance improvements over traditional FP32 vectors.
excerpt: Binary vectors offer a powerful, efficient alternative to FP32 vectors, reducing memory and storage by more than 90% without compromising performance. They provide a cost-effective way to scale large datasets while boosting resource efficiency.
---

Imagine the ability to search through hundreds of millions of high-dimensional vectors in a split second, all while using less storage and memory than ever before. This might sound impossible, but with binary vectors, OpenSearch's latest advancement in large-scale vector search, it's a reality. In a world where data is growing at explosive rates, handling massive datasets with reduced memory is crucial, whether you're building recommendation systems or advanced search engines. In this blog post, we'll explore how binary vectors compare to traditional FP32 vectors, especially with large datasets like our randomly generated 768-dimensional, 100-million vector dataset. We'll look at storage, memory usage, and search speed as well as how binary vectors might change the way you approach vector search.

## What's the difference between FP32 and binary vectors?

FP32 vectors have long been the standard for vector search because of their high precision and seamless integration with many large language models (LLMs), which typically produce vectors in floating-point formats. However, this precision comes at a price—increased storage and memory. As your data needs increase, this trade-off can be hard to justify. In contrast, binary vectors use only 1s and 0s, as shown in the following image.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/pic1.png" alt="fp32-binary-comparison" class="img-centered"/>

This binary format makes binary vectors more compact and faster to process. LLMs are increasingly generating binary embeddings in order to improve their efficiency on large datasets, offering significant reductions in storage, memory, and latency.
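
To make the size difference concrete, the following back-of-the-envelope sketch computes the raw vector payload for a 768-dimensional, 100-million vector dataset like the one used later in this post. It assumes 4 bytes per FP32 dimension and 1 bit per binary dimension, and it ignores index overhead such as HNSW graph links, so actual memory and storage usage will differ.

```python
# Raw vector payload only; index structures (for example, HNSW graph links) are not counted.
DIMENSION = 768
NUM_VECTORS = 100_000_000

fp32_bytes_per_vector = DIMENSION * 4      # 32 bits = 4 bytes per dimension
binary_bytes_per_vector = DIMENSION // 8   # 1 bit per dimension, 8 dimensions per byte

fp32_total_gb = fp32_bytes_per_vector * NUM_VECTORS / 1e9
binary_total_gb = binary_bytes_per_vector * NUM_VECTORS / 1e9
reduction = 1 - binary_bytes_per_vector / fp32_bytes_per_vector

print(f"FP32:   {fp32_bytes_per_vector} bytes/vector, {fp32_total_gb:.1f} GB total")
print(f"Binary: {binary_bytes_per_vector} bytes/vector, {binary_total_gb:.1f} GB total")
print(f"Raw payload reduction: {reduction:.1%}")
```

Under these assumptions, each binary vector is 32 times smaller than its FP32 counterpart (96 bytes instead of 3,072 bytes).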

## Using binary vectors in OpenSearch

Let's explore how you can use binary vectors in your OpenSearch solution.

### Data preparation

To get started, you'll need binary vector data. Luckily, many models now generate embeddings in binary format. For example, the Cohere Embed v3 model generates binary vectors.

Binary vectors are arrays of 1s and 0s, such as `[0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]`. However, OpenSearch requires that binary vectors be packed into an `int8` byte format. For example, the preceding bit array would be converted to `[108, -116]`, as shown in the following image.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/pic2.png" alt="binary-vector-packing" class="img-centered"/>

Many embedding models already generate binary vectors in `int8` byte format, so extra packing is usually unnecessary. However, if your data is stored in a bit array, you can easily convert it into a byte array using the `numpy` library:

```python
import numpy as np
bit_array = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
bit_array_np = np.array(bit_array, dtype=np.uint8)
byte_array = np.packbits(bit_array_np).astype(np.int8).tolist()
```
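
For intuition about what the packing does, here is a dependency-free sketch of the same conversion and its inverse. The helper names `pack_bits` and `unpack_bits` are illustrative and are not part of OpenSearch or NumPy. The sketch assumes most-significant-bit-first ordering within each byte (the default for `np.packbits`) and a bit count that is a multiple of 8.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into signed int8 values, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    packed = []
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        # Reinterpret the unsigned byte (0..255) as a signed int8 (-128..127).
        packed.append(value - 256 if value > 127 else value)
    return packed


def unpack_bits(packed):
    """Expand signed int8 values back into a list of 0/1 values."""
    bits = []
    for value in packed:
        unsigned = value & 0xFF  # map -128..127 back to 0..255
        bits.extend((unsigned >> shift) & 1 for shift in range(7, -1, -1))
    return bits


bit_array = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
byte_array = pack_bits(bit_array)
print(byte_array)  # [108, -116]
print(unpack_bits(byte_array) == bit_array)  # True
```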

### Ingestion and search

Once your data is stored in byte arrays, you'll need to ingest those into OpenSearch.

First, set the data type to `binary` in your index mapping and ensure that the vector dimensions are a multiple of 8 (if they are not, pad the vectors with zeros):

```json
PUT /test-binary-hnsw
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 16,
        "data_type": "binary",
        "space_type": "hamming",
        "method": {
          "name": "hnsw",
          "engine": "faiss"
        }
      }
    }
  }
}
```
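
If your vectors have a dimension that is not a multiple of 8, the zero padding mentioned above might look like the following sketch. The helper name `pad_to_multiple_of_8` is hypothetical. The sketch assumes that zeros are appended at the end of every vector, documents and queries alike, so that the padding bits never contribute to the distance between two vectors. The `dimension` in the mapping would then be the padded length.

```python
def pad_to_multiple_of_8(bits):
    """Append trailing zeros until the bit count is a multiple of 8."""
    remainder = len(bits) % 8
    if remainder == 0:
        return list(bits)
    return list(bits) + [0] * (8 - remainder)


bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # 10 dimensions
padded = pad_to_multiple_of_8(bits)
print(len(padded))  # 16
```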

Binary vectors in OpenSearch use the Hamming distance for indexing and search.

Next, pack the binary vector into a byte format for both indexing and searching. Otherwise, using binary vectors is similar to working with FP32 vectors. In the following example, you'll index two documents with vector values of `[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]` and `[0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1]`:

```json
PUT _bulk
{"index": {"_index": "test-binary-hnsw", "_id": "1"}}
{"my_vector": [7, 8]}
{"index": {"_index": "test-binary-hnsw", "_id": "2"}}
{"my_vector": [10, 11]}
```
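
When indexing from a script, the bulk request body can be generated from bit arrays. The following sketch builds the newline-delimited JSON shown above using only the Python standard library. `build_bulk_body` and `pack_bits` are illustrative helper names, the bit arrays are assumed to have a length that is a multiple of 8, and sending the body to the `_bulk` endpoint is left to whichever HTTP or OpenSearch client you use.

```python
import json


def pack_bits(bits):
    """Pack 0/1 values into signed int8 values, most significant bit first."""
    packed = []
    for start in range(0, len(bits), 8):
        value = int("".join(str(bit) for bit in bits[start:start + 8]), 2)
        packed.append(value - 256 if value > 127 else value)
    return packed


def build_bulk_body(index_name, field_name, docs):
    """Return an NDJSON bulk body for a mapping of document ID to bit array."""
    lines = []
    for doc_id, bits in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps({field_name: pack_bits(bits)}))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


docs = {
    "1": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    "2": [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
}
bulk_body = build_bulk_body("test-binary-hnsw", "my_vector", docs)
print(bulk_body)
```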

Finally, search for the vector closest to a query vector of `[0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]`:

```json
GET /test-binary-hnsw/_search
{
  "size": 1,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [108, -116],
        "k": 1
      }
    }
  }
}
```
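
To see which document is the true nearest neighbor for this query, you can compute the Hamming distances by hand. The following sketch, with a hypothetical `hamming_distance` helper, XORs the packed bytes and counts the differing bits. It is a brute-force check of the example above, not a description of how the HNSW index evaluates the query internally.

```python
def hamming_distance(a, b):
    """Count differing bits between two equal-length lists of signed int8 values."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Mask with 0xFF so negative int8 values are treated as unsigned bytes.
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


query = [108, -116]
documents = {"1": [7, 8], "2": [10, 11]}

distances = {doc_id: hamming_distance(query, vec) for doc_id, vec in documents.items()}
print(distances)  # {'1': 7, '2': 8}
print(min(distances, key=distances.get))  # 1
```

Document 1 is the closest vector, at a Hamming distance of 7 compared to 8 for document 2.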

## Performance comparison

Now let's look at the resource savings realized by using binary vectors. In our benchmarking tests, we observed similar ingestion speeds and query times between FP32 and binary vectors, even while using 8x less powerful hardware for binary vectors, as shown in the following image.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/pic3.png" alt="performance" class="img-centered"/>

### Cluster setup

Our benchmark used a randomly generated 100-million vector dataset with 768 dimensions, comparing FP32 and binary vectors. The clusters were identical except for the data nodes: binary vectors used nodes that were 2x smaller and 4x fewer, leading to an 86% cost reduction. The following table outlines the benchmarking setup.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/table1.png" alt="performance" class="img-centered"/>

### Performance results

Even on 8x smaller hardware, binary vectors delivered indexing speeds and query times comparable to FP32 vectors on more powerful machines. With binary vectors, memory usage was reduced by 92% and storage by 97%, producing significant savings. The results are presented in the following table.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/table2.png" alt="performance" class="img-centered"/>

### Accuracy

You can expect a recall of around 0.97 compared to exact search. OpenSearch uses the HNSW algorithm for approximate nearest neighbor search, so results are approximate, and the accuracy you achieve depends on your dataset. Some models produce binary vectors with high accuracy: for example, [Cohere Embed v3](https://cohere.com/blog/int8-binary-embeddings) reports a 94.7% match in search quality compared to FP32 embeddings. Thus, when using a model that produces quality binary embeddings, binary vectors can reach nearly the same accuracy as FP32 vectors.
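
Recall here is the fraction of the true top-k neighbors (as returned by exact search) that the approximate search also returns. The following is a minimal sketch of that calculation for a single query, using made-up document IDs purely for illustration. Benchmarks typically average this value over many queries.

```python
def recall_at_k(approximate_ids, exact_ids):
    """Fraction of exact top-k results that also appear in the approximate top-k results."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approximate_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical results for a single query with k = 5.
exact = ["a", "b", "c", "d", "e"]
approximate = ["a", "b", "c", "e", "x"]
print(recall_at_k(approximate, exact))  # 0.8
```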

## Binary vector challenges: When they fall short

When your model only produces FP32 vectors but you'd like to use binary vectors in OpenSearch, the process can get a little complicated. The following example demonstrates using binary vector search in OpenSearch with FP32 vectors, along with the challenges involved.

For this example, we used the [Cohere Simple dataset](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings) from Hugging Face. Because the data was in FP32 format, we converted it to binary format by setting zero and negative values to `0` and positive values to `1`. The following image illustrates the conversion process.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/pic4.png" alt="binary-quantization" class="img-centered"/>
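
The thresholding rule described above can be sketched as follows. The function name `binarize` and the sample values are illustrative. The rule itself (positive values become `1`, zero and negative values become `0`) is the one used in this example.

```python
def binarize(vector):
    """Map each FP32 value to 1 if it is positive, otherwise 0."""
    return [1 if value > 0 else 0 for value in vector]


fp32_vector = [0.12, -0.53, 0.0, 1.7, -0.01, 0.44, -2.3, 0.09]  # made-up values
bits = binarize(fp32_vector)
print(bits)  # [1, 0, 0, 1, 0, 1, 0, 1]
```

The resulting bit array still needs to be packed into bytes, as described in the data preparation section, before it can be indexed.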

In terms of recall, binary vectors achieved a score of 0.73196. To reach a recall of 0.93865, we needed 3x oversampling with rescoring: retrieving three times as many candidates as requested using the binary vectors and then re-ranking those candidates using the original FP32 vectors. Because rescoring requires the original vectors, it adds disk usage unless they are stored outside of OpenSearch. The following image illustrates oversampling with rescoring. Note that recall may vary across datasets.

<img src="/assets/media/blog-images/2024-10-30-lower-your-cost-on-opensearch-using-binary-vectors/pic5.png" alt="oversampling-re-scoring" class="img-centered"/>
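
The oversampling and rescoring flow can be sketched in a few lines. This is a brute-force illustration on made-up data, not the setup used to produce the recall numbers above: the helper names are hypothetical, bit vectors are kept unpacked for readability, and Euclidean distance is assumed for rescoring (use whichever metric matches your embeddings).

```python
import math


def binarize(vector):
    """Positive values become 1; zero and negative values become 0."""
    return [1 if value > 0 else 0 for value in vector]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def search_with_rescoring(query, fp32_vectors, k, oversample_factor):
    """Retrieve k * oversample_factor candidates by Hamming distance, then rescore with FP32."""
    query_bits = binarize(query)
    binary_vectors = {doc_id: binarize(vec) for doc_id, vec in fp32_vectors.items()}

    # Step 1: oversample using the cheap binary representation.
    candidates = sorted(
        binary_vectors, key=lambda doc_id: hamming(query_bits, binary_vectors[doc_id])
    )[: k * oversample_factor]

    # Step 2: rescore only the candidates using the original FP32 vectors.
    rescored = sorted(candidates, key=lambda doc_id: math.dist(query, fp32_vectors[doc_id]))
    return rescored[:k]


fp32_vectors = {
    "a": [0.9, 0.8, -0.1, 0.2],
    "b": [0.1, 0.1, 0.1, 0.1],
    "c": [0.8, 0.9, 0.1, 0.3],
    "d": [-0.5, -0.4, 0.7, -0.2],
}
query = [0.85, 0.85, 0.05, 0.25]

print(search_with_rescoring(query, fp32_vectors, k=1, oversample_factor=1))  # ['b']
print(search_with_rescoring(query, fp32_vectors, k=1, oversample_factor=3))  # ['c']
```

In this toy data, documents `b` and `c` have identical binary representations, so the binary search alone cannot tell them apart. With 3x oversampling, both reach the rescoring step, where the FP32 distance identifies `c` as the true nearest neighbor.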

By quantizing, oversampling, and rescoring, binary vectors can achieve similar recall to FP32 vectors while using significantly less memory. However, managing these steps outside of OpenSearch can be cumbersome. [Disk-based vector search](https://opensearch.org/docs/latest/search-plugins/knn/disk-based-vector-search/) simplifies the process by automatically using advanced quantization techniques for all required steps. Give it a try and see the difference!

## Conclusion

Binary vectors offer an efficient alternative to FP32 vectors, reducing memory and storage usage by more than 90% while maintaining strong performance on smaller hardware. This efficiency makes binary vectors ideal for large-scale vector search applications like recommendation systems or search engines, where speed and resource savings are critical. If you're handling massive datasets, binary vectors provide a practical way to scale search capabilities without increasing costs.

## What's next?

Binary vector support is now available in OpenSearch 2.16 and later. Check out the OpenSearch [binary vector documentation](https://opensearch.org/docs/latest/field-types/supported-field-types/knn-vector#binary-vectors) for detailed instructions.

If you're working with floating-point vectors, [disk-based vector search](https://opensearch.org/docs/latest/search-plugins/knn/disk-based-vector-search/) provides the memory efficiency of binary vector search without losing recall. It performs binary quantization, oversampling, and rescoring automatically, all while maintaining the low memory usage of binary vector search.

We encourage you to try binary vectors for yourself and to stay tuned for our upcoming blog post on disk-based vector search.
5 changes: 5 additions & 0 deletions _posts/2024-11-13-boosting-k-nn-exact-search.md
@@ -110,3 +110,8 @@ The tests showed that OpenSearch's new SIMD support and optimized memory access
## What's next for exact k-NN search?

Future OpenSearch versions will provide even more k-NN search flexibility. You'll be able to switch between exact and approximate search at query time. Additionally, future versions will provide the ability to specify which fields build indexes for exact and approximate search types. Stay tuned for these updates as we continue to improve OpenSearch's k-NN search capabilities.


#### Note

To use the optimized Lucene format in OpenSearch 2.17, set `index.knn` to `true` in order to build approximate nearest neighbor (ANN) data structures. In OpenSearch 2.18, a new `index.knn.advanced.approximate_threshold` setting is available. If you are performing only exact searches, set this value to `-1` to reduce indexing time.
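
As a rough illustration, the settings described in this note could be expressed as the following index creation body for an exact-search-only index on OpenSearch 2.18. This sketch only builds and prints the request body. Keeping `index.knn` set to `true` alongside the threshold is an assumption, and the rest of the index definition (mappings, index name) is omitted.

```python
import json

# Settings for an index used only for exact k-NN search (OpenSearch 2.18 or later).
# Assumption: `index.knn` stays enabled; a threshold of -1 is the value named in the note.
index_body = {
    "settings": {
        "index.knn": True,
        "index.knn.advanced.approximate_threshold": -1,
    }
}
print(json.dumps(index_body, indent=2))
```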