From 5562fa76a2ffaeb3dcf282226af56bc0036ac014 Mon Sep 17 00:00:00 2001
From: Naveen Tatikonda
Date: Wed, 26 Jun 2024 22:14:24 -0500
Subject: [PATCH] Rephrase sentences

Signed-off-by: Naveen Tatikonda
---
 ...024-06-19-optimizing-opensearch-with-fp16-quantization.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md b/_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md
index d3039d2de6..3268f155bf 100644
--- a/_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md
+++ b/_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md
@@ -20,7 +20,7 @@ leading to higher memory requirements and increased operational costs. Faiss sca
 
 ## Why use Faiss scalar quantization?
 
-When you index vectors in [OpenSearch 2.13](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) or later versions, you can configure your k-NN index to apply _scalar quantization_. Scalar quantization converts each dimension of a vector from a 32-bit floating-point (`fp32`) to a 16-bit floating-point (`fp16`) representation. Using the Faiss scalar quantizer (SQfp16), integrated in the k-NN plugin, you can get up to a 50% memory savings with a very minimal loss of recall (see [Benchmarking results](#benchmarking-results)). When used with [SIMD optimization](https://opensearch.org/docs/latest/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine),
+When you index vectors in [OpenSearch 2.13](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) or later versions, you can configure your k-NN index to apply _scalar quantization_. Scalar quantization converts each dimension of a vector from a 32-bit floating-point (`fp32`) to a 16-bit floating-point (`fp16`) representation. Using the Faiss scalar quantizer (SQfp16), integrated in the k-NN plugin, saves about 50% of the memory with minimal reduction in recall (see [Benchmarking results](#benchmarking-results)). When used with [SIMD optimization](https://opensearch.org/docs/latest/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine),
 SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput.
 
 ## How to use Faiss scalar quantization
@@ -234,8 +234,7 @@ To achieve even greater memory efficiency, we plan to introduce `int8` quantizat
 This technique will enable a remarkable 75% reduction in memory requirements, or 4x compression, compared to full-precision vectors and we expect to find minimal
 reduction in recall. The quantizers will accept `fp32` vectors as input, perform online training, and quantize the data into byte-sized vectors, eliminating the
 need for external quantization or extra training steps.
 
-Furthermore, we aim to release binary vector support, enabling an unprecedented 32x compression rate. This approach will further reduce memory consumption. In
-addition to this we will soon add support for avx512 optimization which helps to further reduce search latency.
+Furthermore, we aim to release binary vector support, enabling an unprecedented 32x compression rate. This approach will further reduce memory consumption. Moreover, we plan to incorporate AVX-512 optimization, which will contribute to further reducing search latency.
 
 Our ongoing analysis and tuning of OpenSearch lets you address large-scale similarity search while minimizing resource requirements and maximizing cost-effectiveness.
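
To make the configuration that both versions of the first rewritten paragraph refer to concrete, here is a minimal sketch of creating a k-NN index with the Faiss SQfp16 encoder, following the `sq`/`fp16` encoder settings documented for the k-NN plugin in OpenSearch 2.13 and later. The client setup, index name, field name, and dimension are illustrative assumptions, and the sketch uses the `opensearch-py` client rather than a raw REST call.

```python
# Minimal sketch: create a k-NN index that stores vectors with the Faiss
# SQfp16 scalar quantizer (OpenSearch 2.13+). Host, auth, index name, field
# name, and dimension below are illustrative; adjust them for your cluster.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 768,  # illustrative vector dimension
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {
                        "encoder": {
                            "name": "sq",                    # Faiss scalar quantizer
                            "parameters": {"type": "fp16"},  # quantize fp32 -> fp16 at index time
                        }
                    },
                },
            }
        }
    },
}

client.indices.create(index="my-fp16-index", body=index_body)
```

Documents can still be ingested and queried with regular `fp32` vectors; the conversion to `fp16` is handled inside the plugin.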
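The compression figures quoted in the second hunk (roughly 50% savings for `fp16`, 75% or 4x for the planned `int8` quantization, and 32x for binary vectors) follow from the per-dimension storage sizes. The short calculation below, using an assumed 768-dimensional vector and ignoring graph and metadata overhead, shows where those ratios come from.

```python
# Back-of-the-envelope storage per vector for an illustrative 768-dimensional
# embedding. Real memory usage also includes HNSW graph and metadata overhead,
# which is not modeled here.
dimension = 768

fp32_bytes = dimension * 4    # full precision: 4 bytes per dimension
fp16_bytes = dimension * 2    # SQfp16: 2 bytes per dimension -> ~50% savings (2x)
int8_bytes = dimension * 1    # planned int8: 1 byte per dimension -> ~75% savings (4x)
binary_bytes = dimension / 8  # planned binary: 1 bit per dimension -> ~32x compression

for label, size in [("fp32", fp32_bytes), ("fp16", fp16_bytes),
                    ("int8", int8_bytes), ("binary", binary_bytes)]:
    print(f"{label}: {size:.0f} bytes per vector "
          f"({fp32_bytes / size:.0f}x compression vs. fp32)")
```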