[ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search #1693

navneet1v · 2024-05-07T06:08:17Z

Description

Currently k-NN plugin supports 3 engines, Nmslib, Faiss and Lucene. The first 2 being native engines requires the whole vector related data structures(aka HNSW graphs here) to be loaded into memory for these engines to work. For large workloads the cost can soon become high if some quantization techniques are not applied.

If we look closely a HNSW graph mainly consist of below things:

Full precision 32 bit vectors
Graph representations
Metadata like dimensions, number of vectors, space type, headers etc.

From the above items, main memory is used by these full precision vectors(4bytes * number of vectors * number of dim).

The way Faiss stores these vectors is in a Flat Index and during serialization and deserialization these vectors are written and read to/from the file and put in the main memory which increases the memory consumption

Proposal

The proposal I am putting here is we can avoid loading this Flat Index into memory and as required we can read full precision vectors from disk on demand/or from cache(limited sized cache). The cache can be tiered cache too. Ref: opensearch-project/OpenSearch#10024
This will definitely have an impact on the search latency, but overall it will reduce the cost.

More Details Needed

We need to chalk out more details on where this new Index needs to be present like should it be in Faiss or should it be in k-NN plugin.
Should this new index me mmaped just like other files?
Does the Full precision vectors be stored in a separate file or in the same file as graph?
Can we directly use Binary docValues as an Faiss Index?
How to place vectors in the File to ensure min disk round trips.

What alternatives have you considered?

Given that Lucene doesn't load all the vectors in memory but mmap the .vec file, Lucene engine is able to work with low memory footprint, but it takes a hit on the Latency.

But using Lucene Engine posses other challenges like no support of fp16, dynamic search parameters, SIMD instruction set etc.

jmazanec15 · 2024-05-09T16:18:42Z

Seems plausible. My initial opinion is to map the portion of the faiss file into memory that has the flat index and use that.

In regards to binary doc values vs lucene vectors values vs native engine vectors, I think we would end up writing a lot of glue code in order to get binary doc values/lucene vectors to work, so may be good to wait on this.

In long run, I think each engine should implement Lucene's KnnVectorsFormat. Then, a particular engine can decide how it wants to handle its vectors - i.e. delegate to lucene flatformat or store things on its own. With this, we could support both fp16, binary vectors, etc. In terms of dynamic search parameters, I think these would be able to be passed through KnnCollector.

luyuncheng · 2024-05-09T16:32:16Z

My initial opinion is to map the portion of the faiss file into memory that has the flat index and use that.

i have been tried rewrite an new IndexFlat in faiss as new VectorStore without read all vector into memory. but the iops and latency are not good. also i test Lucene's KnnVectorsFormat performance dependents on page cache.

luyuncheng · 2024-05-09T16:38:11Z

Can we directly use Binary docValues as an Faiss Index?

if we have an jni access the KnnVectorsFormat we can extends IndexFlatCodes and implements the interface and access the vector with KnnVectorsFormat but it would introduce io and many memory copies

i am wondering which is better: 1. implements an new IndexFlat vector store in Faiss ( also can easy support fp16 wich SQuantizer)or 2. implements jni interface to Lucene#KnnVectorsFormat

luyuncheng · 2024-05-09T16:44:50Z

BTW i wanna talk about diskann. and it shows great memory reduce, and less disk round trips (which iops and io throughput less than KnnVectorsFormat)

navneet1v · 2024-05-09T17:30:49Z

@jmazanec15 and @luyuncheng thanks for all your comments. I created this issue to discuss on the specific part around making our current HNSW disk based with Faiss without even bringing any new algorithm.

My initial opinion is to map the portion of the faiss file into memory that has the flat index and use that.

As you know there is only 1 file. So either we need to build something in faiss that can read HNSW graph separately and then map vectors from same file partially in memory, or we can actually during serialization put vectors and HNSW structures in separate files. I don't have a strong opinion on this. whatever is feasible my vote is on that.

Whether we should KNNVectorFormat/BinaryDocValues/FlatIndices, my problem with all 3 of them is they are not optimized for reads as HNSW algo progresses. So, my thinking would be to take some ideas from DiskANN implementation and put vectors and its neighbors side by side. This can reduce iops.

In long run, I think each engine should implement Lucene's KnnVectorsFormat.

@jmazanec15 on this I think you mean an interface like KNNVectorsFormat where serialization/de-serialization is handled by Engine and how to read from the file. right?

BTW i wanna talk about diskann. and it shows great memory reduce, and less disk round trips (which iops and io throughput less than KnnVectorsFormat)

@luyuncheng I think we are aware of diskANN and what benefits it bring in. There are multiple implementations diskANN too like JVector, MS DiskANN etc. The question which we need to ans is which algorithm we want to main in this plugin for diskANN ?
Because 1 stream of thought atleast I have is lets just open up the KNNEngine interface along with mapper and query to allow community user to add their own algorithm in k-NN plugin rather than we maintaining different algorithms. Some issues related to same: #1328, #1540.

jmazanec15 · 2024-05-09T17:45:54Z

created this issue to discuss on the specific part around making our current HNSW disk based with Faiss without even bringing any new algorithm.

I see. Yes, then I think we need to focus on hnsw disk access patterns and what we can do (i.e. vector ordering/caching strategy). As @luyuncheng mentioned, I have also seen performance very slow when using lucene's hnsw in memory constrained environment and its because of excessive disk reads - Im guessing faiss will have same issue. So Im not sure itd be worth adding disk-access support before figuring out how to make it more efficient.

@jmazanec15 on this I think you mean an interface like KNNVectorsFormat where serialization/de-serialization is handled by Engine and how to read from the file. right?

Im saying that each engine (faiss, nmslib) should implement KnnVectorsFormat. Then, internally they can decide how they want to handle the vectors (i.e. use Lucene's flatvectors writers and consume it or use the engines to create files directly). Doing this, I think will make it easier to extend and experiment. Basically, extension point is a KnnVectorsFormat, an engine name, and a structure of search/index/train params. It will also logically isolate engines from each other.

navneet1v · 2024-05-09T17:53:33Z

Im saying that each engine (faiss, nmslib) should implement KnnVectorsFormat. Then, internally they can decide how they want to handle the vectors (i.e. use Lucene's flatvectors writers and consume it or use the engines to create files directly). Doing this, I think will make it easier to extend and experiment. Basically, extension point is a KnnVectorsFormat, an engine name, and a structure of search/index/train params. It will also logically isolate engines from each other.

Yeah, I think its the same thing which I am thinking. But I don't want to rely on an interface from Lucene as they can change those interfaces without considering things which we would have done in the plugin. Hence I like the idea of having a common interface, but not exactly extending KnnVectorsFormat. KNNEngine interface has some promise, but it is bit overloaded. We can think over it.

jmazanec15 · 2024-05-09T18:26:03Z

@navneet1v sounds good - maybe we can come up with a proposal

navneet1v · 2024-05-09T18:43:40Z

I see. Yes, then I think we need to focus on hnsw disk access patterns and what we can do (i.e. vector ordering/caching strategy).

Yes exactly. The way I want to break this problem is in 2 part:

Create a new FlatIndex that can be partially mapped in memory.
Work on serializing the nodes and neighbors vectors in an optimal format by looking at the how disk ann does this.

As @luyuncheng mentioned, I have also seen performance very slow when using lucene's hnsw in memory constrained environment and its because of excessive disk reads - Im guessing faiss will have same issue. So Im not sure itd be worth adding disk-access support before figuring out how to make it more efficient.

Yeah I think the reason why it happens is the FlatIndex and KNNVectorFormats are just plain list of vectors without any optimization.

navneet1v · 2024-05-09T18:44:32Z

@navneet1v sounds good - maybe we can come up with a proposal

Sure I can come up with a proposal.

0ctopus13prime · 2024-08-14T00:13:59Z

After resolved this issue (#1951), will present few ideas. We can discuss pros and cons from there.
Thank you

0ctopus13prime · 2024-11-07T21:38:08Z

Currently drafting the first edition of design document on this.

navneet1v added untriaged enhancement labels May 7, 2024

navneet1v changed the title ~~[FEATURE] Support disk based access of Vectors for Faiss HNSW algorithm~~ [ENHANCEMENT] Support disk based access of Vectors for Faiss HNSW algorithm May 7, 2024

navneet1v removed the untriaged label May 7, 2024

navneet1v added this to Vector Search RoadMap May 7, 2024

github-project-automation bot moved this to Backlog in Vector Search RoadMap May 7, 2024

github-actions bot added the untriaged label May 7, 2024

navneet1v added storage-improvements memory-reduction and removed untriaged labels May 7, 2024

luyuncheng mentioned this issue May 14, 2024

[Enhancement] [Build-Time V1] Reduce Memory Footprint during Native Index Creation #1600

Closed

navneet1v mentioned this issue Jul 11, 2024

[RFC] Optimized Disk-Based Vector Search #1779

Closed

navneet1v mentioned this issue Jul 18, 2024

[RFC] Integrating KNNVectorsFormat in Native Vector Search Engine #1853

Closed

vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Aug 12, 2024

navneet1v assigned 0ctopus13prime Aug 14, 2024

0ctopus13prime mentioned this issue Sep 4, 2024

[RFC] Introducing Loading/Writing Layer in Native KNN Engines #2033

Closed

navneet1v changed the title ~~[ENHANCEMENT] Support disk based access of Vectors for Faiss HNSW algorithm~~ [ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search Oct 31, 2024

navneet1v mentioned this issue Nov 12, 2024

[RFC]Reduce Disk Usage By Reusing NativeEngine Files #2266

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search #1693

[ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search #1693

navneet1v commented May 7, 2024 •

edited

Loading

jmazanec15 commented May 9, 2024

luyuncheng commented May 9, 2024 •

edited

Loading

luyuncheng commented May 9, 2024 •

edited

Loading

luyuncheng commented May 9, 2024 •

edited

Loading

navneet1v commented May 9, 2024

jmazanec15 commented May 9, 2024

navneet1v commented May 9, 2024

jmazanec15 commented May 9, 2024

navneet1v commented May 9, 2024

navneet1v commented May 9, 2024

0ctopus13prime commented Aug 14, 2024

0ctopus13prime commented Nov 7, 2024

[ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search #1693

[ENHANCEMENT] Partial vector search datastrcuture loading in memory for doing search #1693

Comments

navneet1v commented May 7, 2024 • edited Loading

Description

Proposal

More Details Needed

What alternatives have you considered?

jmazanec15 commented May 9, 2024

luyuncheng commented May 9, 2024 • edited Loading

luyuncheng commented May 9, 2024 • edited Loading

luyuncheng commented May 9, 2024 • edited Loading

navneet1v commented May 9, 2024

jmazanec15 commented May 9, 2024

navneet1v commented May 9, 2024

jmazanec15 commented May 9, 2024

navneet1v commented May 9, 2024

navneet1v commented May 9, 2024

0ctopus13prime commented Aug 14, 2024

0ctopus13prime commented Nov 7, 2024

navneet1v commented May 7, 2024 •

edited

Loading

luyuncheng commented May 9, 2024 •

edited

Loading

luyuncheng commented May 9, 2024 •

edited

Loading

luyuncheng commented May 9, 2024 •

edited

Loading