
[ENHANCEMENT] Partial vector search data structure loading in memory for doing search #1693

Open
navneet1v opened this issue May 7, 2024 · 12 comments


@navneet1v
Collaborator

navneet1v commented May 7, 2024

Description

Currently the k-NN plugin supports 3 engines: Nmslib, Faiss, and Lucene. The first 2, being native engines, require the whole vector-related data structure (the HNSW graph here) to be loaded into memory in order to work. For large workloads the cost can quickly become high if quantization techniques are not applied.

If we look closely, an HNSW graph mainly consists of the following:

  1. Full precision 32-bit vectors
  2. Graph representation
  3. Metadata like dimensions, number of vectors, space type, headers, etc.

Of the above items, most of the memory is used by the full precision vectors (4 bytes * number of vectors * number of dimensions).
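For a hypothetical example workload of 10 million 768-dimensional vectors, that is 4 bytes * 10,000,000 * 768 ≈ 30.7 GB for the raw vectors alone.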

Faiss stores these vectors in a Flat Index; during serialization and deserialization the vectors are written to and read from the file and placed in main memory, which increases memory consumption.

Proposal

The proposal I am putting here is that we can avoid loading this Flat Index into memory and instead read full precision vectors from disk on demand, or from a limited-size cache. The cache could be a tiered cache too. Ref: opensearch-project/OpenSearch#10024
This will definitely have an impact on search latency, but overall it will reduce cost.
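
To make the read path concrete, below is a minimal, illustrative sketch (not plugin code) of reading full precision vectors on demand with a small bounded cache; `OnDemandVectorReader`, the raw little-endian vector-file layout, and the plain `LinkedHashMap` LRU are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.util.LinkedHashMap;
import java.util.Map;

final class OnDemandVectorReader {
    private final FileChannel channel;
    private final int dim;
    private final Map<Integer, float[]> cache;

    OnDemandVectorReader(FileChannel channel, int dim, int maxCachedVectors) {
        this.channel = channel;
        this.dim = dim;
        // Access-ordered LinkedHashMap gives a simple bounded LRU cache.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, float[]> eldest) {
                return size() > maxCachedVectors;
            }
        };
    }

    /** Returns the vector for the given ordinal, reading from disk on a cache miss. */
    float[] vector(int ord) throws IOException {
        float[] cached = cache.get(ord);
        if (cached != null) {
            return cached;
        }
        // Cache miss: one positioned read of dim * 4 bytes at the vector's offset.
        ByteBuffer buf = ByteBuffer.allocate(dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        channel.read(buf, (long) ord * dim * Float.BYTES);
        buf.flip();
        float[] v = new float[dim];
        buf.asFloatBuffer().get(v);
        cache.put(ord, v);
        return v;
    }
}
```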

More Details Needed

  1. We need to chalk out more details on where this new index should live: should it be in Faiss or in the k-NN plugin?
  2. Should this new index be mmaped just like the other files?
  3. Should the full precision vectors be stored in a separate file or in the same file as the graph?
  4. Can we directly use BinaryDocValues as a Faiss index?
  5. How should vectors be laid out in the file to minimize disk round trips?

What alternatives have you considered?

Given that Lucene doesn't load all the vectors into memory but mmaps the .vec file, the Lucene engine is able to work with a low memory footprint, but it takes a hit on latency.

But using the Lucene engine poses other challenges, like no support for fp16, dynamic search parameters, the SIMD instruction set, etc.

@navneet1v navneet1v changed the title [FEATURE] Support disk based access of Vectors for Faiss HNSW algorithm [ENHANCEMENT] Support disk based access of Vectors for Faiss HNSW algorithm May 7, 2024
@jmazanec15
Member

Seems plausible. My initial opinion is to map the portion of the Faiss file into memory that has the flat index and use that.

In regard to binary doc values vs. Lucene vector values vs. native engine vectors, I think we would end up writing a lot of glue code to get binary doc values/Lucene vectors to work, so it may be good to wait on this.

In the long run, I think each engine should implement Lucene's KnnVectorsFormat. Then, a particular engine can decide how it wants to handle its vectors - i.e. delegate to Lucene's flat format or store things on its own. With this, we could support fp16, binary vectors, etc. In terms of dynamic search parameters, I think these could be passed through KnnCollector.
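
To make that extension point concrete, here is a minimal sketch of an engine-specific format, assuming Lucene 9.x's `KnnVectorsFormat` API; `FaissKnnVectorsWriter` and `FaissKnnVectorsReader` are hypothetical classes standing in for engine-owned serialization, not existing plugin code.

```java
// Minimal sketch only: FaissKnnVectorsWriter/FaissKnnVectorsReader are
// hypothetical classes that would delegate serialization to the native engine.
import java.io.IOException;

import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

public class FaissKnnVectorsFormat extends KnnVectorsFormat {

    public FaissKnnVectorsFormat() {
        super("FaissKnnVectorsFormat"); // SPI name recorded in the segment metadata
    }

    @Override
    public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
        // The engine decides how vectors are serialized, e.g. writing the
        // HNSW graph and the flat vectors into separate segment files.
        return new FaissKnnVectorsWriter(state);
    }

    @Override
    public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
        // On read, the graph could be loaded eagerly while the vectors are
        // left on disk and fetched on demand.
        return new FaissKnnVectorsReader(state);
    }
}
```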

@luyuncheng
Collaborator

luyuncheng commented May 9, 2024

My initial opinion is to map the portion of the Faiss file into memory that has the flat index and use that.

I have tried rewriting a new IndexFlat in Faiss as a new vector store that does not read all vectors into memory, but the IOPS and latency were not good. I also tested Lucene's KnnVectorsFormat; its performance depends on the page cache.

@luyuncheng
Collaborator

luyuncheng commented May 9, 2024

  4. Can we directly use BinaryDocValues as a Faiss index?

If we have JNI access to the KnnVectorsFormat, we can extend IndexFlatCodes, implement the interface, and access the vectors through KnnVectorsFormat, but it would introduce I/O and many memory copies.

I am wondering which is better: 1. implement a new IndexFlat vector store in Faiss (which can also easily support fp16 with SQuantizer), or 2. implement a JNI interface to Lucene#KnnVectorsFormat.
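
To make option 2 concrete, here is a minimal sketch of just the Java side of such a bridge; `VectorSupplier` is a hypothetical wrapper (the `float[][]` is a stand-in for a real `KnnVectorsReader`-backed source) whose method native Faiss code would invoke through JNI, which also shows where the extra copies come from.

```java
// Hypothetical JNI bridge sketch: native (Faiss) code calls copyVector()
// through JNI. Copy #1 happens when the vector reaches the Java heap from
// the index reader; copy #2 below moves it into a direct buffer that the
// C++ side can see, illustrating the "I/O and many memory copies" concern.
import java.nio.ByteBuffer;

public final class VectorSupplier {

    private final float[][] vectors; // stand-in for a KnnVectorsReader-backed source

    public VectorSupplier(float[][] vectors) {
        this.vectors = vectors;
    }

    /** Called from native code; copies the vector for the given ordinal into dst. */
    public void copyVector(int ord, ByteBuffer dst) {
        dst.clear();
        for (float v : vectors[ord]) {
            dst.putFloat(v); // copy #2: Java heap -> direct buffer visible to C++
        }
        dst.flip();
    }
}
```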

@luyuncheng
Collaborator

luyuncheng commented May 9, 2024

BTW, I want to talk about DiskANN. It shows a great memory reduction and fewer disk round trips (its IOPS and I/O throughput are lower than with KnnVectorsFormat).

@navneet1v
Collaborator Author

@jmazanec15 and @luyuncheng, thanks for all your comments. I created this issue to discuss the specific part around making our current HNSW in Faiss disk-based, without bringing in any new algorithm.

My initial opinion is to map the portion of the Faiss file into memory that has the flat index and use that.

As you know, there is only 1 file. So either we need to build something in Faiss that can read the HNSW graph separately and then partially map vectors from the same file into memory, or during serialization we can put the vectors and HNSW structures in separate files. I don't have a strong opinion on this; my vote is on whatever is feasible.

Whether we use KnnVectorsFormat/BinaryDocValues/flat indices, my problem with all 3 of them is that they are not optimized for reads as the HNSW algorithm progresses. So my thinking would be to take some ideas from the DiskANN implementation and put a vector and its neighbors side by side. This can reduce IOPS.
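
As an illustration of that idea, below is a minimal sketch assuming fixed-size node records with a fixed maximum degree; `NodeFile`, `recordSize`, and `maxDegree` are all hypothetical names, but the point is that a single positioned read returns both a node's vector and its adjacency list.

```java
// Hypothetical per-node record layout, borrowing the DiskANN idea of
// co-locating a vector with its adjacency list:
// [ float32 vector (dim * 4 bytes) | int32 neighborCount | int32 neighborIds[maxDegree] ]
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

final class NodeFile {
    private final FileChannel channel;
    private final int dim;
    private final int recordSize; // vector + neighbor count + padded neighbor ids

    NodeFile(FileChannel channel, int dim, int maxDegree) {
        this.channel = channel;
        this.dim = dim;
        // Fixed-size records let node i be located at offset i * recordSize.
        this.recordSize = dim * Float.BYTES + Integer.BYTES + maxDegree * Integer.BYTES;
    }

    /** One positioned read returns both the vector and its adjacency list. */
    NodeRecord read(int nodeId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(recordSize).order(ByteOrder.LITTLE_ENDIAN);
        channel.read(buf, (long) nodeId * recordSize);
        buf.flip();
        float[] vector = new float[dim];
        buf.asFloatBuffer().get(vector);
        buf.position(dim * Float.BYTES); // the float view does not advance buf
        int count = buf.getInt();
        int[] neighbors = new int[count];
        for (int i = 0; i < count; i++) {
            neighbors[i] = buf.getInt();
        }
        return new NodeRecord(vector, neighbors);
    }

    record NodeRecord(float[] vector, int[] neighbors) {}
}
```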

In the long run, I think each engine should implement Lucene's KnnVectorsFormat.

@jmazanec15 on this I think you mean an interface like KnnVectorsFormat where serialization/deserialization, and how to read from the file, are handled by the engine, right?

BTW, I want to talk about DiskANN. It shows a great memory reduction and fewer disk round trips (its IOPS and I/O throughput are lower than with KnnVectorsFormat).

@luyuncheng I think we are aware of DiskANN and the benefits it brings. There are multiple implementations of DiskANN too, like JVector, MS DiskANN, etc. The question we need to answer is which algorithm we want to maintain in this plugin for DiskANN.
Because one stream of thought, at least the one I have, is to just open up the KNNEngine interface along with the mapper and query, to allow community users to add their own algorithms to the k-NN plugin rather than us maintaining different algorithms. Some issues related to the same: #1328, #1540.

@jmazanec15
Member

created this issue to discuss the specific part around making our current HNSW in Faiss disk-based, without bringing in any new algorithm.

I see. Yes, then I think we need to focus on HNSW disk access patterns and what we can do (i.e. vector ordering/caching strategy). As @luyuncheng mentioned, I have also seen very slow performance when using Lucene's HNSW in a memory-constrained environment, and it's because of excessive disk reads - I'm guessing Faiss will have the same issue. So I'm not sure it'd be worth adding disk-access support before figuring out how to make it more efficient.

@jmazanec15 on this I think you mean an interface like KnnVectorsFormat where serialization/deserialization, and how to read from the file, are handled by the engine, right?

I'm saying that each engine (Faiss, Nmslib) should implement KnnVectorsFormat. Then, internally, they can decide how they want to handle the vectors (i.e. use Lucene's flat vectors writers and consume them, or use the engines to create files directly). Doing this, I think, will make it easier to extend and experiment. Basically, the extension point is a KnnVectorsFormat, an engine name, and a structure of search/index/train params. It will also logically isolate engines from each other.

@navneet1v
Collaborator Author

I'm saying that each engine (Faiss, Nmslib) should implement KnnVectorsFormat. Then, internally, they can decide how they want to handle the vectors (i.e. use Lucene's flat vectors writers and consume them, or use the engines to create files directly). Doing this, I think, will make it easier to extend and experiment. Basically, the extension point is a KnnVectorsFormat, an engine name, and a structure of search/index/train params. It will also logically isolate engines from each other.

Yeah, I think it's the same thing I am thinking. But I don't want to rely on an interface from Lucene, as they can change those interfaces without considering the things we would have done in the plugin. Hence I like the idea of having a common interface, but not exactly extending KnnVectorsFormat. The KNNEngine interface has some promise, but it is a bit overloaded. We can think it over.

@jmazanec15
Member

@navneet1v sounds good - maybe we can come up with a proposal

@navneet1v
Collaborator Author

I see. Yes, then I think we need to focus on HNSW disk access patterns and what we can do (i.e. vector ordering/caching strategy).

Yes, exactly. The way I want to break this problem down is into 2 parts:

  1. Create a new FlatIndex that can be partially mapped into memory.
  2. Work on serializing the nodes and neighbor vectors in an optimal format by looking at how DiskANN does this.

As @luyuncheng mentioned, I have also seen very slow performance when using Lucene's HNSW in a memory-constrained environment, and it's because of excessive disk reads - I'm guessing Faiss will have the same issue. So I'm not sure it'd be worth adding disk-access support before figuring out how to make it more efficient.

Yeah, I think the reason it happens is that the flat index and KnnVectorsFormat are just plain lists of vectors without any optimization.

@navneet1v
Collaborator Author

@navneet1v sounds good - maybe we can come up with a proposal

Sure, I can come up with a proposal.

@0ctopus13prime
Contributor

After this issue (#1951) is resolved, I will present a few ideas. We can discuss the pros and cons from there.
Thank you

@navneet1v navneet1v changed the title [ENHANCEMENT] Support disk based access of Vectors for Faiss HNSW algorithm [ENHANCEMENT] Partial vector search data structure loading in memory for doing search Oct 31, 2024
@0ctopus13prime
Contributor

Currently drafting the first edition of a design document on this.
