feat: lightweight, pure rust k-ANN vector database for long-term memory/knowledge-base #2

jon-chuang · 2023-04-12T11:38:17Z

I think the next step in the project is a lightweight ANN (approximate k-nearest-neighbours search) vector database. Applications:

Document store over local documents: code bases, journals, articles. Input: directory with text files.
Chrome-assistant: A memory over your currently and recently opened tabs with llama-wasm
Mobile: similar to local

Details:

The k-ANN database should always be an optional dep compiled under a feature flag.
We will reuse the loaded model for encoding. See: here, section 4.3. It suggests using an analog of the [CLS] token. See e.g. LlamaIndex. Not too sure about decoding. It can just return the text string from the metadata. Alternatively, one can load an embedding/decoding model that is serialized to ggml.
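A minimal sketch of how the optional dependency could be gated in code, assuming a Cargo feature named `knowledge-base` (the name used later in this thread). The trait `KnowledgeBase`, the module `knn`, and the types `AnnIndex` and `NoKnowledgeBase` are hypothetical names for illustration, not part of llama-rs.

```rust
/// Hypothetical lookup interface; the name and shape are illustrative only.
pub trait KnowledgeBase {
    /// Return up to `k` stored text snippets relevant to `query`.
    fn lookup(&self, query: &str, k: usize) -> Vec<String>;
}

/// Compiled only when the (assumed) `knowledge-base` feature is enabled.
#[cfg(feature = "knowledge-base")]
pub mod knn {
    use super::KnowledgeBase;

    /// Placeholder for an index-backed implementation.
    pub struct AnnIndex {
        pub docs: Vec<String>,
    }

    impl KnowledgeBase for AnnIndex {
        fn lookup(&self, _query: &str, k: usize) -> Vec<String> {
            // A real implementation would embed `_query` and search the index;
            // this placeholder just returns the first `k` documents.
            self.docs.iter().take(k).cloned().collect()
        }
    }
}

/// Fallback used when the feature is off: every lookup returns nothing.
pub struct NoKnowledgeBase;

impl KnowledgeBase for NoKnowledgeBase {
    fn lookup(&self, _query: &str, _k: usize) -> Vec<String> {
        Vec::new()
    }
}

/// Reports whether the optional database was compiled in.
pub fn knowledge_base_enabled() -> bool {
    cfg!(feature = "knowledge-base")
}
```

With this shape, callers depend only on the trait, so a build without the feature pulls in none of the index code.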

Problem definition:

We are OK trading off a bit of performance for something that has minimal surface area.
We want the database to be persistent: it should persist the index after generation. The model is very similar to ggml: we generate the artifact once, or periodically, and then load the artifact (the index) into memory.
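The generate-once, load-later model can be sketched with the standard library alone. This is an illustration under assumptions: the `Entry` type and the length-prefixed little-endian layout are made up for the example and are not hora's or ggml's actual file format.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// One stored item: an embedding plus the text it was computed from.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub vector: Vec<f32>,
    pub text: String,
}

fn write_u64<W: Write>(w: &mut W, v: u64) -> io::Result<()> {
    w.write_all(&v.to_le_bytes())
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Write all entries to `path` in a length-prefixed little-endian layout.
pub fn save_index(path: &Path, entries: &[Entry]) -> io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    write_u64(&mut w, entries.len() as u64)?;
    for e in entries {
        write_u64(&mut w, e.vector.len() as u64)?;
        for x in &e.vector {
            w.write_all(&x.to_le_bytes())?;
        }
        write_u64(&mut w, e.text.len() as u64)?;
        w.write_all(e.text.as_bytes())?;
    }
    w.flush()
}

/// Load the entries written by `save_index` back into memory.
pub fn load_index(path: &Path) -> io::Result<Vec<Entry>> {
    let mut r = BufReader::new(File::open(path)?);
    let n = read_u64(&mut r)? as usize;
    let mut entries = Vec::new();
    for _ in 0..n {
        let dim = read_u64(&mut r)? as usize;
        let mut vector = Vec::new();
        for _ in 0..dim {
            let mut b = [0u8; 4];
            r.read_exact(&mut b)?;
            vector.push(f32::from_le_bytes(b));
        }
        let len = read_u64(&mut r)? as usize;
        let mut bytes = vec![0u8; len];
        r.read_exact(&mut bytes)?;
        let text = String::from_utf8(bytes)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        entries.push(Entry { vector, text });
    }
    Ok(entries)
}
```

A real index would also persist the graph structure (for HNSW) and would likely validate lengths before allocating; the flat layout here only shows the save and load round trip.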

Options:

❌ Connect to an existing vector database (qdrant, milvus, pinecone). But these are heavy deps, and many of their features are designed for scaling out (cloud native). We like transparency and owning the artifacts involved. We are willing to trade off a bit of perf and/or implementation complexity for this aim.
❌ Compile faiss as an optional dependency. Still a pretty huge dependency.
🦀 Something Rust-native, e.g. hora. It is not actively maintained, but it still works and I've run it locally. We can slice out the core functionality (e.g. just HNSW). It already has a persisted format for the index. We can add mmap support as an optimization. Hopefully we can slice it down to about 2K LOC.

Plan:

Use prompt engineering to let the model indicate, via a special Unicode sequence, that it wants a lookup. llama-rs will detect the sequence and trigger a database lookup.
- It is not clear how well this would work. Ideally we would prompt-tune for this; LoRA-based fine-tuning might also work.
Implement either partial encoding with existing LLM, or allow loading an embedding model.
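The detection step in the plan could look roughly like the sketch below. The delimiters (U+27E6 and U+27E7) and the function names are assumptions chosen for illustration; the thread does not specify which sequence the model would emit.

```rust
/// Hypothetical delimiters; any rarely occurring sequence could serve.
pub const LOOKUP_START: &str = "\u{27E6}";
pub const LOOKUP_END: &str = "\u{27E7}";

/// If `generated` contains a complete start/end marker pair, return the
/// trimmed query text between them. Returns `None` if either marker is
/// missing, so an unterminated request is ignored rather than guessed at.
pub fn extract_lookup(generated: &str) -> Option<&str> {
    let start = generated.find(LOOKUP_START)? + LOOKUP_START.len();
    let rest = &generated[start..];
    let end = rest.find(LOOKUP_END)?;
    Some(rest[..end].trim())
}

/// Run `lookup` only when the model asked for it.
/// `lookup` stands in for the database query and is supplied by the caller.
pub fn handle_output<F>(generated: &str, lookup: F) -> Option<Vec<String>>
where
    F: Fn(&str) -> Vec<String>,
{
    extract_lookup(generated).map(|query| lookup(query))
}
```

In a streaming setting the check would run on the accumulated output after each token; whether a model reliably emits such a sequence is the open question noted in the plan.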

hhamud · 2023-04-12T21:02:10Z

interesting, like a rust specific version of this?

It would also be interesting if we could use this to store prompts and their outputs in the database and press the up key to re-use previous prompts or their outputs. But we wouldn't even need a vector database for this specifically; a typical SQL database would do.

We would also need to re-visit this rustformers/llm#56

jon-chuang · 2023-04-12T23:25:23Z

interesting, like a rust specific version of this?

Yes, there are many options available but they mainly offer the same type of indexes.

re-use previous prompts or their outputs

The problem with a hash table or KV store is that natural language queries are rarely exactly the same, especially if you are not averaging over the human population but just running local.

Milvus has already promoted similarity-search-based "caching" as one of its applications (repo)
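To make the exact-match problem concrete, here is a small sketch of a cache keyed on embedding similarity instead of the literal prompt string. It uses a brute-force linear scan, not an ANN index, and `toy_embed` is a made-up bag-of-words stand-in for a real model embedding; the type and function names are hypothetical.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Toy stand-in for a model embedding: per-word counts over a fixed vocabulary.
/// A real system would use the LLM or a dedicated embedding model instead.
pub fn toy_embed(text: &str, vocab: &[&str]) -> Vec<f32> {
    let lower = text.to_lowercase();
    vocab
        .iter()
        .map(|w| {
            lower
                .split(|c: char| !c.is_alphanumeric())
                .filter(|t| t == w)
                .count() as f32
        })
        .collect()
}

/// Cache that returns a stored response when a query is similar enough.
pub struct SimilarityCache {
    entries: Vec<(Vec<f32>, String)>,
    threshold: f32,
}

impl SimilarityCache {
    pub fn new(threshold: f32) -> Self {
        Self { entries: Vec::new(), threshold }
    }

    pub fn insert(&mut self, embedding: Vec<f32>, response: String) {
        self.entries.push((embedding, response));
    }

    /// Linear scan for the most similar entry at or above the threshold.
    pub fn get(&self, query: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(v, r)| (cosine(v, query), r))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, r)| r.as_str())
    }
}
```

A `HashMap` keyed on the prompt string would miss as soon as the wording changes, while a lookup by similarity can still hit for a rephrased prompt. The threshold is a tuning knob: too low and unrelated prompts share cached answers, too high and the cache degenerates to exact match.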

philpax · 2023-04-13T00:45:12Z

I think this is out of scope for this repository specifically. I could see a batteries-included implementation being built atop llama-rs, but it's unlikely to feature an implementation of a vector database itself because our focus is specifically on robust, fast inference of LLMs.

jon-chuang · 2023-04-13T01:01:29Z

our focus is specifically on robust, fast inference of LLMs.

Yes, but I think the broader focus is "low-resource, low-dependency embedded LLM toolchain".

I can definitely see the sliced out k-ANN code existing in a separate repo (perhaps under this org) and compiled in as an optional dependency to llama-rs and available in the cli (on crates.io it would be cargo install llama-rs --features "knowledge-base")

hhamud · 2023-04-13T01:09:35Z

"low-resource, low-dependency embedded LLM toolchain".

Literally what I was thinking of yesterday

jon-chuang · 2023-04-13T01:51:40Z

I've made an issue here to sound out the idea: ggerganov/llama.cpp#930

philpax · 2023-04-13T01:57:43Z

I can definitely see the sliced out k-ANN code existing in a separate repo (perhaps under this org) and compiled in as an optional dependency to llama-rs and available in the cli (on crates.io it would be cargo install llama-rs --features "knowledge-base")

Sure, but I don't see why it would have to be part of llama-rs specifically. The CLI is really just a demo application for the library; it doesn't aspire to higher functionality than that.

I'm not opposed to having this kind of functionality - having a full-stack solution for using a LLM to do knowledge base inference would be great - but I think it's a hard sell to make it part of this crate specifically. By analogy, we're like hyper, not reqwest - we're not trying to solve all the problems, just the core problem that enables other people to solve their problems.

jon-chuang · 2023-04-13T02:11:10Z

but I think it's a hard sell to make it part of this crate specifically.

I'm in agreement here. But do you think that rustformers org more generally could be expanded to this broader scope of a low-resource LLM toolchain and host the broader-scoped llama-rs-toolchain?

philpax · 2023-04-16T14:27:55Z

Sorry - meant to get back to you earlier. Yeah, I think having this as part of a larger solution would be great. I've created this repository to track issues that aren't directly related to llama-rs, but are for the ecosystem around it.

Has anyone experimented with this? Are there any estimates on how much work it would be?

jon-chuang · 2023-04-25T06:59:39Z

I’ve not experimented, but it’s on my (currently very long) todo list. I estimate it could be a week of work to get the code in place, but it may take some additional experimentation with prompting (e.g. to emit sequence of tokens indicating search action) to get the models to work well with the knowledge base.

I’ll hopefully get to it once I’m back from holiday.

hhamud · 2023-05-08T13:41:08Z

Any updates on this? @jon-chuang

itsbalamurali · 2023-06-13T19:48:52Z

@jon-chuang @hhamud & @philpax i've taken a dig at porting chroma to rust: https://gist.github.com/itsbalamurali/118e7ce18f1519f26780b9845dee4e87 has the basic structure to it.

needs : https://github.com/chroma-core/chroma/blob/d98be4d0bfb760155d9f85c9012952ef459c10a6/chromadb/db/clickhouse.py#L583

hhamud · 2023-06-18T20:53:45Z

@jon-chuang @hhamud & @philpax i've taken a dig at porting chroma to rust: https://gist.github.com/itsbalamurali/118e7ce18f1519f26780b9845dee4e87 has the basic structure to it.

needs : https://github.com/chroma-core/chroma/blob/d98be4d0bfb760155d9f85c9012952ef459c10a6/chromadb/db/clickhouse.py#L583

Nice, do you have an actual full repo to share rather than just a gist?

shkr · 2023-07-10T10:15:10Z

I am interested in implementing a rust knowledge base for llms

zicklag · 2023-07-10T13:43:58Z

Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

ealmloff · 2023-07-10T15:30:55Z

I implemented an in-memory version of this as part of Floneum. Here is the relevant code: https://github.com/floneum/floneum/blob/master/plugin/src/vector_db.rs

Instant distance is fairly easy to work with and actively maintained.

shkr · 2023-07-14T11:44:18Z

Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

Thanks, Cozo is very interesting and might solve the use case I was thinking of.

ayourtch · 2023-08-10T22:52:54Z

I saw https://github.com/tensorchord/pgvecto.rs today - it fits the bill of “rust only”. (Admittedly I am too new to this field to even fully understand if this is relevant or not, but in case someone might find it useful)

jon-chuang changed the title ~~feat: lightweight, native ann-vector database for long-term memory~~ feat: lightweight, native ann-vector database for long-term memory/knowledge-base Apr 12, 2023

jon-chuang changed the title ~~feat: lightweight, native ann-vector database for long-term memory/knowledge-base~~ feat: lightweight, native k-ANN vector database for long-term memory/knowledge-base Apr 12, 2023

jon-chuang changed the title ~~feat: lightweight, native k-ANN vector database for long-term memory/knowledge-base~~ feat: lightweight, pure rust k-ANN vector database for long-term memory/knowledge-base Apr 12, 2023

philpax transferred this issue from rustformers/llm Apr 16, 2023
