RAG Stack #34
Replies: 2 comments
-
Hi Riley! The decision to use (or not use) a vector database is not just a matter of scale - it can also be a matter of making iteration on the system easier. No matter what stack you choose, you always want to know where the latest scraped data lives, whether it has been embedded already and how, and where the vectors are. The "how" aspect means storing all the parameters: chunking settings, model arguments, model choice(s), etc.

Most people start with an in-process vector search library like FAISS, Voyager, or usearch bundled into their Python service - and then every time they start the service, they read the latest data from their DB or S3/GCS and re-calculate the vectors. The library choice boils down to the features you need: FAISS is pretty bare-bones (e.g. no hybrid search), Voyager is optimized for higher precision (of the nearest neighbor algorithm), and usearch has features like semantic joins. Using Chroma or other in-process DBs is another option if you need even more features. Managed solutions like Pinecone or Turbopuffer are useful if you don't want to keep re-embedding your data upon system restarts. Keep in mind that you will then have to manage and synchronize two systems - your application server and the managed DB. This is super common when you build any kind of web app, so it's totally doable!

We have a few RAG articles up on VectorHub already - RAG basics with FAISS and a piece on improving RAG retrieval quality with agents. To give more targeted advice, it would be helpful if you shared a few examples of queries and results that come up in your use case, and what a better result would look like. Always important to get hands-on with the data and talk on top of examples :)
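The "re-embed on startup" pattern above can be sketched roughly like this - here a brute-force cosine search stands in for FAISS/Voyager/usearch, and `embed` is a toy placeholder for whatever embedding model you'd actually call:

```python
import math

def embed(text: str) -> list[float]:
    # Hypothetical placeholder: a real service would call an embedding model here.
    # This toy version derives a normalized 4-dim vector from character statistics.
    vals = [float(len(text)), float(sum(map(ord, text)) % 97),
            float(text.count(" ")), float(len(set(text)))]
    norm = math.sqrt(sum(v * v for v in vals)) or 1.0
    return [v / norm for v in vals]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class InMemoryIndex:
    """Stand-in for an in-process vector library: rebuilt on every service start."""

    def __init__(self, docs: list[str]):
        # On startup, read the latest data and re-calculate all vectors.
        self.docs = docs
        self.vectors = [embed(d) for d in docs]

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(zip(self.docs, self.vectors),
                        key=lambda dv: cosine(dv[1], q), reverse=True)
        return [d for d, _ in scored[:k]]
```

A real setup would also persist the "how" metadata (chunking, model name, parameters) alongside the raw data so you can tell whether the stored vectors are stale.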
-
Thanks Dan, super helpful. In our case, we'll have a completely new batch of data every day that we want to use hybrid search against, but the "answers" we want to embed for vector search won't change very often. It sounds like we should look at Voyager next?

Basically, we're building a content recommendation engine. We use Google to capture about 100 pages of relevant, recent content per person, then we surface the 3-5 most pressing items they need to know based on their job. There's no way to get Google, Bing, or even Perplexity to understand what data is relevant to our customer, so we're trying to explicitly spell out 10 things that matter, then embed them and use vector search to surface items that actually fit the mold, then rank them. We'd like to avoid an agent-per-item or a hierarchy of agents if we can still get the best performance.

For https://hub.superlinked.com/enhancing-rag-with-a-multi-agent-system, are there services that already have this chain of agents built in? It appears that link already has multi-query generation and a ranker akin to Reciprocal Rank Fusion in the diagram... there are a lot of moving pieces, and it's usually best to simplify. Or is this a situation where we should keep everything modular so we can swap in best-of-breed as new tech drops?
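For reference, the Reciprocal Rank Fusion mentioned above is simple enough to sketch directly - each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in (k = 60 is the constant from the original RRF paper):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one.

    score(doc) = sum over each list of 1 / (k + rank_in_that_list),
    where ranks start at 1. Documents missing from a list contribute nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

This is one way to combine a keyword ranking and a vector-search ranking without tuning score weights, since it only looks at ranks, not raw scores.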
-
Hey everyone! We are looking for a stack recommendation. Product goal is essentially a content recommendation engine, taking a set of 100 webpages and surfacing the best new items in the user's context. This seems like it would be an ideal RAG use case, especially with hybrid search.
In particular, we are thinking about two things:
1. What's a prototype/MVP stack?
2. What would be a scalable production stack?
For #1, I haven't cracked the code on how to get the RAG system to properly rank and surface the best items. I know what the user cares about, but embedding the question and prompt engineering didn't produce good enough results (or we did it poorly). We think we need to try embedding the answers. But I haven't found a workflow automation stack that makes that development iteration practical... I'm sure it exists, but it's not Vellum or Azure or Python + LangChain.
I was thinking about trying Cohere and Weaviate next... Chroma doesn't have a ready-made hybrid search library; Weaviate does, but a full vector DB feels like overkill for this amount of data.
Pretty new to RAG architectures, so any advice is appreciated!
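For context on the hybrid search part of the question, score-level fusion can be sketched in a few lines: normalize a keyword score and a vector-similarity score per document, then blend them with a weight `alpha` (this is a toy illustration - the `keyword_score` and the `vector_scores` input are hypothetical placeholders for a real BM25 ranker and a real embedding search; hybrid-capable DBs like Weaviate do something similar internally):

```python
def keyword_score(doc: str, query: str) -> float:
    # Toy lexical score: fraction of query terms that appear in the document.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms) if terms else 0.0

def hybrid_rank(docs: list[str], query: str,
                vector_scores: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Blend min-max-normalized keyword and vector scores.

    `vector_scores` maps doc -> similarity from a separate embedding search
    (assumed here); `alpha` weights the vector side, 1 - alpha the keyword side.
    """
    def norm(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    kw = norm({d: keyword_score(d, query) for d in docs})
    vec = norm(vector_scores)
    fused = {d: alpha * vec[d] + (1 - alpha) * kw[d] for d in docs}
    return sorted(docs, key=fused.get, reverse=True)
```

The `alpha` knob is the usual tuning point: 1.0 is pure vector search, 0.0 is pure keyword search.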