Commit fc8fb27

feat: various improvements

lsorber committed Oct 6, 2024
1 parent c9f4d0d commit fc8fb27
Showing 4 changed files with 34 additions and 24 deletions.
29 changes: 17 additions & 12 deletions README.md

@@ -9,8 +9,8 @@ RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with Postgr
 ##### Configurable
 
 - 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
-- 💾 Either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
-- 🥇 Choose any reranker with [Rerankers](https://github.com/AnswerDotAI/rerankers), and fast multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default
+- 💾 Choose either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
+- 🥇 Choose any reranker with [Rerankers](https://github.com/AnswerDotAI/rerankers), including multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default
 
 ##### Fast and permissive
 
@@ -22,15 +22,15 @@ RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with Postgr
 - 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
 - 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
 - ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
-- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) that combines the database's built-in keyword search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html), [FTS5](https://www.sqlite.org/fts5.html)) with their native vector search extensions ([pgvector](https://github.com/pgvector/pgvector), [sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
+- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html) + [pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html) + [sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
 - 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
 
 ##### Extensible
 
-- ✍️ Optional: conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
-- ✅ Optional: evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
+- ✍️ Optional conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
+- ✅ Optional evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
 
-[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) while [sqlite-vec](https://github.com/asg017/sqlite-vec) is still in development.
+[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) until [sqlite-vec](https://github.com/asg017/sqlite-vec) is more mature.
 
 ## Installing
 
@@ -71,10 +71,10 @@ pip install raglite[ragas]
 ### 1. Configuring RAGLite
 
 > [!TIP]
-> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/collections/bartowski/recent-highlights-65cf8e08f8ab7fc669d7b5bd)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
+> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/bartowski)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
 
 > [!TIP]
-> 💾 You can create a PostgreSQL database for free in a few clicks at [neon.tech](https://neon.tech) (not sponsored).
+> 💾 You can create a PostgreSQL database in a few clicks at [neon.tech](https://neon.tech).
 
 First, configure RAGLite with your preferred PostgreSQL or SQLite database and [any LLM supported by LiteLLM](https://docs.litellm.ai/docs/providers/openai):
 
@@ -114,16 +114,21 @@ insert_document(Path("Special Relativity.pdf"), config=my_config)
 
 ### 3. Searching and Retrieval-Augmented Generation (RAG)
 
-Now, you can search for chunks with keyword search, vector search, or a hybrid of the two. You can also answer questions with RAG and the search method of your choice (`hybrid` is the default):
+Now, you can search for chunks with vector search, keyword search, or a hybrid of the two. You can also rerank the search results with the configured reranker. And you can use any search method of your choice (`hybrid_search` is the default) together with reranking to answer questions with RAG:
 
 ```python
 # Search for chunks:
 from raglite import hybrid_search, keyword_search, vector_search
 
 prompt = "How is intelligence measured?"
-results_vector = vector_search(prompt, num_results=5, config=my_config)
-results_keyword = keyword_search(prompt, num_results=5, config=my_config)
-results_hybrid = hybrid_search(prompt, num_results=5, config=my_config)
+chunk_ids_vector, _ = vector_search(prompt, num_results=20, config=my_config)
+chunk_ids_keyword, _ = keyword_search(prompt, num_results=20, config=my_config)
+chunk_ids_hybrid, _ = hybrid_search(prompt, num_results=20, config=my_config)
+
+# Rerank chunks:
+from raglite import rerank
+
+chunk_ids_reranked = rerank(prompt, chunk_ids_hybrid, config=my_config)
 
 # Answer questions with RAG:
 from raglite import rag
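The README's hybrid search bullet links the Cormack et al. paper on Reciprocal Rank Fusion, the standard way to fuse a keyword ranking and a vector ranking into one list. As an illustration of the idea (a sketch, not RAGLite's actual implementation), each chunk id earns `1 / (k + rank)` per list it appears in, with `k = 60` as in the paper:

```python
# Reciprocal Rank Fusion (RRF) sketch: fuse ranked lists of chunk ids.
# Illustrative only; not taken from RAGLite's source.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists by summing 1 / (k + rank) per list, best first."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first.
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
```

Chunks ranked highly by both keyword and vector search accumulate the largest fused scores, which is why hybrid search tends to beat either method alone.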
2 changes: 1 addition & 1 deletion src/raglite/_config.py

@@ -46,7 +46,7 @@ class RAGLiteConfig:
     vector_search_index_metric: str = "cosine"  # The query adapter supports "dot" and "cosine".
     vector_search_query_adapter: bool = True
     # Reranking config.
-    rerankers: tuple[tuple[str, BaseRanker], ...] | None = field(
+    reranker: BaseRanker | tuple[tuple[str, BaseRanker], ...] | None = field(
         default_factory=lambda: (
             ("en", FlashRankRanker("ms-marco-MiniLM-L-12-v2", verbose=0)),
             ("other", FlashRankRanker("ms-marco-MultiBERT-L-12", verbose=0)),
4 changes: 2 additions & 2 deletions src/raglite/_rag.py

@@ -31,11 +31,11 @@ def rag(
     max_tokens_per_context *= 1 + len(context_neighbors or [])
     max_contexts = min(max_contexts, max_tokens // max_tokens_per_context)
     # If the user has configured a reranker, we retrieve extra contexts to rerank.
-    extra_contexts = 4 * max_contexts if config.rerankers else 0
+    extra_contexts = 4 * max_contexts if config.reranker else 0
     # Retrieve relevant contexts.
     chunk_ids, _ = search(prompt, num_results=max_contexts + extra_contexts, config=config)  # type: ignore[call-arg]
     # Rerank the relevant contexts and select the top contexts.
-    if config.rerankers:
+    if config.reranker:
         chunk_ids = rerank(query=prompt, chunk_ids=chunk_ids, config=config)[:max_contexts]
     # Extend the top contexts with their neighbors and group chunks into contiguous segments.
     segments = retrieve_segments(chunk_ids, neighbors=context_neighbors, config=config)
23 changes: 14 additions & 9 deletions src/raglite/_search.py

@@ -3,6 +3,7 @@
 import re
 import string
 from collections import defaultdict
+from collections.abc import Sequence
 from itertools import groupby
 from typing import cast

@@ -229,19 +230,23 @@ def rerank(
     """Rerank chunks according to their relevance to a given query."""
     # Early exit if no reranker is configured.
     config = config or RAGLiteConfig()
-    if not config.rerankers:
+    if not config.reranker:
         return chunk_ids
     # Retrieve the chunks.
     chunks = retrieve_chunks(chunk_ids, config=config)
-    # Detect the languages of the chunks and queries.
-    langs = {detect(str(chunk)) for chunk in chunks}
-    langs.add(detect(query))
-    # If all chunks and the query are in the same language, use the language-specific reranker.
-    rerankers = dict(config.rerankers)
-    if len(langs) == 1 and (lang := next(iter(langs))) in rerankers:
-        reranker = rerankers[lang]
+    # Select the reranker.
+    if isinstance(config.reranker, Sequence):
+        # Detect the languages of the chunks and queries.
+        langs = {detect(str(chunk)) for chunk in chunks}
+        langs.add(detect(query))
+        # If all chunks and the query are in the same language, use the language-specific reranker.
+        rerankers = dict(config.reranker)
+        if len(langs) == 1 and (lang := next(iter(langs))) in rerankers:
+            reranker = rerankers[lang]
+        else:
+            reranker = rerankers.get("other")
     else:
-        reranker = rerankers.get("other")
+        reranker = config.reranker
     # Rerank the chunks.
     if reranker:
         results = reranker.rank(
