Commit fc8fb27

feat: various improvements

lsorber committed Oct 6, 2024
1 parent c9f4d0d commit fc8fb27
Showing 4 changed files with 34 additions and 24 deletions.
29 changes: 17 additions & 12 deletions README.md

@@ -9,8 +9,8 @@ RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with Postgr
 ##### Configurable
 
 - 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
-- 💾 Either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
-- 🥇 Choose any reranker with [Rerankers](https://github.com/AnswerDotAI/rerankers), and fast multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default
+- 💾 Choose either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
+- 🥇 Choose any reranker with [Rerankers](https://github.com/AnswerDotAI/rerankers), including multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default
 
 ##### Fast and permissive
 
@@ -22,15 +22,15 @@ RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with Postgr
 - 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
 - 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
 - ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
-- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) that combines the database's built-in keyword search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html), [FTS5](https://www.sqlite.org/fts5.html)) with their native vector search extensions ([pgvector](https://github.com/pgvector/pgvector), [sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
+- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html) + [pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html) + [sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
 - 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
 
 ##### Extensible
 
-- ✍️ Optional: conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
-- ✅ Optional: evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
+- ✍️ Optional conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
+- ✅ Optional evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
 
-[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) while [sqlite-vec](https://github.com/asg017/sqlite-vec) is still in development.
+[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) until [sqlite-vec](https://github.com/asg017/sqlite-vec) is more mature.
 
 ## Installing
 
@@ -71,10 +71,10 @@ pip install raglite[ragas]
 ### 1. Configuring RAGLite
 
 > [!TIP]
-> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/collections/bartowski/recent-highlights-65cf8e08f8ab7fc669d7b5bd)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
+> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/bartowski)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
 
 > [!TIP]
-> 💾 You can create a PostgreSQL database for free in a few clicks at [neon.tech](https://neon.tech) (not sponsored).
+> 💾 You can create a PostgreSQL database in a few clicks at [neon.tech](https://neon.tech).
 
 First, configure RAGLite with your preferred PostgreSQL or SQLite database and [any LLM supported by LiteLLM](https://docs.litellm.ai/docs/providers/openai):
 
@@ -114,16 +114,21 @@ insert_document(Path("Special Relativity.pdf"), config=my_config)
 
 ### 3. Searching and Retrieval-Augmented Generation (RAG)
 
-Now, you can search for chunks with keyword search, vector search, or a hybrid of the two. You can also answer questions with RAG and the search method of your choice (`hybrid` is the default):
+Now, you can search for chunks with vector search, keyword search, or a hybrid of the two. You can also rerank the search results with the configured reranker. And you can use any search method of your choice (`hybrid_search` is the default) together with reranking to answer questions with RAG:
 
 ```python
 # Search for chunks:
 from raglite import hybrid_search, keyword_search, vector_search
 
 prompt = "How is intelligence measured?"
-results_vector = vector_search(prompt, num_results=5, config=my_config)
-results_keyword = keyword_search(prompt, num_results=5, config=my_config)
-results_hybrid = hybrid_search(prompt, num_results=5, config=my_config)
+chunk_ids_vector, _ = vector_search(prompt, num_results=20, config=my_config)
+chunk_ids_keyword, _ = keyword_search(prompt, num_results=20, config=my_config)
+chunk_ids_hybrid, _ = hybrid_search(prompt, num_results=20, config=my_config)
+
+# Rerank chunks:
+from raglite import rerank
+
+chunk_ids_reranked = rerank(prompt, chunk_ids_hybrid, config=my_config)
 
 # Answer questions with RAG:
 from raglite import rag
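The README's hybrid search bullet links the Cormack et al. paper on Reciprocal Rank Fusion, the standard way to fuse a keyword ranking and a vector ranking into one list. As an illustration of the idea (a sketch, not RAGLite's actual implementation), each chunk id earns `1 / (k + rank)` per list it appears in, with `k = 60` as in the paper:

```python
# Reciprocal Rank Fusion (RRF) sketch: fuse ranked lists of chunk ids.
# Illustrative only; not taken from RAGLite's source.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists by summing 1 / (k + rank) per list, best first."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first.
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
```

Chunks ranked highly by both keyword and vector search accumulate the largest fused scores, which is why hybrid search tends to beat either method alone.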
2 changes: 1 addition & 1 deletion src/raglite/_config.py

@@ -46,7 +46,7 @@ class RAGLiteConfig:
     vector_search_index_metric: str = "cosine"  # The query adapter supports "dot" and "cosine".
     vector_search_query_adapter: bool = True
     # Reranking config.
-    rerankers: tuple[tuple[str, BaseRanker], ...] | None = field(
+    reranker: BaseRanker | tuple[tuple[str, BaseRanker], ...] | None = field(
         default_factory=lambda: (
             ("en", FlashRankRanker("ms-marco-MiniLM-L-12-v2", verbose=0)),
             ("other", FlashRankRanker("ms-marco-MultiBERT-L-12", verbose=0)),
4 changes: 2 additions & 2 deletions src/raglite/_rag.py

@@ -31,11 +31,11 @@ def rag(
     max_tokens_per_context *= 1 + len(context_neighbors or [])
     max_contexts = min(max_contexts, max_tokens // max_tokens_per_context)
     # If the user has configured a reranker, we retrieve extra contexts to rerank.
-    extra_contexts = 4 * max_contexts if config.rerankers else 0
+    extra_contexts = 4 * max_contexts if config.reranker else 0
     # Retrieve relevant contexts.
     chunk_ids, _ = search(prompt, num_results=max_contexts + extra_contexts, config=config)  # type: ignore[call-arg]
     # Rerank the relevant contexts and select the top contexts.
-    if config.rerankers:
+    if config.reranker:
         chunk_ids = rerank(query=prompt, chunk_ids=chunk_ids, config=config)[:max_contexts]
     # Extend the top contexts with their neighbors and group chunks into contiguous segments.
     segments = retrieve_segments(chunk_ids, neighbors=context_neighbors, config=config)
23 changes: 14 additions & 9 deletions src/raglite/_search.py

@@ -3,6 +3,7 @@
 import re
 import string
 from collections import defaultdict
+from collections.abc import Sequence
 from itertools import groupby
 from typing import cast

@@ -229,19 +230,23 @@ def rerank(
     """Rerank chunks according to their relevance to a given query."""
     # Early exit if no reranker is configured.
     config = config or RAGLiteConfig()
-    if not config.rerankers:
+    if not config.reranker:
         return chunk_ids
     # Retrieve the chunks.
     chunks = retrieve_chunks(chunk_ids, config=config)
-    # Detect the languages of the chunks and queries.
-    langs = {detect(str(chunk)) for chunk in chunks}
-    langs.add(detect(query))
-    # If all chunks and the query are in the same language, use the language-specific reranker.
-    rerankers = dict(config.rerankers)
-    if len(langs) == 1 and (lang := next(iter(langs))) in rerankers:
-        reranker = rerankers[lang]
+    # Select the reranker.
+    if isinstance(config.reranker, Sequence):
+        # Detect the languages of the chunks and queries.
+        langs = {detect(str(chunk)) for chunk in chunks}
+        langs.add(detect(query))
+        # If all chunks and the query are in the same language, use the language-specific reranker.
+        rerankers = dict(config.reranker)
+        if len(langs) == 1 and (lang := next(iter(langs))) in rerankers:
+            reranker = rerankers[lang]
+        else:
+            reranker = rerankers.get("other")
     else:
-        reranker = rerankers.get("other")
+        reranker = config.reranker
     # Rerank the chunks.
     if reranker:
         results = reranker.rank(
