This repo contains beginner scripts with very detailed explanations, as well as more advanced, ready-to-run scripts with fewer comments. These resources aim to provide concise guidance and practical examples for building and evaluating a RAG system from scratch.
Beginners, start with: start_here.ipynb or start_here.py
Further methods: Advanced options to rule RAG
"Baby, don't hurt me..."
RAG = Retrieval Augmented Generation
- Retrieval - the process of searching for and extracting relevant information (performed by a retriever).
- Retrieval Augmented - supplementing the user's query with the relevant information that was found.
- Retrieval Augmented Generation - generating a response to the user that takes this additional retrieved information into account.
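The three definitions above map onto three small functions. The sketch below is a minimal, dependency-free illustration, not the repo's implementation: it assumes a bag-of-words cosine similarity as a stand-in for a real embedding model, and a hypothetical `generate` callable standing in for whatever LLM you plug in (a HuggingFace model, LlamaCpp, an API client).

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word tokens (apostrophes allowed, so "don't" stays one token).
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Retrieval: score every document against the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def augment(query: str, context: list[str]) -> str:
    # Augmentation: put the retrieved passages in front of the user's query.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using the context below.\nContext:\n{joined}\n\nQuestion: {query}"


def rag(query: str, docs: list[str], generate: Callable[[str], str], k: int = 2) -> str:
    # Generation: hand the augmented prompt to any LLM wrapped as `generate` (hypothetical).
    return generate(augment(query, retrieve(query, docs, k)))


docs = [
    "What Is Love by Haddaway: what is love, baby don't hurt me, don't hurt me, no more.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Chroma is an open-source embedding database.",
]
print(rag("Baby, don't hurt me...", docs, generate=lambda p: p, k=1))
```

Passing `generate=lambda p: p` simply echoes the augmented prompt, which is a handy way to check what the LLM would actually see before wiring in a real model.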
Walkthrough example:
- User query: "Baby, don't hurt me..."
- RAG process:
  - Input Interpretation: The system receives the user's plea and recognizes a likely song-lyric reference.
  - Data Retrieval: It searches the attached database for relevant information and finds the lyrics of the song "What is Love" by Haddaway.
  - Augmentation: Next, it augments the user's query with this context, so that the reference is properly understood.
  - Generation: Armed with the song's lyrics, the system crafts a witty response, perhaps something like: "No worries, user! I'll only hurt you with my endless knowledge of 90s pop hits."
  - RAG delivery: Finally, the system delivers the response with a touch of humor, leaving the user amused and impressed by the AI's cleverness.
- Economically Efficient Deployment: Chatbot development typically starts with base models, i.e. LLMs trained on general-purpose data. RAG offers a more cost-effective way to incorporate new data into an LLM than fine-tuning the whole model.
- Up-to-Date Information: RAG makes it possible to feed rapidly changing, up-to-date data directly into generative models. By connecting an LLM to real-time sources such as social media feeds or news websites, you can give users the most current information.
- Increased User Trust: With RAG, an LLM can ground its answers in retrieved documents and cite them as sources, boosting user confidence in the generative AI. Users can verify information by checking the source documents themselves, which increases trust in the system.
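One simple way to make source citations possible is to number the retrieved chunks in the prompt and keep a parallel list of where each chunk came from. The sketch below assumes each chunk carries a `source` string (a hypothetical field, e.g. a file path or URL); whether the model actually cites correctly still depends on the generator and the prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical field: file path, URL, or document id of the original


def build_cited_prompt(question: str, chunks: list[Chunk]) -> tuple[str, list[str]]:
    # Number each retrieved chunk so the model can refer to it as [1], [2], ...
    numbered = [f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1)]
    prompt = (
        "Answer the question using only the numbered context. "
        "Cite the passages you rely on, like [1].\n\n"
        + "\n".join(numbered)
        + f"\n\nQuestion: {question}"
    )
    # Keep the matching source list so the UI can show where each [n] comes from.
    sources = [f"[{i}] {c.source}" for i, c in enumerate(chunks, start=1)]
    return prompt, sources
```

Returning the sources separately (rather than trusting the model to echo them) keeps the links accurate even when the generated text paraphrases the context.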
- with RunnableSequences (LangChain), if you want a clean, structured approach and easy-to-follow code sequences
- with HuggingFace models, if you want to try some of the most recent releases and cutting-edge technology
- locally, if you love the smell of code in the morning
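In LangChain, RunnableSequence chains are built by piping steps together with `|` and run with `.invoke()`. The plain-Python class below only imitates that idea to show how data flows from step to step; it is not LangChain's implementation, and `Step`, `retrieve_step`, `to_prompt`, `fake_llm` and `parse` are hypothetical names.

```python
from typing import Any, Callable


class Step:
    """Wraps a single-argument function so steps can be chained with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` returns a new Step that runs `a` first and passes its output to `b`.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        # Run the whole (possibly composed) chain on one input.
        return self.fn(x)


# Hypothetical stand-ins for retriever -> prompt template -> LLM -> output parser.
retrieve_step = Step(lambda q: {"question": q, "context": "a retrieved passage"})
to_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: "  echo: " + prompt.splitlines()[-1] + "  ")
parse = Step(str.strip)

chain = retrieve_step | to_prompt | fake_llm | parse
print(chain.invoke("Baby, don't hurt me..."))  # -> echo: Question: Baby, don't hurt me...
```

Each step only needs to accept the previous step's output, which is what makes such chains easy to read top to bottom and to swap parts (for example, a real LLM in place of `fake_llm`).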
You can start with the start_here.ipynb or start_here.py file and then move on to the other highly detailed, beginner-friendly files and notebooks in the tutorials section.
How to choose a retrieval model (LLM embedder)? --> mteb/leaderboard, tab: Retrieval or Retrieval w/Instruction
How to choose a reranking model (one that reorders the list of relevant documents)? --> mteb/leaderboard, tab: Reranking
How to choose a generator model (the LLM that generates the final answer)? --> open-llm-leaderboard/open_llm_leaderboard
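Reranking is a second pass over the retriever's candidates: each (query, document) pair is scored again, usually by a stronger but slower model, and the list is reordered. In the sketch below, `overlap_score` is a deliberately naive placeholder (an assumption, not a recommended scorer); in practice you would swap in a model chosen from the MTEB Reranking tab.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    # Score each (query, document) pair, then reorder by that score, best first.
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Placeholder scorer: fraction of query words that also appear in the document.
    # A real reranker (e.g. a cross-encoder) would replace this function.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0
```

Keeping the scorer as a parameter separates the reordering logic from the model, so the first-stage retriever can stay fast while only the top candidates pay for the expensive scoring.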
Please refer to the options and files listed below for less commented but more advanced scripts, examples, and techniques.
| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| Basic tutorials | | | |
| Basic and simple | default | | |
| Run scripts for a full RAG system | | | |
| How to run HuggingFace models | locally | | |
| | remotely | in progress... release imminent | |
| How to evaluate and monitor an application | with LangSmith | in progress... release imminent | |
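While the LangSmith evaluation material is still in progress, a library-free starting point is to measure the retriever with two common metrics, hit rate@k and mean reciprocal rank (MRR). The sketch below assumes exactly one relevant document id per query; the ids are illustrative.

```python
def hit_rate_at_k(results: list[list[str]], relevant: list[str], k: int) -> float:
    # Fraction of queries whose relevant doc id appears among the top-k results.
    hits = sum(1 for ranked, gold in zip(results, relevant) if gold in ranked[:k])
    return hits / len(relevant)


def mean_reciprocal_rank(results: list[list[str]], relevant: list[str]) -> float:
    # Average of 1 / rank of the relevant doc (counts as 0 if it was not retrieved).
    total = 0.0
    for ranked, gold in zip(results, relevant):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(relevant)


# Each inner list is the ranked output of the retriever for one test query.
results = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant = ["d1", "d5", "d0"]
print(hit_rate_at_k(results, relevant, k=2), mean_reciprocal_rank(results, relevant))
```

Hit rate only asks whether the right document was retrieved at all, while MRR also rewards ranking it near the top, which matters when only the first few chunks fit into the prompt.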
| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| How to store and embed documents? | | | |
| How to store embeddings in a vectorstore (FAISS or Chroma) | default | | |
| How to embed documents | default | | |
| | with caching (saves time when re-creating embeddings) | | |
| | with compression (saves RAM when storing and retrieving) | in progress... release imminent | |
| How do RunnableSequence chains work? | | | |
| How to retrieve documents | default | | |
| | with multiple query generation | | |
| | with | in progress... release imminent | |
| | with prompting (hint: ask GPT to write instructions for your RAG system and use them as the prompt template) | | |
| How to generate an answer | default | | |
| | with prompting (hint: ask GPT to write instructions for your RAG system and use them as the prompt template) | | |
| | with GPTQQuantizer (saves RAM, faster generation) | | |
| | with vLLM (If you encounter | | |
| | with LlamaCpp (saves RAM, faster generation) | | |
| How to further improve your chain? | | | |
| Advanced chain elements | Amplification | in progress... release imminent | Christiano et al. 2018. Supervising strong learners by amplifying weak experts |
| | Debate | in progress... release imminent | |
| Advanced prompt techniques | default | in progress... release imminent | Schulhoff et al. 2024. The Prompt Report: A Systematic Survey of Prompting Techniques |
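Embedding caching avoids paying for the same embedding twice: vectors are stored under a key derived from the text, and the model is only called on a cache miss. The sketch below keeps the cache in an in-memory dict. `CachedEmbedder` is a hypothetical name, its `embed_documents` method name only mirrors the convention of LangChain embedding classes, and in a real setup a persistent store (for example LangChain's `CacheBackedEmbeddings` with a file-backed store) would replace the dict.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Wraps any embedding function and memoizes vectors by a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], list[float]], namespace: str = "default"):
        self.embed_fn = embed_fn
        # Namespace, e.g. the model name, so different models never share cache entries.
        self.namespace = namespace
        self.store: dict[str, list[float]] = {}
        self.misses = 0

    def _key(self, text: str) -> str:
        # Stable key: SHA-256 of namespace + text.
        return hashlib.sha256(f"{self.namespace}:{text}".encode("utf-8")).hexdigest()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        out = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:
                # Only call the (expensive) embedding model on a cache miss.
                self.store[key] = self.embed_fn(text)
                self.misses += 1
            out.append(self.store[key])
        return out
```

Including the model name in the key matters: switching embedding models without it would silently return vectors from the old model.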
Further reading:
Mrs Wallbreaker or: How I Learned to Stop Worrying and Love the AGI.
About AI Risk, AI Alignment, AI Safety, AI Ethics