RAG locally with LangChain RunnableSequence. HOWTOs

This repo contains two kinds of scripts: tutorial scripts with very detailed explanations, and more advanced, ready-to-run scripts with fewer details and comments. Together they aim to provide concise guidance and practical examples for creating and evaluating a RAG system from scratch.

Beginners, start here: `start_here.ipynb` or `start_here.py`

Further methods: Advanced option to rule RAG

What is RAG?

~~"Baby, don't hurt me..."~~

RAG = Retrieval Augmented Generation

Retrieval - the process of searching for and extracting relevant information (retriever).
Retrieval Augmented - supplementing the user's query with found relevant information.
Retrieval Augmented Generation - generating a response to the user while taking into account additionally found relevant information.

Walkthrough example:

User query: "Baby, don't hurt me..."
RAG process:
- Input Interpretation: The system receives the user's plea and detects a potential for a song lyric reference.
- Data Retrieval: It quickly scours the attached database for relevant information, focusing on the lyrics of the song "What is Love" by Haddaway.
- Augmentation: Next, it augments the user's query with additional context, ensuring a deep understanding of the reference.
- Generation: Armed with the knowledge of the song's lyrics, the system crafts a witty response, perhaps something like: "No worries, user! I'll only hurt you with my endless knowledge of 90s pop hits."
RAG delivery: Finally, the system delivers the response with a touch of humor, leaving the user amused and impressed by the AI's cleverness.
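
The walkthrough above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the `DOCUMENTS` list, the word-overlap scoring in `retrieve`, and the placeholder `generate` function are all hypothetical stand-ins, not the retriever or LLM used in this repo.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base standing in for the "attached database".
DOCUMENTS = [
    "What is Love is a song by Haddaway containing the lyric: baby don't hurt me.",
    "FAISS is a library for similarity search over dense vectors.",
    "LangChain lets you compose retrievers, prompts and models into chains.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many words they share with the query."""
    query_words = Counter(tokenize(query))

    def overlap(document: str) -> int:
        shared = query_words & Counter(tokenize(document))
        return sum(shared.values())

    return sorted(documents, key=overlap, reverse=True)[:k]


def augment(query: str, context: list[str]) -> str:
    """Augmentation: put the retrieved context in front of the user's query."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Generation: placeholder where a real LLM call would go."""
    return f"(an LLM would answer here, given {len(prompt)} prompt characters)"


def rag(query: str) -> str:
    """Run the three stages in order: retrieve, augment, generate."""
    context = retrieve(query, DOCUMENTS)
    prompt = augment(query, context)
    return generate(prompt)


print(rag("Baby, don't hurt me..."))
```

A real system would swap the word-overlap scoring for an embedding model plus a vectorstore, and the placeholder for an actual LLM, but the order of the three stages stays the same.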

Why RAG?

Economically Efficient Deployment: Chatbot development typically starts from base models, that is, LLMs trained on generalized data. RAG offers a more cost-effective way to incorporate new data into an LLM without fine-tuning the whole model.
Up-to-Date Information: RAG makes it possible to feed rapidly changing, recent data directly into generative models. By connecting an LLM to real-time social media feeds or news websites, users can receive the most current information.
Increased User Trust: With RAG, an LLM can provide accurate information while citing its sources, boosting user confidence in the generative AI. Users can verify information by accessing the source documents themselves, enhancing trust in the system.

How to read and create RAG:

with RunnableSequences (langchain) (if you want a clean, structured approach and easy-to-follow code sequences)
with HuggingFace models (if you want to try some of the most recent releases and cutting-edge technology)
locally (if you love the smell of code in the morning)
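
To give a feel for the first option: in LangChain, runnables are chained with the `|` operator into a `RunnableSequence`, which is then run with `.invoke()`. The sketch below mimics that idea in plain Python so it runs without any dependencies. It is not LangChain's implementation; the `Step` class and the `retrieve`, `build_prompt`, and `fake_llm` steps are hypothetical names made up for illustration.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        """Run this step on a single input."""
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Composition: run this step first, then feed its output into `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Each stage is a small function wrapped in a Step.
retrieve = Step(
    lambda question: {
        "question": question,
        "context": "RAG = Retrieval Augmented Generation",
    }
)
build_prompt = Step(
    lambda data: f"Context: {data['context']}\nQuestion: {data['question']}"
)
fake_llm = Step(
    lambda prompt: f"(answer generated from {len(prompt)} characters of prompt)"
)

# Reading left to right gives the order of execution.
chain = retrieve | build_prompt | fake_llm
print(chain.invoke("What is RAG?"))
```

The appeal of this style is that the chain definition reads like the data flow itself, which is what makes the scripts in this repo easy to follow step by step.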

You can start with the `start_here.ipynb` or `start_here.py` file and then proceed to the other files and notebooks in the tutorials section, which are exceptionally detailed for beginners.

Where to find the model and how to choose one:

How to choose a retrieval model (LLM embedder)? --> mteb/leaderboard, tab: Retrieval or Retrieval w/Instruction

How to choose a reranking model (reorders the list of relevant documents)? --> mteb/leaderboard, tab: Reranking

How to choose a generator model (LLM that generates the final answer)? --> open-llm-leaderboard/open_llm_leaderboard
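
To make the role of the reranking model concrete, here is a minimal sketch of what "reorder the list of relevant documents" means. The `toy_score` function is a hypothetical word-overlap scorer that stands in for a real reranking model from the leaderboard; only the reordering logic in `rerank` is the point.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_n: int = 2,
) -> list[str]:
    """Reorder retrieved candidates by relevance score and keep the best top_n."""
    scored = [(score(query, document), document) for document in candidates]
    # Sort on the score only; ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [document for _, document in scored[:top_n]]


def toy_score(query: str, document: str) -> float:
    """Hypothetical scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words) / max(len(query_words), 1)


candidates = [
    "chroma stores embeddings on disk",
    "a reranking model reorders retrieved documents",
    "reranking happens after retrieval",
]
print(rerank("how does a reranking model work", candidates, toy_score))
```

In a real pipeline the retriever returns a generous candidate list cheaply, and the reranker (usually slower but more precise) decides which few documents actually reach the generator.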

Advanced option to rule RAG

Please refer to the other options and files listed below for less commented but more advanced scripts, examples, and techniques.

| HOWTO | Option | Go-to file | External documentation |
|---|---|---|---|
| Basic tutorials | | | |
| Basic and simple | default | `start_here.ipynb` `start_here.py` | |
| Run scripts for full RAG system | | | |
| How to run HuggingFace models | locally: with HuggingFaceEmbeddings, with HuggingFacePipeline | `local_rag_chain_simple.py` `local_rag_retrieval_qa_class.py` | |
| How to run HuggingFace models | remotely: with HuggingFaceHub | in progress... release imminent | Hugging Face Hub documentation |
| How to evaluate and monitor the application | with LangSmith | in progress... release imminent | Get started with LangSmith |

Individual components and elements

| HOWTO | Option | Go-to file | External documentation |
|---|---|---|---|
| How to store and embed documents? | | | |
| How to store embeddings in vectorstore (FAISS or Chroma) | default with: text splitter; progress bar on creating vectorstore; dump and load from disk | `get_vectorstore.py` `create_vectorstore.py` | FAISS, Chroma |
| How to embed documents | default | `create_llm_emb_default.py` | Text embedding models |
| | with Caching (saves time on subsequent creation) | `create_llm_emb_cached.py` | Caching Embeddings |
| | with Compressing (saves RAM while storing and retrieving) | in progress... release imminent | |
| How RunnableSequence chains work? | | | |
| How to retrieve documents | default | `local_rag_chain_simple.py` `combine_simple_RAG_chains.py` | |
| | with Multiple Queries Generation | `local_rag_chain_multi_query.py` `multiple_queries_chain.py` | |
| | with `chain_type`: `stuff`, `map_reduce`, `refine`, `map_rerank` | in progress... release imminent | |
| | with Prompting Hint: ask GPT to provide instruction for your RAG system and use it as prompt template | `prompt_templates_retrieve.py` | |
| How to generate answer | default | `create_llm_gen_default.py` | |
| | with Prompting Hint: ask GPT to provide instruction for your RAG system and use it as prompt template | `prompt_templates_generate.py` | |
| | with GPTQQuantizer (saves RAM, faster generation) | `pip install optimum auto-gptq` `create_llm_gen_default.py` | |
| | with vLLM (if you encounter `RuntimeError: probability tensor contains either inf, nan or element < 0` during `GPTQQuantizer` inference) | `pip install vllm` `create_llm_gen_vLLM.py` | vLLM in LangChain |
| | with LlamaCpp (saves RAM, faster generation) | `pip install llama-cpp-python` `create_llm_gen_llama_cpp.py` | LlamaCpp in LangChain |
| How to further improve your chain? | | | |
| Advanced chain elements | Amplification | in progress... release imminent | Christiano et al. 2018. Supervising strong learners by amplifying weak experts |
| Advanced chain elements | Debate | in progress... release imminent | Irving et al. 2018. AI safety via debate |
| Advanced prompt techniques | default | in progress... release imminent | Schulhoff et al. 2024. The Prompt Report: A Systematic Survey of Prompting Techniques |
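
As an illustration of the "with Caching" option for embedding documents listed above, here is a minimal plain-Python sketch of the idea: embed each distinct text once and reuse the stored vector afterwards. The `CachedEmbedder` class and the `toy_embed` function are hypothetical and are not the contents of `create_llm_emb_cached.py`; the library's own mechanism is covered by the Caching Embeddings documentation referenced in the table.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Hypothetical wrapper that remembers embeddings it has already computed."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self.embed_fn = embed_fn
        self.cache: dict[str, list[float]] = {}
        self.misses = 0  # how many times the underlying model was actually called

    def embed(self, text: str) -> list[float]:
        # Key the cache on a hash of the text so keys stay short for long documents.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model (two crude text features)."""
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(toy_embed)
first = embedder.embed("What is RAG?")
second = embedder.embed("What is RAG?")  # served from the cache
print(first == second, embedder.misses)
```

With a real embedding model the saving comes from skipping the model call entirely on repeated texts, which matters most when a vectorstore is rebuilt from mostly unchanged documents.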


eericheva/langchain_rag
