From 0640845da4bc4a15bbba561a0a67af32ebce720e Mon Sep 17 00:00:00 2001
From: Arunesh Singh <43724007+AruneshSingh@users.noreply.github.com>
Date: Wed, 23 Oct 2024 11:29:20 +0200
Subject: [PATCH] fix: proper rendering for article title

---
 docs/articles/semantic_chunking.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/articles/semantic_chunking.md b/docs/articles/semantic_chunking.md
index 142509fd..8bb788aa 100644
--- a/docs/articles/semantic_chunking.md
+++ b/docs/articles/semantic_chunking.md
@@ -1,8 +1,8 @@
+# Semantic Chunking
+
-# Semantic Chunking
-
 Chunking in Natural Language Processing is simply dividing large bodies of text into smaller pieces that computers can manage more easily. Splitting large datasets into chunks enables your Retrieval Augmented Generation (RAG) system to embed, index, and store even very large datasets optimally. But *how* you chunk your data is crucial in determining whether you can efficiently return only the most relevant results to your user queries.
 
 To get your RAG system to handle user queries better, you need a chunking method that's a good fit for your data. Some widely used chunking algorithms are **rule-based** - e.g., fixed character splitter, recursive character splitter, document-specific splitter, among others. But in some real-world applications, rule-based methods have trouble. If, for example, your dataset has multi-topic documents, rule-based splitting algorithms can result in incomplete contexts or noise-filled chunks. **Semantic chunking**, on the other hand - because it divides text on the basis of meaning rather than rules - creates chunks that are semantically independent and cohesive, and therefore results in more effective text processing and information retrieval.
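
The article excerpt in this hunk contrasts rule-based splitters with semantic chunking, which groups text by meaning rather than by character counts. For context, here is a minimal, illustrative sketch of that idea - it is not code from the article or from this patch. It assumes the `sentence-transformers` package; the `all-MiniLM-L6-v2` model name and the 0.7 similarity threshold are arbitrary demonstration choices.

```python
# Illustrative semantic-chunking sketch (not from the article or this patch).
# Assumes sentence-transformers is installed; model and threshold are
# demonstration choices, not values taken from the source.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunks(sentences: list[str], threshold: float = 0.7) -> list[str]:
    """Group consecutive sentences whose embeddings stay similar."""
    if not sentences:
        return []
    # Unit-normalized embeddings, so a dot product is cosine similarity.
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(np.dot(embeddings[i - 1], embeddings[i]))
        if sim >= threshold:
            current.append(sentences[i])      # same topic: extend the chunk
        else:
            chunks.append(" ".join(current))  # topic shift: start a new chunk
            current = [sentences[i]]
    chunks.append(" ".join(current))
    return chunks

print(semantic_chunks([
    "Chunking splits large documents into manageable pieces.",
    "Good chunks make retrieval in a RAG system more precise.",
    "Meanwhile, penguins are flightless birds of the Southern Hemisphere.",
]))
```

Run against the sample input, a sketch like this would keep the two retrieval-related sentences together and start a new chunk at the off-topic penguin sentence, which is exactly the behavior the excerpt attributes to semantic chunking.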