You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Summary
- Adds an optional Semantic Chunker to the admin-api-lib and centralizes embedding implementations in rag-core-lib (rag-core-api now re-exports).
- Helm chart gains chunker selection + tuning; admin container now preloads NLTK data at startup.
- Dependency updates across admin libs/services; new tests for chunking logic.
Motivation
- Provide more accurate chunk boundaries (semantic-aware) while retaining the existing recursive splitter as the default.
- Deduplicate/embedder logic across projects to reduce drift and config duplication.
Key changes
- Admin chunking
- New `SemanticTextChunker` backed by LangChain’s `SemanticChunker`, with optional min/max enforcement via `RecursiveCharacterTextSplitter`.
- Trailing undersized chunks are sentence-aware rebalanced (NLTK Punkt with regex fallback) to avoid tiny tails.
- Configurable via:
- `CHUNKER_CLASS_TYPE_CHUNKER_TYPE`: `recursive` (default) or `semantic`
- `CHUNKER_MAX_SIZE` (default `1000`), `CHUNKER_OVERLAP` (default `100`)
- Semantic-only: `CHUNKER_BREAKPOINT_THRESHOLD_TYPE` (default `percentile`), `CHUNKER_BREAKPOINT_THRESHOLD_AMOUNT` (default `95`), `CHUNKER_BUFFER_SIZE` (default `1`), `CHUNKER_MIN_SIZE` (default `200`)
- DI wiring
- `DependencyContainer` selects chunker (`recursive` or `semantic`) and, for semantic mode, resolves embeddings via `EmbedderClassTypeSettings`:
- `stackit` → `StackitEmbedder` (with shared retry settings)
- `ollama` → `LangchainCommunityEmbedder(OllamaEmbeddings)`
- Container bootstrapping simplified in `main.py` (internalizes class-type wiring).
- Embeddings centralization
- New in `rag-core-lib`: `impl/embeddings/*` and embedder settings (`stackit`, `ollama`, `fake`), plus `EmbedderType` and base `Embedder`.
- `rag-core-api` re-exports these for backward compatibility (no breaking imports).
- Helm / deployment
- Values (`infrastructure/rag/values.yaml`): new `adminBackend.envs.chunker.*` keys for selection & tuning (chart default `recursive`; overlap default now `100`).
- Deployment: mounts NLTK data dir and fetches `punkt` + `averaged_perceptron_tagger_eng` at startup; adds `configmap.chunkerName` and `secret.stackitEmbedderName` to env sources.
- Behavior fixes & docs
- De-duplicate `meta["related"]` in page summaries.
- Docs: libs README adds “Chunker configuration (multiple chunkers)” and updates DI tables to rag-core-lib classes; admin-backend README adds “Chunking modes”.
- Tests
- New `semantic_text_chunker_test.py` exercising: supported-kwargs passthrough to LC chunker, empty-input behavior, min/max enforcement + balancing, sentence-aware split.
Configuration / migration
- Default remains `recursive` splitter; to enable semantic chunking:
1) Set `CHUNKER_CLASS_TYPE_CHUNKER_TYPE=semantic`.
2) Choose embeddings via `EMBEDDER_CLASS_TYPE_EMBEDDER_TYPE` (`stackit` or `ollama`) and configure:
- STACKIT: `STACKIT_EMBEDDER_MODEL`, `STACKIT_EMBEDDER_BASE_URL`, `STACKIT_EMBEDDER_API_KEY` (+ optional retry overrides).
- Ollama: `OLLAMA_EMBEDDER_MODEL`, `OLLAMA_EMBEDDER_BASE_URL`.
3) Ensure Helm chart has corresponding ConfigMaps/Secrets (`stackitEmbedder`, etc.).
- NLTK data is preloaded on container start; no runtime downloads required.
Dependencies
- Add: `langchain-experimental`, `nltk` (and transitive `joblib`).
- Bump: `fastapi` (0.118.x), `uvicorn` (0.37.x), `langfuse` (3.6.x), `langchain`/`community`/`core` minor versions, `requests` (2.32.5).
- Test note: ensure LC packages (`langchain_core`, etc.) are present to run unit tests locally.
Risks & mitigations
- Startup time increases slightly due to NLTK data fetch → mitigated via one-time download into an emptyDir.
- Semantic mode depends on external embeddings; ensure credentials/secrets are present before switching default.
- Chunk size tuning may affect vector DB costs; start with defaults and adjust based on retrieval quality.
Docs
- libs/README.md: “2.4 Chunker configuration (multiple chunkers)” and corrected DI references.
- services/admin-backend/README.md: “Chunking modes” and Helm guidance.
0 commit comments