- remove unnecessary stop sequence (#84)
- fix Markdown heading boundary probabilities (#81)
- improve (re)insertion speed (#80)
- deps: exclude litellm versions that break get_model_info (#78)
- conditionally enable LlamaRAMCache (#83)
- add support for Python 3.12 (#69)
- upgrade from xx_sent_ud_sm to SaT (#74)
- add streaming tool use to llama-cpp-python (#71)
- improve sentence splitting (#72)
- improve output for empty databases (#68)
- add MCP server (#67)
- let LLM choose whether to retrieve context (#62)
- support pgvector v0.7.0+ (#63)
- add and enable OpenAI strict mode (#55)
- support embedding with LiteLLM for Ragas (#56)
- improve late chunking and optimize pgvector settings (#51)
- support prompt caching and apply Anthropic's long-context prompt format (#52)
- improve structured output extraction and query adapter updates (#34)
- upgrade rerankers and remove flashrank patch (#47)
- improve unpacking of keyword search results (#46)
- add fallbacks for model info (#44)
- add Chainlit frontend (#33)
- fix optimal chunking edge cases (#32)
- upgrade pdftext (#30)
- improve chunk and segment ordering (#29)
- avoid pdftext v0.3.11 (#27)
- patch rerankers flashrank issue (#22)
- add reranking (#20)
- add LiteLLM and late chunking (#19)
- add PostgreSQL support (#18)
- make query adapter minimally invasive (#16)
- upgrade default CPU model to Phi-3.5-mini (#15)
- add evaluation (#14)
- infer missing font sizes (#12)
- automatically adjust number of RAG contexts (#10)
- improve exception feedback for extraction (#9)
- optimize config for CPU and GPU (#7)
- simplify document insertion (#6)
- implement basic features (#2)
- initial commit
- lazily import optional dependencies (#11)
- improve indexing of multiple documents (#8)