Releases: neuml/txtai
v8.3.1
v8.3.0
This release adds support for GLiNER, Chonkie, Kokoro TTS and Static Vectors
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for GLiNER models (#862) Thank you @urchade
- Add semantic chunking pipeline (#812) Thank you @bhavnicksm
- Add Kokoro TTS support to TextToSpeech pipeline (#854) Thank you @hexgrad
- Add staticvectors inference (#859)
- Add example notebook for Entity Extraction with GLiNER (#873)
- Add example notebook for RAG Chunking (#874)
- Add notebook that analyzes NeuML LinkedIn posts (#851)
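As a rough sketch of the GLiNER support above (#862), the Entity pipeline can load a GLiNER model for zero-shot entity extraction; the model path and labels argument here are assumptions based on the example notebook, not a definitive API reference.

```python
from txtai.pipeline import Entity

# Load a GLiNER model with the Entity pipeline (model path is an assumption)
entity = Entity("gliner-community/gliner_medium-v2.5")

# Zero-shot extraction with caller-defined labels (labels parameter is an assumption)
print(entity(
    "NeuML is based in Washington, D.C. and builds txtai.",
    labels=["organization", "location", "product"]
))
```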
Improvements
- Add new methods for audio signal processing (#855)
- Remove fasttext dependency (#857)
- Remove WordVectors.build method (#858)
- Detect graph queries and route to graph index (#865)
- Replace python-louvain library with networkx equivalent (#867)
- Word vector model improvements (#868)
- Improve parsing of table text in HTML to Markdown pipeline (#872)
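The Kokoro TTS support added above (#854) plugs into the existing TextToSpeech pipeline. A minimal sketch, assuming a Kokoro ONNX export path and an (audio, sample rate) return value:

```python
import soundfile as sf

from txtai.pipeline import TextToSpeech

# Model path is a placeholder for a Kokoro ONNX export supported by the pipeline
tts = TextToSpeech("NeuML/kokoro-int8-onnx")

# Generate speech and write it to a WAV file
audio, rate = tts("Speech generation with Kokoro and txtai")
sf.write("speech.wav", audio, rate)
```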
v8.2.0
This release simplifies LLM chat messages, adds attribute filtering to Graph RAG and enables multi-CPU/GPU vector encoding
See below for full details on the new features, improvements and bug fixes.
New Features
- Add defaultrole to LLM pipeline (#841)
- Add extra attributes to Graph RAG (#684)
- Support graph=True in embeddings config (#848)
- Support pulling attribute data in graph.scan (#849)
- Add support for encoding with multiple GPUs (#541)
- Add vectors argument to Model2Vec vectors (#846)
- Enhanced Docs: LLM Embedding Examples (#843, #844) Thank you @igorlima!
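A sketch of the simplified chat messages with the new defaultrole parameter (#841); the exact wrapping behavior is inferred from the release summary and should be treated as an assumption.

```python
from txtai import LLM

# Model path is a placeholder; defaultrole wraps plain string prompts as chat messages
llm = LLM("microsoft/Phi-3-mini-4k-instruct", defaultrole="user")

# Sent as [{"role": "user", "content": "..."}] under the hood (assumed behavior)
print(llm("Tell me about txtai in one sentence"))
```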
v8.1.0
This release adds Docling integration, Embeddings context managers and significant database component enhancements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add text extraction with Docling (#814)
- Add Embeddings context manager (#832)
- Add support for halfvec and bit vector types with PGVector ANN (#839)
- Persist embeddings components to specified schema (#829)
- Add example notebook that analyzes the Hugging Face Posts dataset (#817)
- Add an example notebook for autonomous agents (#820)
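A minimal sketch of the Embeddings context manager above (#832), which closes underlying resources when the block exits:

```python
from txtai import Embeddings

# Embeddings instances can now be used with the "with" statement
with Embeddings(content=True) as embeddings:
    embeddings.index(["US tops 5 million confirmed virus cases",
                      "Beijing mobilises invasion craft along coast as Taiwan tensions escalate"])
    print(embeddings.search("public health story", 1))
```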
Improvements
- Cloud storage improvements (#821)
- Autodetect Model2Vec model paths (#822)
- Add parameter to disable text cleaning in Segmentation pipeline (#823)
- Refactor vectors package (#826)
- Refactor Textractor pipeline into multiple pipelines (#828)
- RDBMS graph.delete tests and upgrade graph dependency (#837)
- Bound ANN hamming scores between 0.0 and 1.0 (#838)
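A sketch of the Docling integration (#814) through the refactored Textractor pipeline (#828); the backend parameter name and value are assumptions rather than confirmed API.

```python
from txtai.pipeline import Textractor

# backend="docling" is an assumption based on the release notes
textractor = Textractor(backend="docling")

# Extract text from a local or remote document
text = textractor("document.pdf")
print(text[:500])
```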
Bug Fixes
- Fix error with inferring function parameters in agents (#816)
- Add programmatic workaround for Faiss + macOS (#818) Thank you @yukiman76!
- docs: update 49_External_database_integration.ipynb (#819) Thank you @eltociear!
- Fix memory issue with llama.cpp LLM pipeline (#824)
- Fix issue with calling cached_file for local directories (#825)
- Fix resource issues with embeddings indexing components backed by databases (#831)
- Fix bug with NetworkX.hasedge method (#834)
v8.0.0
🎉 We're excited to announce the release of txtai 8.0 🎉
If you like txtai, please remember to give it a ⭐!
8.0 introduces agents. Agents automatically create workflows to answer multi-faceted user requests. Agents iteratively prompt and/or interface with tools to step through a process and ultimately arrive at an answer to a request.
This release also adds support for Model2Vec vectorization. See below for more.
New Features
- Add txtai agents 🚀 (#804)
- Add agents package to txtai (#808)
- Add documentation for txtai agents (#809)
- Add agents to Application and API interfaces (#810)
- Add agents example notebook (#811)
- Add model2vec vectorization (#801)
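A minimal sketch of a txtai agent (#804); the constructor arguments, tool name and LLM path below are drawn from the example notebook and docs as assumptions.

```python
from txtai import Agent

agent = Agent(
    tools=["websearch"],              # built-in web search tool name is an assumption
    llm="Qwen/Qwen2.5-7B-Instruct",   # placeholder LLM path
    max_iterations=10
)

# The agent iteratively prompts the LLM and calls tools until it can answer the request
print(agent("Find a recent news story about vector search and summarize it in one sentence."))
```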
Improvements
- Update BASE_IMAGE in Dockerfile (#799)
- Cleanup vectors package (#802)
- Build script improvements (#805)
Bug Fixes
- Fix ImportError: cannot import name 'DuckDuckGoSearchTool' from 'transformers.agents' (#807)
v7.5.1
v7.5.0
This release adds Speech to Speech RAG, new TTS models and Generative Audio features
See below for full details on the new features, improvements and bug fixes.
New Features
- Add Speech to Speech example notebook (#789)
- Add streaming speech generation (#784)
- Add a microphone pipeline (#785)
- Add an audio playback pipeline (#786)
- Add Text to Audio pipeline (#792)
- Add support for SpeechT5 ONNX exports with Text to Speech pipeline (#793)
- Add audio signal processing and mixing methods (#795)
- Add Generative Audio example notebook (#798)
- Add example notebook covering open data access (#782)
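A sketch of the updated TextToSpeech pipeline with the speaker parameter (#787); the multi-speaker model path and the (audio, sample rate) return shape are assumptions.

```python
from txtai.pipeline import TextToSpeech

# Multi-speaker model path is a placeholder
tts = TextToSpeech("NeuML/vctk-vits-onnx")

# speaker selects a voice when the model supports multiple speakers
audio, rate = tts("Hello from txtai", speaker=5)
```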
Improvements
- Support language-specific transcription with Whisper (#593)
- Update TextToSpeech pipeline to support speaker parameter (#787)
- Update Text to Speech Generation Notebook (#790)
- Update hf_hub_download methods to use cached_file (#794)
- Require Python >= 3.9 (#796)
- Upgrade pylint and black (#797)
v7.4.0
This release adds the SQLite ANN, new text extraction features and a programming language neutral embeddings index format
See below for full details on the new features, improvements and bug fixes.
New Features
- Add SQLite ANN (#780)
- Enhance markdown support for Textractor (#758)
- Update txtai index format to remove Python-specific serialization (#769)
- Add new functionality to RAG application (#753)
- Add bm25s library to benchmarks (#757) Thank you @a0346f102085fe9f!
- Add serialization package for handling supported data serialization methods (#770)
- Add MessagePack serialization as a top level dependency (#771)
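A sketch of enabling the new SQLite ANN (#780) through the existing backend setting:

```python
from txtai import Embeddings

# backend selects the ANN component; "sqlite" stores vectors in a SQLite database
embeddings = Embeddings(backend="sqlite", content=True)
embeddings.index(["first document", "second document"])
print(embeddings.search("first", 1))
```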
Improvements
- Support `<pre>` blocks with Textractor (#749)
- Update HF LLM to reduce noisy warnings (#752)
- Update NLTK model downloads (#760)
- Refactor benchmarks script (#761)
- Update documentation to use base imports (#765)
- Update examples to use RAG pipeline instead of Extractor when paired with LLMs (#766)
- Modify NumPy and Torch ANN components to use np.load/np.save (#772)
- Persist Embeddings index ids (only used when content storage is disabled) with MessagePack (#773)
- Persist Reducer component with skops library (#774)
- Persist NetworkX graph component with MessagePack (#775)
- Persist Scoring component metadata with MessagePack (#776)
- Modify vector transforms to load/save data using np.load/np.save (#777)
- Refactor embeddings configuration into separate component (#778)
- Document txtai index format (#779)
v7.3.0
This release adds a new RAG front-end application template, streaming LLM and streaming RAG support along with significant text extraction improvements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for streaming LLM generation (#680)
- Add RAG API endpoint (#735)
- Add RAG deepdive notebook (#737)
- Add RAG example application (#743)
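A sketch of streaming LLM generation (#680); the stream parameter name is assumed from the documentation.

```python
from txtai import LLM

llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")  # placeholder model path

# Tokens are yielded as they are generated instead of waiting for the full response
for token in llm("Write a short poem about search", stream=True):
    print(token, end="")
```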
Improvements
- Improve textractor pipeline (#748)
- Support specifying the embedding model via the API (#632)
- Update configuration documentation (#705)
- Add RAG alias for the Extractor pipeline (#732)
- Rename Extractor pipeline to RAG (#736)
- Support max_seq_length parameter with model pooling (#746)
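With the Extractor pipeline renamed to RAG above (#736), a minimal retrieval augmented generation flow looks roughly like this; the template placeholders and return shape are assumptions based on the deepdive notebook.

```python
from txtai import Embeddings, LLM
from txtai.pipeline import RAG

embeddings = Embeddings(content=True)
embeddings.index(["txtai is an all-in-one embeddings database"])

llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")  # placeholder model path

# {question} and {context} placeholders are assumptions
rag = RAG(embeddings, llm, template=(
    "Answer the following question using only the context below.\n"
    "Question: {question}\nContext: {context}"
))

# Retrieves context from the embeddings index, then prompts the LLM
print(rag("What is txtai?"))
```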
v7.2.0
This release adds Postgres integration for all components, LLM chat messages and vectorization with llama.cpp/LiteLLM
See below for full details on the new features, improvements and bug fixes.
New Features
- Add pgvector ANN backend (#698)
- Add RDBMS Graph (#699)
- Add notebook covering txtai integration with Postgres (#701)
- Add Postgres Full Text Scoring (#713)
- Add support for chat messages in LLM pipeline (#718)
- Add support for LiteLLM vector backend (#725)
- Add support for llama.cpp vector backend (#726)
- Add notebook showing how to run RAG with llama.cpp and LiteLLM (#728)
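Chat message support in the LLM pipeline (#718) can be sketched as follows; the model path is a placeholder.

```python
from txtai import LLM

llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")  # placeholder model path

# The LLM pipeline now accepts chat messages in addition to plain string prompts
print(llm([
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Answer the following question: what is the speed of light?"}
]))
```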
Improvements
- Split similarity extras install (#696)
- Ensure config.path = None and config.path missing mean the same thing (#704)
- Add close methods to ANN and Graph (#711)
- Update finalizers to check object attributes haven't already been cleared (#722)
- Update LLM pipeline to support GPU parameter with llama.cpp backend (#724)
- Refactor vector module to support additional backends (#727)
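The new llama.cpp and LiteLLM vector backends (#725, #726) plug into the refactored vectors configuration (#727); the method values and model paths below are assumptions.

```python
from txtai import Embeddings

# llama.cpp vectorization with a local GGUF embeddings model (path is a placeholder)
embeddings = Embeddings(path="nomic-embed-text-v1.5.Q4_K_M.gguf", method="llama.cpp")

# LiteLLM vectorization through a hosted API (method value and path format are assumptions)
embeddings = Embeddings(path="huggingface/sentence-transformers/all-MiniLM-L6-v2", method="litellm")
```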