This release adds Docling integration, Embeddings context managers and significant database component enhancements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add text extraction with Docling (#814)
- Add Embeddings context manager (#832)
- Add support for halfvec and bit vector types with PGVector ANN (#839)
- Persist embeddings components to specified schema (#829)
- Add example notebook that analyzes the Hugging Face Posts dataset (#817)
- Add an example notebook for autonomous agents (#820)
Improvements
- Cloud storage improvements (#821)
- Autodetect Model2Vec model paths (#822)
- Add parameter to disable text cleaning in Segmentation pipeline (#823)
- Refactor vectors package (#826)
- Refactor Textractor pipeline into multiple pipelines (#828)
- RDBMS graph.delete tests and upgrade graph dependency (#837)
- Bound ANN hamming scores between 0.0 and 1.0 (#838)
Bug Fixes
- Fix error with inferring function parameters in agents (#816)
- Add programmatic workaround for Faiss + macOS (#818) Thank you @yukiman76!
- docs: update 49_External_database_integration.ipynb (#819) Thank you @eltociear!
- Fix memory issue with llama.cpp LLM pipeline (#824)
- Fix issue with calling cached_file for local directories (#825)
- Fix resource issues with embeddings indexing components backed by databases (#831)
- Fix bug with NetworkX.hasedge method (#834)