Commit
* Added lancedb as an optional dependency
* Added lancedb to dependencies in test workflow
* Add initial capabilities for LanceDB destination
* Added new lancedb_adapter
* Added LanceDB factory in destinations implementation
* Added LanceDB client configuration with embedding details
* Added LanceDB Client with data load and schema management functionalities
* Lockfile
* Wireframe LanceDB client implementation
* Add abstract methods
* Enhance LanceDB client with additional functionality
* Add tests and GitHub workflow for LanceDB destination
* Update Python version to 3.11.x in GitHub workflow
* Refactor and cleanup LanceDBClient and LoadLanceDBJob classes
* Refactor load tests in lancedb/utils.py and add test for LanceDB model inference
* Added functionality to infer LanceDB model from data and refactored name for reserved fields
* Remove storage options

  Storage options are only available in asynchronous Python API. See https://lancedb.github.io/lancedb/guides/storage/
* Refactor test pipeline and implement lancedb_adapter in LanceDBClient
* Add schema argument to LoadLanceDBJob function
* Format
* Refactor LanceDB related code and increase type hint coverage
* Refactor LanceDB client and tests, enhance DB type mapping
* Refactor code to improve readability by reducing line breaks
* Refactor LanceDB client code by adding schema_conversion and utils modules
* Remove redundant variables in lancedb_client.py
* Refactor code to improve readability and move environment variable set function to utils.py
* Refactor LanceDB client implementation and error handling
* Refactor code for better readability and add type ignore comments
* Dependency Versioning
* Remove unnecessary dependencies and update lancedb and pylance versions
* Silence mypy warnings
* Revert mypy ignores
* Revert mypy ignores
* Fix versioning with 3.8
* Fix versioning
* Update default URI and dataset separator in LanceDB configuration
* Refactor LanceDB typemapper with timestamp and decimal precision adjustments
* Updated method for retrieving sentinel table name
* Remove redundant table normalisation for version_table_name
* Refactor LanceDB functionalities and improve handling of optional embedding fields
* Refactor LanceDBClient and update parameter defaults in schema.py
* Added lancedb to default vector configs and improved type annotations in tests.
* Return self in enter context manager method
* Handle FileNotFoundError
* Replace FileNotFoundError with DestinationUndefinedEntity in lancedb_client.py
* Refactor LanceDB client for simplified table name handling
* Refactored LanceDB schema creation and storage update processes to pyarrow
* Remove LanceModels
* Ensure 'records' is a list in lancedb_client.py
* Refactor code and add batch error handling in lancedb client
* Refactor LanceDB client and schema for improved embedding handling
* Improve error handling and retries in LanceDB client
* Add error decorator to get_stored_state method in lancedb_client
* Change error handling from FileNotFoundError to IndexError
* Refactor lancedb_client.py and add error decorators
* Add configurable read consistency to LanceDB client
* Versioning
* Refactor code for readability and change return type in tests
* Update queries in lancedb_client to order by insertion date
* Refactor LanceDB client and schema for better table creation and management
* Combine "skip" and "append" write dispositions in batch upload
* Add schema version hash check in LanceDB client write operations
* Remove testing code
* Refactor return statement in lancedb_client for successful state loads
* Update lancedb_client.py to improve table handling and embedding fields
* Refactor LanceDB schema generation and handle metadata for embedding functions
* Refactor schema creation and remove unused code
* Add mapping for provider environment variables and update schema comment
* Update package versions in pyproject.toml and poetry.lock
* Refactor LanceDB utils and client, handle exception and remove unnecessary comment
* Refactor utility functions in lancedb tests
* Update 'replace' mode and improve table handling in lancedb client
* Refactor assert_unordered_list_equal to handle dictionaries
* Refactor code for better readability and remove unnecessary blank lines
* Refactor code for readability and remove redundant comments
* Update sentinel table name in test_pipeline.py
* "Add order by clause to database query in lancedb_client"
* Use super method to reduce redundancy
* Syntax
* Remove bare except clauses
* Revert "Remove bare except clauses"

  This reverts commit 3b44631.
* Remove bare except clause
* Remove bare except clause
* Remove bare except clause
* Remove bare except clause
* Refactor error handling in LanceDB client
* Add configurable sentinel table name in LanceDB client configuration
* Update embedding model config and schema in LanceDB
* Refactor lancedb_client.py, remove unused methods and imports
* Add support for adding multiple fields to LanceDB table in a single operation
* Only filter by successful loads
* Remove redundant exception handling in JSON extraction
* Refactor lancedb_client.py for better code readability
* Refactor lancedb_client.py for improved code readability
* Fix module docstring
* Remove embedding_fields from make_arrow_field_schema function
* Add merge key support
* Refactor `get_stored_state` to perform join in memory
* Packaging
* Format
* Update dependencies in GitHub workflow for testing lancedb
* Add "cohere" to package dependencies in pyproject.toml
* Dependencies
* Update dependencies installation in GitHub workflow
* Dependencies
* Update dependency in GitHub workflow
* Dependencies
* Dependencies
* Add documentation for LanceDB
* Add limitations
* Offload ordering logic from LanceDB
* Update import statements in lancedb client and exceptions files
* Create `_get_table_name` getter
* Format
* Avoid race conditions by delegating all state management to dlt
* Imports
* small doc and test fixes
* Fix OpenAI embedding handling of empty strings

  Replace empty strings with a placeholder before sending to the OpenAI API, and handle the placeholder as an empty embedding in the results. This avoids BadRequestErrors from the API when empty strings are present in the input data.

  Implemented by subclassing OpenAIEmbeddings and overriding sanitize_input and generate_embeddings methods.
* Add 'embeddings' dependencies manually
* Finally...
* Dependencies
* Dependencies
* Docs
* Remove superfluous helper method.
* Lock File
* Make api_key and embedding_model_provider_api_key optional
* Clear environment for config test
* Minor test config
* test config
* lancedb config
* Config test
* Config
* config
* Import lancedb_adapter function instead of module in adapter collection module
* Clarify embedding facilities in LanceDB docs
* update lancedb to support new naming setup (cleanup will follow)
* update lockfile

---------

Signed-off-by: Marcel Coetzee <[email protected]>
Co-authored-by: Dave <[email protected]>
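
The empty-string handling described in the "Fix OpenAI embedding handling of empty strings" entry can be illustrated with a small standalone sketch. This is not the code from the commit: it does not subclass OpenAIEmbeddings, and the names `PLACEHOLDER`, `embed_with_placeholder`, and `fake_embed` are hypothetical. It also assumes that "empty embedding" means an empty list, which the commit message does not spell out.

```python
from typing import Callable, List, Sequence

# Hypothetical sentinel value; the commit message does not name the placeholder.
PLACEHOLDER = "___EMPTY___"


def embed_with_placeholder(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Embed texts while keeping empty strings away from the backend."""
    # Swap empty strings for the placeholder so the backend never receives "".
    sanitized = [text if text else PLACEHOLDER for text in texts]
    vectors = embed_fn(sanitized)
    # Map placeholder positions back to an empty embedding (assumed to be []).
    return [
        [] if text == PLACEHOLDER else vector
        for text, vector in zip(sanitized, vectors)
    ]


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding API that rejects empty inputs."""
    if any(text == "" for text in batch):
        raise ValueError("empty input rejected")
    # Toy one-dimensional "embedding": the length of the text.
    return [[float(len(text))] for text in batch]


result = embed_with_placeholder(["abc", "", "hello"], fake_embed)
print(result)
```

One limitation of this sketch: an input that happens to equal the placeholder string is also mapped to an empty embedding, so a real implementation would need a sentinel that cannot occur in the data.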