master merge for 0.5.1 release #1561

Merged
merged 62 commits into from
Jul 8, 2024

Conversation

rudolfix
Collaborator

@rudolfix rudolfix commented Jul 8, 2024

Description

master merge for 0.5.1 release
closes #1486

rudolfix and others added 30 commits May 29, 2024 10:19
…nation (#1423)

* Add HMAC credentials and update Clickhouse configuration

Signed-off-by: Marcel Coetzee <[email protected]>

* Revert "Add HMAC credentials and update Clickhouse configuration"

This reverts commit cb80c6b.

* Refactor error handling for storage authentication in Clickhouse

Signed-off-by: Marcel Coetzee <[email protected]>

* Revert "Refactor error handling for storage authentication in Clickhouse"

This reverts commit f24eb1d.

* Remove GCS ClickHouse buckets in CI until named destinations are supported

Signed-off-by: Marcel Coetzee <[email protected]>

* Add GCS S3 compatibility test, remove GCP credentials from Clickhouse

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor ClickHouse test code for better readability

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor endpoint handling and update GCS bucket configuration

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor test for clickhouse gcs_s3 compatibility

Signed-off-by: Marcel Coetzee <[email protected]>

* Update ClickHouse docs and tests for S3-compatible staging

Signed-off-by: Marcel Coetzee <[email protected]>

* Update ClickHouse documentation on staging areas

Signed-off-by: Marcel Coetzee <[email protected]>

---------

Signed-off-by: Marcel Coetzee <[email protected]>
* Rename full_refresh -> dev_mode, add deprecation warning

* Replace some full_refresh usage in code and docs

* Replace full_refresh usage in tests

* Init experimental refresh = full with drop command

* Refresh modes with dropped_tables file

* Init separate local/write drop command

* Use load_package_state instead of drop tables file

* Use drop schema in init_client (TODO: error)

* Separate cli drop command instructions/execute

* drop tables in init_client

* dropped_tables field on load package state

* Fix import

* test/fix truncate mode

* Save truncated tables in load package state

* Remove load package state copying

* cleanup

* Drop cmd use package state, refactoring

* Don't drop tables without data

* Validate literals in configspec

* Match stored schema by version+version_hash

solves detecting when dropped tables need to be recreated

* Cleanup

* Fix dlt version test

* Cleanup

* Remove dropped_tables_filename

* Fix snippet

* Pipeline refresh docs

* refresh argument docstring

* Restore and update commented drop cmd tests

* Cleanup refresh tests and test whether table dropped vs truncated

* Test existing schema hash

* Revert "Match stored schema by version+version_hash"

This reverts commit 689b3ca.

* Use replace_schema flag

* Change drop_tables replace_schema to delete_schema

* Refresh drop only selected sources

* Rename refresh modes, update docs

* pipeline.run/extract refresh argument

* Don't modify schema when refresh='drop_data'

* Move refresh tests to load, add filesystem truncate test

* Fix duck import

* Remove generated doc sections

* Default full_refresh=None

* Cleanup unused imports

* Close caution blocks

* Update config field docstring

* Add filesystem drop_tables method

* Run all refresh tests on local filesystem destination

* Fix test drop

* Fix iter filesystem schemas

* Fix drop_resources

* Default config.full_refresh also None

* Fix filesystem + test
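
The commits above rename `full_refresh` to `dev_mode`, add a deprecation warning, and default the old argument to `None`. A minimal sketch of that rename-with-deprecation pattern follows. `make_pipeline` and its return value are hypothetical and are not dlt's actual signature.

```python
import warnings
from typing import Optional


def make_pipeline(
    name: str,
    dev_mode: bool = False,
    full_refresh: Optional[bool] = None,
) -> dict:
    """Hypothetical factory that only illustrates the deprecation pattern."""
    if full_refresh is not None:
        # The old argument defaults to None so "not passed" differs from False.
        warnings.warn(
            "full_refresh is deprecated, use dev_mode instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Honor the old argument so existing callers keep working.
        dev_mode = full_refresh
    return {"name": name, "dev_mode": dev_mode}
```

Defaulting the deprecated argument to `None` rather than `False` is what lets the function warn only when a caller actually passes it.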
* runs motherduck init on ci

* fixes edge cases for optional new types, new types of optional types and literal detection

* skips streamlit tests if not installed

* defines singleton sentinels for dlt.config.value and dlt.secrets.value

* uses sentinels to detect config and secret values, removes source code unparsing

* converts sentinels if used in configspec to right defaults

* adds SPECs to callables as attributes

* simplifies and fixes nested dict update, adds dict clone in utils

* adds several missing config injects tests

* gives precedence to apply_hints when setting incremental, fixes resolve and merge configs, detect EMPTY incremental on resolve

* moves wrapping resources in config and incremental wrappers from decorator to resources, rewraps resource on clone to separate sections and incremental instances

* fixes mssql examples in sql_database docs

* allows to use None as explicit value when resolving config, allows to use sentinels to request injected values

* adds is_subclass working with type aliases

* tests configspecs with generics
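
The commits above describe singleton sentinels used as default values to mark arguments that should be injected from config or secrets, replacing source code unparsing. One way this can be sketched is with identity checks on parameter defaults. The names `CONFIG_VALUE`, `SECRET_VALUE`, and `injectable_params` are assumptions for illustration, not dlt's internals.

```python
import inspect
from typing import Any, Callable, Dict


class _Sentinel:
    """A unique marker object; compared by identity, never by value."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


# Hypothetical stand-ins for the config and secret sentinels.
CONFIG_VALUE = _Sentinel("CONFIG_VALUE")
SECRET_VALUE = _Sentinel("SECRET_VALUE")


def injectable_params(func: Callable[..., Any]) -> Dict[str, str]:
    """Map parameter names to 'config' or 'secret' based on their default."""
    kinds: Dict[str, str] = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.default is CONFIG_VALUE:
            kinds[name] = "config"
        elif param.default is SECRET_VALUE:
            kinds[name] = "secret"
    return kinds
```

Because the check runs on the function signature, it needs no access to the function's source text.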
* add example

* move postgres to postgres example to separate folder, change credentials setting

* deleted unused functions, uncomment norm and loading

* fix bugs, reformat, rename vars

* delete old file, fix typing

* make title shorter

* fix type for load_type

* ignore load_type type

* add tests

* fix typing

* fix datetime

* add deps

* update lock file

* merge and update lock file

* install duckdb extensions

* fix install duckdb extensions

* fix install duckdb extensions

* fix install duckdb extensions

* fix version of duckdb for extensions

* fix path to duckdb dump

* update duckdb

* update duckdb in makefile

* delete manual duckdb extension installation

---------

Co-authored-by: sspaeti <[email protected]>
* add delta table support for filesystem destination

* Merge branch 'refs/heads/devel' into 978-filesystem-delta-table

* remove duplicate method definition

* make property robust

* exclude high-precision decimal columns

* make delta imports conditional

* include pyarrow in deltalake dependency

* install extra deltalake dependency

* disable high precision decimal arrow test columns by default

* include arrow max precision decimal column

* introduce directory job and refactor delta table code

* refactor delta table load

* revert import changes

* add delta table format child table handling

* make table_format key lookups robust

* write remote path to reference file

* add supported table formats and file format adapter to destination capabilities

* remove jsonl and parquet from table formats

* add object_store rust crate credentials handling

* add deltalake_storage_options to filesystem config

* move function to top level to prevent multiprocessing pickle error

* add new deltalake_storage_options filesystem config key to tests

* replace secrets with dummy values in test

* reorganize object_store rust crate credentials tests

* add delta table format docs

* move delta table logical delete logic to filesystem client

* rename pyarrow lib method names

* rename utils to delta_utils

* import pyarrow from dlt common libs

* move delta lake utilities to module in dlt common libs

* import delta lake utils early to assert dependencies availability

* handle file format adaptation at table level

* initialize file format variables

* split delta table format tests

* handle table schema is None case

* add test for dynamic dispatching of delta tables

* mark core delta table test as essential

* simplify item normalizer dict key

* make list copy to prevent in place mutations

* add extra deltalake dependency

* only test deltalake lib on local filesystem

* properly evaluates lazy annotations

* uses base FilesystemConfiguration from common in libs

* solves union type reordering due to caching and clash with delta-rs DeltaTable method signature

* creates a table with just root name to cache item normalizers properly

---------

Co-authored-by: Jorrit Sandbrink <[email protected]>
Co-authored-by: Marcin Rudolf <[email protected]>
* Updated source/filesystem docs with explanations for bucket URLs

* Updated

* Updated as per comments
* add test and some notes on existing table

* small update

* add section to destination tables
* fix error on missing nullable hint

* remove unneeded function (and unrelated formatting :) )
* update dependencies for databricks/dbt

* use kwargs if args not defined, fix typing

* Revert to use inline params to keep support for 13.x cluster

* Typing fix

* adds dbt support for mssql

* converts dbt deps from extra to group, allows databricks client >2.9.3

* fixes dict to env util

* limits dbt version to <1.8 in destination tests

* skips chess dbt package for mssql

---------

Co-authored-by: Oon Tong Tan <[email protected]>
Co-authored-by: Marcin Rudolf <[email protected]>
* add service principal auth support for synapse copy into job

* remove unused mypy ignore statements

* add blank lines

* re-add mypy ignore statements

* make synapse test conditional on active destinations
* add support for parallelism strategies and max worker overriding from the destination

* add direct support for custom destination

* add tests

* clean up print statement

* make concurrency test more lenient

* review fixes:
  * remove underscore from settingsvalue
  * simplify job selection function
  * let config always override destination settings

* add info to docs

* sort jobs before grouping
* update sentry dependency

* migrate to sentry 2.0

* implement current scope span accessor
* fix: allow loggeradapter in addition to logger in logcollector

The _log method in LogCollector now checks for either logging.Logger or logging.LoggerAdapter, unlike previously, where it only allowed the former.
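
The check described in this fix can be sketched with a single `isinstance` call against both types. `log_progress` is a hypothetical helper and is not the actual `LogCollector._log` method.

```python
import logging
from typing import Any


def log_progress(logger: Any, level: int, message: str) -> bool:
    """Emit the message if `logger` is a Logger or a LoggerAdapter.

    Returns True when the message was handed to the logger, False otherwise.
    """
    # LoggerAdapter does not inherit from Logger, so both must be listed.
    if isinstance(logger, (logging.Logger, logging.LoggerAdapter)):
        logger.log(level, message)
        return True
    return False
```

Both classes expose `log(level, msg)`, so the call site stays the same once the type check passes.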
* Add load_id to arrow tables in extract step instead of normalize

* Test arrow load id in extract

* Get normalize config without decorator

* Normalize load ID column name

* Load ID column goes last

* adds update_table column order tests

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* set default next item mode to round robin

* fix sources tests

* fix some existing tests to work again

* parametrize one test
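
The first commit in this group makes round robin the default next item mode. As a generic illustration of round-robin ordering (not dlt's implementation), the sketch below takes one item from each source in turn and drops sources as they run out.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def round_robin(*sources: Iterable[T]) -> Iterator[T]:
    """Yield one item from each source in turn until all are exhausted."""
    queue = deque(iter(source) for source in sources)
    while queue:
        current = queue.popleft()
        try:
            item = next(current)
        except StopIteration:
            # An exhausted source is dropped from the rotation.
            continue
        # A source that still has items goes to the back of the queue.
        queue.append(current)
        yield item
```

Compared with draining each source completely before moving to the next, this ordering interleaves items, which changes the order in which downstream code sees them.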
* correctly handles explicit initial values, still allowing optional args to be resolved

* allows pure authenticator, allows to specify token right in the credentials

* adds base method to generate typed query params in connection string credentials, serializes to str for to_url

* drops resolve() from __init__ and parse_native_value methods

* updates snowflake docs

* runs native value parsing for side effects
* Added how to retrieve secrets using google secret manager

* Added some minor corrections to snowflake docs

* Updated as per comments

* Fixing linting error

* small correction

---------

Co-authored-by: Alena <[email protected]>
jorritsandbrink and others added 27 commits June 26, 2024 15:24
* remove metrics resource

* fix link

* Update docs/website/docs/dlt-ecosystem/verified-sources/stripe.md

* Update docs/website/docs/dlt-ecosystem/verified-sources/stripe.md

---------

Co-authored-by: Anton Burnashev <[email protected]>
* allows to decorate async function with dlt.source

* adds pytest-async and updates pytest to 7.x

* fixes forked teardown issue 7.x

* bumps deps for py 3.12

* adds py 12 common tests

* fixes typings after deps bump

* bumps airflow, yanks duckdb to 0.9.2

* fixes tests

* fixes pandas version

* adds 3.12 duckdb dep

* adds right hand pipe operator

* fixes docker ci build

* adds docs on async sources and resources

* normalizes default hints and preferred types in schema

* defines pipeline state table in utils, column normalization in simple regex

* normalizes all identifiers used by relational normalizer, fixes other modules

* fixes sql job client to use normalized identifiers in queries

* runs state sync tests for lower and upper case naming conventions

* fixes weaviate to use normalized identifiers in queries

* partially fixes qdrant incorrect state and version retrieval queries

* initial sql uppercase naming convention

* adds native df readers to databricks and bigquery

* adds casing identifier capability to support different casing in naming conventions, fixes how identifiers are normalized in destinations

* cleans typing for relational normalizer

* renames escape functions

* destination capabilities for case fold and case sensitivity

* drops supports naming module and allows naming to be instance in config and schema

* checks all tables in information schema in one go, observes case folding and sensitivity in sql destinations

* moves schema verification to destination utils

* adds method to remove processing hints from schema, helper functions for schema settings, refactor, tests

* accepts naming convention instances when resolving configs

* fixes the cloning of schema in decorator, removes processing hints

* removes processing hints when saving imported schema

* adds docs on naming conventions, removes technical docs

* adds casing info to databricks caps, makes caps an instance attr

* adjusts destination casing in caps from schema naming and config

* raises detailed schema identifier clash exceptions

* adds is_case_sensitive and name to NamingConvention

* adds sanity check if _dlt prefix is preserved

* finds generic types in non-generic classes deriving from generic

* uses casefold INSERT VALUES job column names

* adds a method make_qualified_table_name_path that calculates components of fully qualified table name and uses it to query INFO SCHEMA

* adds casing info to destinations, caps as instance attrs, custom table name paths

* adds naming convention to restore state tests, make them essential

* fixes table builder tests

* removes processing hints when exporting schema to import folder, warns on schema import overriding local schema, warns on processing hints present

* allows to subclass INFO SCHEMA query generation and uses specialized big query override

* uses correct schema escaping function in sql jobs

* passes pipeline state to package state via extract

* fixes optional normalizers module

* excludes version_hash from pipeline state SELECT

* passes pipeline state to package state pt.2

* re-enables sentry tests

* bumps qdrant client, makes test running for local version

* makes weaviate running

* uses schemata to find databases on athena

* uses api get_table for hidden dataset on bigquery to reflect schemas, support case insensitive datasets

* adds naming conventions to two restore state tests

* fixes escape identifiers to column escape

* fix conflicts in docs

* adjusts capabilities in capabilities() method, uses config and naming optionally

* allows to add props to classes without vectorizer in weaviate

* moves caps function into factories, cleans up adapters and custom destination

* sentry_dsn

* adds basic destination reference tests

* fixes table builder tests

* fix deps and docs

* fixes more tests

* case sensitivity docs stubs

* fixes drop_pipeline fixture

* improves partial config generation for capabilities

* adds snowflake csv support

* creates separate csv tests

* allows to import files into extract storage, adds import file writer and spec

* handles ImportFileMeta in extractor

* adds import file item normalizer and router to normalize

* supports csv format config for snowflake

* removes realpath wherever possible and adds fast make_full_path to FileStorage

* adds additional methods to load_package storage to make listings faster

* adds file_format to dlt.resource, uses preferred file format for dlt state table

* docs for importing files, file_format

* code improvements and tests

* docs hard links note

* moves loader parallelism test to pipelines, solves duckdb ci test error issue

* fixes tests

* moves drop_pipeline fixture level up

* drops default naming convention from caps so naming in saved schema persists, allows (section, <schema_name>, schema) config section for schema settings

* unifies all representations of pipeline state

* tries to decompress text file first in fs_client

* tests get stored state in test_job_client

* removes credentials from dlt.attach, adds destination and staging factories

* cleans up env variables and pipeline dropping fixture precedence

* removes dev_mode from dlt.attach

* adds missing arguments to filesystem factory

* fixes tests

* updates destination and naming convention docs

* removes is_case_sensitive from naming convention initializer

* simplifies with_file_import mark

* adds case sensitivity tests

* uses dev_mode everywhere

* improves csv docs

* fixes encodings in fsspec

* improves naming convention docs

* fixes tests and renames clash to collision

* fixes getting original bases from instance
* apply forked decorator to example tests

* add missing pytest import
* Added lancedb as an optional dependency

Signed-off-by: Marcel Coetzee <[email protected]>

* Added lancedb to dependencies in test workflow

Signed-off-by: Marcel Coetzee <[email protected]>

* Add initial capabilities for LanceDB destination

Signed-off-by: Marcel Coetzee <[email protected]>

* Added new lancedb_adapter

Signed-off-by: Marcel Coetzee <[email protected]>

* Added LanceDB factory in destinations implementation

Signed-off-by: Marcel Coetzee <[email protected]>

* Added LanceDB client configuration with embedding details

Signed-off-by: Marcel Coetzee <[email protected]>

* Added LanceDB Client with data load and schema management functionalities

Signed-off-by: Marcel Coetzee <[email protected]>

* Lockfile

Signed-off-by: Marcel Coetzee <[email protected]>

* Wireframe LanceDB client implementation

Signed-off-by: Marcel Coetzee <[email protected]>

* Add abstract methods

Signed-off-by: Marcel Coetzee <[email protected]>

* Enhance LanceDB client with additional functionality

Signed-off-by: Marcel Coetzee <[email protected]>

* Add tests and GitHub workflow for LanceDB destination

Signed-off-by: Marcel Coetzee <[email protected]>

* Update Python version to 3.11.x in GitHub workflow

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor and cleanup LanceDBClient and LoadLanceDBJob classes

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor load tests in lancedb/utils.py and add test for LanceDB model inference

Signed-off-by: Marcel Coetzee <[email protected]>

* Added functionality to infer LanceDB model from data and refactored name for reserved fields

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove storage options

Storage options are only available in asynchronous Python API. See https://lancedb.github.io/lancedb/guides/storage/

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor test pipeline and implement lancedb_adapter in LanceDBClient

Signed-off-by: Marcel Coetzee <[email protected]>

* Add schema argument to LoadLanceDBJob function

Signed-off-by: Marcel Coetzee <[email protected]>

* Format

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB related code and increase type hint coverage

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client and tests, enhance DB type mapping

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code to improve readability by reducing line breaks

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client code by adding schema_conversion and utils modules

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove redundant variables in lancedb_client.py

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code to improve readability and move environment variable set function to utils.py

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client implementation and error handling

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code for better readability and add type ignore comments

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependency Versioning

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove unnecessary dependencies and update lancedb and pylance versions

Signed-off-by: Marcel Coetzee <[email protected]>

* Silence mypy warnings

Signed-off-by: Marcel Coetzee <[email protected]>

* Revert mypy ignores

Signed-off-by: Marcel Coetzee <[email protected]>

* Revert mypy ignores

Signed-off-by: Marcel Coetzee <[email protected]>

* Fix versioning with 3.8

Signed-off-by: Marcel Coetzee <[email protected]>

* Fix versioning

Signed-off-by: Marcel Coetzee <[email protected]>

* Update default URI and dataset separator in LanceDB configuration

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB typemapper with timestamp and decimal precision adjustments

Signed-off-by: Marcel Coetzee <[email protected]>

* Updated method for retrieving sentinel table name

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove redundant table normalisation for version_table_name

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB functionalities and improve handling of optional embedding fields

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDBClient and update parameter defaults in schema.py

Signed-off-by: Marcel Coetzee <[email protected]>

* Added lancedb to default vector configs and improved type annotations in tests.

Signed-off-by: Marcel Coetzee <[email protected]>

* Return self in enter context manager method

Signed-off-by: Marcel Coetzee <[email protected]>

* Handle FileNotFoundError

Signed-off-by: Marcel Coetzee <[email protected]>

* Replace FileNotFoundError with DestinationUndefinedEntity in lancedb_client.py

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client for simplified table name handling

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactored LanceDB schema creation and storage update processes to pyarrow

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove LanceModels

Signed-off-by: Marcel Coetzee <[email protected]>

* Ensure 'records' is a list in lancedb_client.py

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code and add batch error handling in lancedb client

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client and schema for improved embedding handling

Signed-off-by: Marcel Coetzee <[email protected]>

* Improve error handling and retries in LanceDB client

Signed-off-by: Marcel Coetzee <[email protected]>

* Add error decorator to get_stored_state method in lancedb_client

Signed-off-by: Marcel Coetzee <[email protected]>

* Change error handling from FileNotFoundError to IndexError

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor lancedb_client.py and add error decorators

Signed-off-by: Marcel Coetzee <[email protected]>

* Add configurable read consistency to LanceDB client

Signed-off-by: Marcel Coetzee <[email protected]>

* Versioning

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code for readability and change return type in tests

Signed-off-by: Marcel Coetzee <[email protected]>

* Update queries in lancedb_client to order by insertion date

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB client and schema for better table creation and management

Signed-off-by: Marcel Coetzee <[email protected]>

* Combine "skip" and "append" write dispositions in batch upload

Signed-off-by: Marcel Coetzee <[email protected]>

* Add schema version hash check in LanceDB client write operations

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove testing code

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor return statement in lancedb_client for successful state loads

Signed-off-by: Marcel Coetzee <[email protected]>

* Update lancedb_client.py to improve table handling and embedding fields

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB schema generation and handle metadata for embedding functions

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor schema creation and remove unused code

Signed-off-by: Marcel Coetzee <[email protected]>

* Add mapping for provider environment variables and update schema comment

Signed-off-by: Marcel Coetzee <[email protected]>

* Update package versions in pyproject.toml and poetry.lock

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor LanceDB utils and client, handle exception and remove unnecessary comment

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor utility functions in lancedb tests

Signed-off-by: Marcel Coetzee <[email protected]>

* Update 'replace' mode and improve table handling in lancedb client

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor assert_unordered_list_equal to handle dictionaries

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code for better readability and remove unnecessary blank lines

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor code for readability and remove redundant comments

Signed-off-by: Marcel Coetzee <[email protected]>

* Update sentinel table name in test_pipeline.py

Signed-off-by: Marcel Coetzee <[email protected]>

* "Add order by clause to database query in lancedb_client"

Signed-off-by: Marcel Coetzee <[email protected]>

* Use super method to reduce redundancy

Signed-off-by: Marcel Coetzee <[email protected]>

* Syntax

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove bare except clauses

Signed-off-by: Marcel Coetzee <[email protected]>

* Revert "Remove bare except clauses"

This reverts commit 3b44631.

* Remove bare except clause

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove bare except clause

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove bare except clause

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove bare except clause

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor error handling in LanceDB client

Signed-off-by: Marcel Coetzee <[email protected]>

* Add configurable sentinel table name in LanceDB client configuration

Signed-off-by: Marcel Coetzee <[email protected]>

* Update embedding model config and schema in LanceDB

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor lancedb_client.py, remove unused methods and imports

Signed-off-by: Marcel Coetzee <[email protected]>

* Add support for adding multiple fields to LanceDB table in a single operation

Signed-off-by: Marcel Coetzee <[email protected]>

* Only filter by successful loads

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove redundant exception handling in JSON extraction

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor lancedb_client.py for better code readability

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor lancedb_client.py for improved code readability

Signed-off-by: Marcel Coetzee <[email protected]>

* Fix module docstring

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove embedding_fields from make_arrow_field_schema function

Signed-off-by: Marcel Coetzee <[email protected]>

* Add merge key support

Signed-off-by: Marcel Coetzee <[email protected]>

* Refactor `get_stored_state` to perform join in memory

Signed-off-by: Marcel Coetzee <[email protected]>

* Packaging

Signed-off-by: Marcel Coetzee <[email protected]>

* Format

Signed-off-by: Marcel Coetzee <[email protected]>

* Update dependencies in GitHub workflow for testing lancedb

Signed-off-by: Marcel Coetzee <[email protected]>

* Add "cohere" to package dependencies in pyproject.toml

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Update dependencies installation in GitHub workflow

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Update dependency in GitHub workflow

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Add documentation for LanceDB

Signed-off-by: Marcel Coetzee <[email protected]>

* Add limitations

Signed-off-by: Marcel Coetzee <[email protected]>

* Offload ordering logic from LanceDB

Signed-off-by: Marcel Coetzee <[email protected]>

* Update import statements in lancedb client and exceptions files

Signed-off-by: Marcel Coetzee <[email protected]>

* Create `_get_table_name` getter

Signed-off-by: Marcel Coetzee <[email protected]>

* Format

Signed-off-by: Marcel Coetzee <[email protected]>

* Avoid race conditions by delegating all state management to dlt

Signed-off-by: Marcel Coetzee <[email protected]>

* Imports

Signed-off-by: Marcel Coetzee <[email protected]>

* small doc and test fixes

* Fix OpenAI embedding handling of empty strings

Replace empty strings with a placeholder before sending to the OpenAI API, and handle the placeholder as an empty embedding in the results. This avoids BadRequestErrors from the API when empty strings are present in the input data.

Implemented by subclassing OpenAIEmbeddings and overriding sanitize_input and generate_embeddings methods.

Signed-off-by: Marcel Coetzee <[email protected]>

* Add 'embeddings' dependencies manually

Signed-off-by: Marcel Coetzee <[email protected]>

* Finally...

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Dependencies

Signed-off-by: Marcel Coetzee <[email protected]>

* Docs

Signed-off-by: Marcel Coetzee <[email protected]>

* Remove superfluous helper method.

Signed-off-by: Marcel Coetzee <[email protected]>

* Lock File

Signed-off-by: Marcel Coetzee <[email protected]>

* Make api_key and embedding_model_provider_api_key optional

Signed-off-by: Marcel Coetzee <[email protected]>

* Clear environment for config test

Signed-off-by: Marcel Coetzee <[email protected]>

* Minor test config

Signed-off-by: Marcel Coetzee <[email protected]>

* test config

Signed-off-by: Marcel Coetzee <[email protected]>

* lancedb config

Signed-off-by: Marcel Coetzee <[email protected]>

* Config test

Signed-off-by: Marcel Coetzee <[email protected]>

* Config

Signed-off-by: Marcel Coetzee <[email protected]>

* config

Signed-off-by: Marcel Coetzee <[email protected]>

* Import lancedb_adapter function instead of module in adapter collection module

Signed-off-by: Marcel Coetzee <[email protected]>

* Clarify embedding facilities in LanceDB docs

Signed-off-by: Marcel Coetzee <[email protected]>

* update lancedb to support new naming setup (cleanup will follow)

* update lockfile

---------

Signed-off-by: Marcel Coetzee <[email protected]>
Co-authored-by: Dave <[email protected]>
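
The "Fix OpenAI embedding handling of empty strings" commit above describes swapping empty strings for a placeholder before the API call and mapping the placeholder back to an empty embedding afterwards. The following is a minimal sketch of that idea only. It does not subclass `OpenAIEmbeddings`; the placeholder value, the `embed_with_placeholder` helper, and the `fake_embed` stand-in are all hypothetical, and representing an "empty embedding" as an empty list is an assumption.

```python
from typing import Callable, List, Sequence

# Hypothetical placeholder value; the one used in the actual commit is not shown above.
EMPTY_PLACEHOLDER = "___EMPTY___"


def sanitize_input(texts: Sequence[str]) -> List[str]:
    """Replace empty strings with a placeholder so the API never receives ''."""
    return [t if t else EMPTY_PLACEHOLDER for t in texts]


def embed_with_placeholder(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Embed texts, returning an empty vector wherever the input was empty."""
    sanitized = sanitize_input(texts)
    vectors = embed_fn(sanitized)
    # Assumption: an "empty embedding" is represented as an empty list.
    return [
        [] if original == "" else vector
        for original, vector in zip(texts, vectors)
    ]


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding call; rejects empty strings like the API does."""
    if not all(batch):
        raise ValueError("empty string in embedding input")
    return [[float(len(text))] for text in batch]


result = embed_with_placeholder(["abc", "", "x"], fake_embed)
```

Checking the original input rather than the sanitized one keeps a genuine text that happens to equal the placeholder from being treated as empty.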
* adds naming convention example

* improves naming convention docs

* simplifies naming convention classes and configurations, implements sql cs, adds tests

* bumps to version 0.5.1a0

* linter fixes

* format fixes
* removes deprecated credentials argument from Pipeline

* fixes dependency in tests

* fixes explicit creds tests dependencies
* Added pg_replication docs

* Updated

* Update docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md

* Update docs/website/docs/dlt-ecosystem/verified-sources/pg_replication.md

---------

Co-authored-by: Alena Astrakhantseva <[email protected]>
* Extend orjson dependency allowed range with excluded versions

* bumps orjson in lock

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* Run qdrant server in local tests

* Add qdrant to test destination configs

* Fix stringify UUID objects

* Install qdrant deps

* Fix qdrant image version

* Disable httpx logging in tests

* Add index and use order by for fetching state

* Try qdrant local support

* Fix qdrant load stored state

* Disable parallel load in qdrant local

* Test destination config for qdrant local and server

* Fixes

* qdrant example test

* Missing module

* Cleanup

* resolves configuration to get full capabilities in load

* uses embedded qdrant for zendesk example

---------

Co-authored-by: Marcin Rudolf <[email protected]>
* adds more info to pipeline drop and info commands

* extracts known env variables to separate module

* drops tables on staging

* tests create/drop datasets and tables

* simplifies drop command and helpers + tests

* adds no print linter module and a few other small fixes

* improves collision detection when normalizers change

* allows glob to work with memory filesystem

* replaces walk in filesystem destination with own glob

* standardizes drop_dataset behavior for all destinations

* creates athena iceberg tables in random locations
* Refactor mock_api_server, add integration tests for header link, json link and offset paginator

* Fix assert_pagination

* Rename a param

* Fix off-by-one error in PageNumberPaginator; add pagination tests

* Rename the index_base param; add mock api tests

* Fixed a test condition

* Fix formatting

* Add default page size and total pages

* Implement tests for JSONResponseCursorPaginator; extend tests for mock_api_server

* Rename an arg in the docstring

* Rename the index_base parameter

* Update the docs
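
The paginator commits above mention an off-by-one fix in `PageNumberPaginator` and a parameter for the index base. The exact bug is not described here, so the following is only a generic sketch of where such an error tends to arise: the last valid page depends on whether numbering starts at 0 or 1. The `iter_pages` function and its `first_page` parameter are hypothetical and are not the library's API.

```python
from typing import Iterator


def iter_pages(total_pages: int, first_page: int = 0) -> Iterator[int]:
    """Yield every page number of an API that exposes `total_pages` pages.

    `first_page` is the number of the first page (commonly 0 or 1).
    """
    page = first_page
    # The last valid page is first_page + total_pages - 1, so the bound
    # must shift with the base; comparing against total_pages alone
    # would skip or overshoot a page for one of the two bases.
    while page < first_page + total_pages:
        yield page
        page += 1


zero_based = list(iter_pages(3, first_page=0))
one_based = list(iter_pages(3, first_page=1))
```

With three pages, the zero-based run visits pages 0 through 2 and the one-based run visits pages 1 through 3.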
…1547)

* selects all tables from info schema if number of tables to check is more than a threshold

* adds tests
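
The commit above selects all tables from the information schema once the number of tables to check exceeds a threshold. A minimal sketch of that decision, assuming a parameterized query with `%s` placeholders; the function name, the default threshold, and the selected columns are hypothetical and not taken from the codebase.

```python
from typing import List, Sequence, Tuple

# Hypothetical default; the threshold used in the actual change is not shown above.
DEFAULT_TABLE_THRESHOLD = 1000


def build_columns_query(
    schema_name: str,
    table_names: Sequence[str],
    threshold: int = DEFAULT_TABLE_THRESHOLD,
) -> Tuple[str, List[str]]:
    """Return (sql, params) for reading column metadata of the given tables."""
    if not table_names:
        raise ValueError("table_names must not be empty")
    query = (
        "SELECT table_name, column_name, data_type "
        "FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = %s"
    )
    params: List[str] = [schema_name]
    if len(table_names) <= threshold:
        # Few tables: filter in the database with an IN clause.
        placeholders = ", ".join(["%s"] * len(table_names))
        query += f" AND table_name IN ({placeholders})"
        params.extend(table_names)
    # Many tables: no table filter, so the caller receives every table in
    # the schema and is expected to discard the ones it did not ask for.
    return query, params


small_query, small_params = build_columns_query("my_dataset", ["a", "b"], threshold=5)
large_query, large_params = build_columns_query("my_dataset", ["a", "b", "c"], threshold=2)
```

Dropping the filter above the threshold trades a larger result set for a shorter statement, which can matter when a destination limits query length.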
* allows to configure staging dataset name

* adds some missing docs on naming conventions

* adds missing test cases

* bumps dlt version to 0.5.1

* fixes linter and tests

netlify bot commented Jul 8, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 34e97cc
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/668bee308e4f7900080fc843
😎 Deploy Preview https://deploy-preview-1561--dlt-hub-docs.netlify.app

@rudolfix rudolfix merged commit d1e5666 into master Jul 8, 2024
47 of 51 checks passed
Development

Successfully merging this pull request may close these issues.