Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
data pond: expose readable datasets as dataframes and arrow tables (#1507)

* add simple ibis helper
* start working on dataframe reading interface
* a bit more work
* first simple implementation
* small change
* more work on dataset
* some work on filesystem destination
* add support for parquet files and compression on jsonl files in filesystem dataframe implementation
* fix test after devel merge
* add nice composable pipeline example
* small updates to demo
* enable tests for all bucket providers; remove resource based dataset accessor
* fix tests
* create views in duckdb filesystem accessor
* move to relations based interface
* add generic duckdb interface to filesystem
* move code for accessing frames and tables to the cursor and use duckdb dbapi cursor in filesystem
* add native db api cursor fetching to exposed dataset
* some small changes
* switch dataaccess pandas to pyarrow
* add native bigquery support for df and arrow tables
* change iter functions to always expect chunk size (None will default to full frame/table)
* add native implementation for databricks
* add dremio native implementation for full frames and tables
* fix filesystem test; make filesystem duckdb instance use glob pattern
* add test for evolving filesystem
* fix empty dataframe retrieval
* remove old df test
* clean up interfaces a bit (more to come?); remove pipeline dependency from dataset
* move dataset creation into destination client and clean up interfaces / reference a bit more
* renames some interfaces and adds brief docstrings
* add filesystem cached duckdb and remove the need to declare needed views for filesystem
* fix tests for snowflake
* make data set a function
* fix db-types dependency for bigquery
* create duckdb based sql client for filesystem
* fix example pipeline
* enable filesystem sql client to work on streamlit
* add comments
* rename sql to query; remove unneeded code
* fix tests that rely on sql client
* post merge cleanups
* move imports around a bit
* exclude abfss buckets from test
* add support for arrow schema creation from known dlt schema
* re-use sqldatabase code for cursors
* fix bug
* add default columns where needed
* add sqlglot to filesystem deps
* store filesystem tables in correct dataset
* move cursor columns location
* fix snowflake and mssql; disable tests with sftp
* clean up compose files a bit
* fix sqlalchemy
* add mysql docker compose file
* fix linting
* prepare hint checking
* disable part of state test
* enable hint check
* add column type support for filesystem json
* rename dataset implementation to DBAPI; remove dataset specific code from destination client
* wrap functions in dbapi readable dataset
* remove example pipeline
* rename test_decimal_name
* make column code a bit clearer and fix mssql again
* rename df methods to pandas
* fix bug in default columns
* fix hints test and columns bug; removes some unneeded code
* catch mysql error if no rows returned
* add exceptions for not implemented bucket and filetypes
* fix docs
* add config section for getting pipeline clients
* set default dataset in filesystem sqlclient
* add config section for sync_destination
* rename readablerelation methods
* use more functions of the duckdb sql client in filesystem version
* update dependencies
* use active pipeline capabilities if available for arrow table
* update types
* rename dataset accessor function
* add test for accessing tables with unqualified tablename
* fix sql client
* add duckdb native support for azure, s3 and gcs (via s3)
* some typing
* add dataframes tests back in
* add join table and update view tests for filesystem
* start adding tests for creating views on remote duckdb
* fix snippets
* fix some dependencies and mssql/synapse tests
* fix bigquery dependencies and abfss tests
* add tests for adding view to external dbs and persistent secrets
* add support for delta tables
* add duckdb to read interface tests
* fix delta tests
* make default secret name derived from bucket url
* try fix azure tests again
* fix df access tests
* PR fixes
* correct internal table access
* allow datasets without schema
* skips parametrized queries, skips tables from non-dataset schemas
* move filesystem specific sql_client tests to correct location and test a few more things
* fix sql client tests
* make secret name when dropping optional
* fix gs test
* remove moved filesystem tests from test_read_interfaces
* fix sql client tests again... :)
* clear duckdb secrets
* disable secrets deleting for delta tests

---------

Co-authored-by: Marcin Rudolf <[email protected]>
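Several items above move frame and table access onto DB-API cursors (`fetchmany`/`fetchall`), make the chunked iterators always take a chunk size (with `None` returning the full result), and fix retrieval of empty results. A minimal sketch of those semantics, using the standard-library `sqlite3` cursor as a stand-in backend; `ReadableCursor` and `iter_fetch` are hypothetical names for illustration, not dlt's actual classes or methods:

```python
import sqlite3
from typing import Any, Iterator, List, Optional, Tuple


class ReadableCursor:
    """Hypothetical wrapper around a DB-API cursor that has already executed a query."""

    def __init__(self, cursor: Any) -> None:
        self._cursor = cursor

    @property
    def columns(self) -> List[str]:
        # cursor.description is set after a SELECT even when no rows matched,
        # so an empty result still reports its column names.
        return [col[0] for col in self._cursor.description]

    def iter_fetch(self, chunk_size: Optional[int]) -> Iterator[List[Tuple[Any, ...]]]:
        # chunk_size=None: return the whole result as one chunk (possibly empty).
        if chunk_size is None:
            yield self._cursor.fetchall()
            return
        # Otherwise pull fixed-size chunks until the cursor is exhausted.
        while True:
            rows = self._cursor.fetchmany(chunk_size)
            if not rows:
                break
            yield rows


# Usage with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item_{i}") for i in range(5)])

reader = ReadableCursor(conn.execute("SELECT id, name FROM items ORDER BY id"))
chunks = list(reader.iter_fetch(chunk_size=2))  # chunk lengths: 2, 2, 1

empty = ReadableCursor(conn.execute("SELECT id, name FROM items WHERE id < 0"))
empty_chunks = list(empty.iter_fetch(chunk_size=None))  # a single empty chunk
```

Taking column names from `cursor.description` rather than from the first fetched row is what lets an empty result still carry its schema, so a caller can build an empty frame or table with the right columns.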
Showing 45 changed files with 1,734 additions and 298 deletions.
```diff
@@ -94,6 +94,3 @@ jobs:
       # always run full suite, also on branches
       - run: poetry run pytest tests/load -x --ignore tests/load/sources
         name: Run tests Linux
-        env:
-          DESTINATION__SQLALCHEMY_MYSQL__CREDENTIALS: mysql://root:[email protected]:3306/dlt_data # Use root cause we need to create databases
-          DESTINATION__SQLALCHEMY_SQLITE__CREDENTIALS: sqlite:///_storage/dl_data.sqlite
```
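The `env:` block in this hunk passes destination credentials to the test job through environment variables. dlt derives such variable names from the configuration section path, upper-cased and joined with double underscores, which is why the keys read `DESTINATION__<destination name>__CREDENTIALS`. A minimal sketch of that naming rule; `env_key` and `lookup` are hypothetical helpers, not dlt functions:

```python
import os
from typing import Optional


def env_key(*path: str) -> str:
    """Hypothetical helper: join a config section path into an env variable name."""
    return "__".join(part.upper() for part in path)


def lookup(*path: str) -> Optional[str]:
    """Hypothetical helper: read the value for a config path from the environment."""
    return os.environ.get(env_key(*path))


mysql_key = env_key("destination", "sqlalchemy_mysql", "credentials")
sqlite_key = env_key("destination", "sqlalchemy_sqlite", "credentials")

# Simulate what the workflow's env: block would export for the SQLite destination.
os.environ[sqlite_key] = "sqlite:///_storage/dl_data.sqlite"
sqlite_credentials = lookup("destination", "sqlalchemy_sqlite", "credentials")
```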