Clickhouse #1097
Merged
Changes from 145 commits
165 commits
2d72134 Merge branch 'devel' of github.com:dlt-hub/dlt into 1055-implement-cl… (Pipboyguy)
1f28952 Add clickhouse driver dependency #1055 (Pipboyguy)
dc5d4b0 Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
d736dee Preliminary wireframe #1055 (Pipboyguy)
0407ab8 Update preliminary Clickhouse configurations #1055 (Pipboyguy)
d503c14 Format #1055 (Pipboyguy)
9b3ef87 Merge (Pipboyguy)
ed218e5 Finalize wireframing #1055 (Pipboyguy)
81091c7 Wireframe ClickhouseSqlClient #1055 (Pipboyguy)
7b7edff Refactor Clickhouse SqlClient wireframing and update capabilities #1055 (Pipboyguy)
94ca2a9 Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
5f55bab Update error messages and transaction capability in Clickhouse (Pipboyguy)
c9fb02e Update Clickhouse configuration and factory (Pipboyguy)
f53ca52 Update (Pipboyguy)
0df32f5 Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
c1f106b Update identifier escaping logic #1055 (Pipboyguy)
29ed8f6 Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
6fc750f Refactor ClickhouseSqlClient for better error handling #1055 (Pipboyguy)
14d39e4 Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
dbad9b1 Refine clickhouse destination and basic adapter #1055 (Pipboyguy)
16df851 Finish ClickhouseClient (Pipboyguy)
c134cad Add escape_clickhouse_literal function for Clickhouse (Pipboyguy)
2fa644c Add "insert_values" to supported loader file formats (Pipboyguy)
0af45b9 Add `wei` to "from_db_type" (Pipboyguy)
de66efa Improve Clickhouse loader code and update comments #1055 (Pipboyguy)
7e0d508 Preliminary CH tests and utility module (Pipboyguy)
57ceeee Refactor Clickhouse utilities and update tests (Pipboyguy)
6cdf086 Refactor URL conversion and staging for Clickhouse (Pipboyguy)
6606f5e Tests don't pass, but they are there #1055 (Pipboyguy)
383290d Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
45bafa7 Fix poetry collision #1055 (Pipboyguy)
29b5a07 Swap engine/primary key clause ordering #1055 (Pipboyguy)
c9b1ae4 Pass basic tests #1055 (Pipboyguy)
1e96a78 Add Clickhouse tests and improve its configuration handling (Pipboyguy)
2087b37 Resove merge conflict (Pipboyguy)
ab50287 Format and Lint (Pipboyguy)
5d768fa Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
9d905ec Merge branch 'devel' into 1055-implement-clickhouse-destination (Pipboyguy)
86685fd Improve DRYness #1055 (Pipboyguy)
e702d8a Remove old comment #1055 (Pipboyguy)
83b3812 Update pyproject.toml #1055 (Pipboyguy)
d26183b Fix secure connection settings #1055 (Pipboyguy)
e596ae3 Minor config parsing amendments #1055 (Pipboyguy)
e970a31 Fix ssl connection (correct port) #1055 (Pipboyguy)
69a672c Reuse Athena destination pyformat converter #1055 (Pipboyguy)
0cbb657 Filesystem Auth issues #1055 (Pipboyguy)
bccab86 Fix incorrect arguments in render_object_storage_table_function #1055 (Pipboyguy)
c56f1a4 Pass all providers append tests #1055 (Pipboyguy)
74e5a6d Add merge test #1055 (Pipboyguy)
ff4e214 Resolved driver parameter substitution issues #1055 (Pipboyguy)
5a31ca4 Merge Disposition #1055 (Pipboyguy)
75a61eb Fall back to append disposition for merge #1055 (Pipboyguy)
a4ccd2b Clickhouse CI #1055 (Pipboyguy)
200dce5 CI and merge #1055 (Pipboyguy)
7f730c1 Update active destinations to ClickHouse (Pipboyguy)
dc47c12 Expand Clickhouse dependencies in pyproject.toml (Pipboyguy)
34646ad Update lock file #1055 (Pipboyguy)
8bc3c21 Revert back to merge implementation #1055 (Pipboyguy)
99e82ff Add default sql test #1055 (Pipboyguy)
6bded9a Support jsonlines (Pipboyguy)
eca4d2d Revert non-applicable changes #1055 (Pipboyguy)
092a524 Fix 'from_db_type' #1055 (Pipboyguy)
79d9b80 Remove unused tests #1055 (Pipboyguy)
c5d8709 No staging test case #1055 (Pipboyguy)
594accc Minor changes (Pipboyguy)
5d8a228 Refactor Clickhouse loader (Pipboyguy)
d454d8f WIP (Pipboyguy)
fdd052a Remove from standard sql tests (Pipboyguy)
939bd35 Remove unnecessary compression #1055 (Pipboyguy)
b554693 Dataset prefix and dataset-table seperator #1055 (Pipboyguy)
8215f44 Remove DATASET_PREFIX from sql_client.py (Pipboyguy)
881a0b9 Add clickhouse connect as local fallback #1055 (Pipboyguy)
a4904a8 Set settings on local #1055 (Pipboyguy)
89bab97 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
b44bea3 Update lock (Pipboyguy)
d6309b3 Revert some files back to devel (Pipboyguy)
dfc274d Remove redundant merge logic #1055 (Pipboyguy)
9d0e1ba Spelling fix (Pipboyguy)
9f41edd Don't synthesise CH credentials __init__. #1055 (Pipboyguy)
584ebe7 Optimize import and type hinting in Clickhouse factory #1055 (Pipboyguy)
853c353 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
1c69a26 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
32d3f62 Revert back to temp table (Pipboyguy)
6cb4ee8 Refactor Clickhouse to ClickHouse for consistency (Pipboyguy)
5004b5a Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
a646118 Support compression codec for azure and local #1055 (Pipboyguy)
350cae3 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
6fee65e Merge (Pipboyguy)
95caba7 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
7c0ac80 Fix table-name separator config resolution #1055 (Pipboyguy)
411ff77 Format (Pipboyguy)
7fe0a91 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
e30ce6d Set compression parameter for local #1055 (Pipboyguy)
3c89b3b Set compression method to 'auto' for s3 table function #1055 (Pipboyguy)
df7fcf2 Typing (Pipboyguy)
743cc05 Typing (Pipboyguy)
76a6bcb Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
5d327f3 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
3f20381 Initial draft doc (Pipboyguy)
463ca1d auto compression for parquet, detects compression of local files (rudolfix)
e7e5925 fixes has_dataset, recognizes more exceptions (rudolfix)
871aa4a fixes some tests (rudolfix)
8ed4919 aligns clickhouse config with dataclasses (rudolfix)
b13942b Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin… (Pipboyguy)
dce1967 Remove empty dataset default #1055 (Pipboyguy)
68ac747 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
d1c7cde Update clickhouse configuration and docs sidebar (Pipboyguy)
66fbc0e Clickhouse docs #1055 (Pipboyguy)
1ab55a1 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
5ff769c Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
077455d Don't use Jinja #1055 (Pipboyguy)
8d5bb3e Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
767def4 Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
65bf250 udpate clickhouse workflow file (sh-rp)
e61f681 add missing secrets to clickhouse workflow (sh-rp)
974d150 Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin… (Pipboyguy)
027d52a Add test for clickhouse config settings #1055 (Pipboyguy)
ecfe173 Set experimental session in DSN #1055 (Pipboyguy)
9cd4f7f Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
6ca11c8 Update data mapping and test for ClickHouse (Pipboyguy)
0cf5f63 Revert previous (Pipboyguy)
e5500ce Fix table aliasing issue (Pipboyguy)
4edeec8 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
8830f3e remove additional clickhouse destinations from test setup (sh-rp)
91c257d fix lockfile (sh-rp)
9bda629 fix merging (sh-rp)
f9e2920 slightly clean up clickhouse load job (sh-rp)
23800b8 Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin… (Pipboyguy)
ebe0289 fix merge job a bit more (sh-rp)
6fd243e Refactor key table clause generation in ClickHouse (Pipboyguy)
dcb6655 Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin… (Pipboyguy)
766ecd2 fixes a bunch of tests (sh-rp)
dc3cb76 simplify clickhouse load job a bit (sh-rp)
9ae6979 add sentinel table for dataset existence check (sh-rp)
0a26798 Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
56c2b9f post merge lockfile update (sh-rp)
1525e73 add support for scd2 (sh-rp)
15855d7 add correct high_ts for clickhouse (sh-rp)
544a0a4 remove corelated query from scd2 implementation (sh-rp)
cb9f35b fix merge sql for clickhouse (sh-rp)
4929bd1 fix merge tests (sh-rp)
6badbcc some further fixes (sh-rp)
7950642 fix athena tests (sh-rp)
c98b8b2 disable dbt for now (sh-rp)
89f30d0 smaller changes (sh-rp)
02dfb08 add clickhouse adapter tests, update small part of docs and correct i… (sh-rp)
b06bd9b update scd2 sql based on jorrits suggestions (sh-rp)
1208ccd Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
0ca6c36 change merge change test to make it pass (sh-rp)
290faa1 use text for json in clickhouse (sh-rp)
9c8a8b2 remove some unrelated unneeded stuff (sh-rp)
65c9cec update docs a bit (sh-rp)
04f357a fix json to string tests and implementation (sh-rp)
d2231ef move gcp access credentials into proper config (sh-rp)
989d93c Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
78e4c56 Add GCS Clickhouse staging docs #1055 (Pipboyguy)
e77a160 Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin… (Pipboyguy)
1c8e4ef Add `http_port` to docs #1055 (Pipboyguy)
8e04b81 Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
a8887c4 fix import after merge (sh-rp)
2c21e19 small changes to the docs (sh-rp)
7627945 tolerate rounding errors when loading from jsonl (sh-rp)
4d5db26 Merge branch 'devel' into 1055-implement-clickhouse-destination (sh-rp)
13f4b1c post devel merge fix (sh-rp)
244ed77 remove unneeded stuff from scd2 merge (sh-rp)
@@ -0,0 +1,78 @@

name: test | clickhouse

on:
  pull_request:
    branches:
      - master
      - devel
  workflow_dispatch:
  schedule:
    - cron: '0 2 * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  RUNTIME__SENTRY_DSN: https://[email protected]/4504819859914752
  RUNTIME__LOG_LEVEL: ERROR
  DLT_SECRETS_TOML: ${{ secrets.DLT_SECRETS_TOML }}

  ACTIVE_DESTINATIONS: "[\"clickhouse\"]"
  ALL_FILESYSTEM_DRIVERS: "[\"memory\"]"

jobs:
  get_docs_changes:
    name: docs changes
    uses: ./.github/workflows/get_docs_changes.yml
    if: ${{ !github.event.pull_request.head.repo.fork || contains(github.event.pull_request.labels.*.name, 'ci from fork')}}

  run_loader:
    name: test | clickhouse tests
    needs: get_docs_changes
    if: needs.get_docs_changes.outputs.changes_outside_docs == 'true'
    defaults:
      run:
        shell: bash
    runs-on: "ubuntu-latest"

    steps:

      - name: Check out
        uses: actions/checkout@master

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"

      - name: Install Poetry
        uses: snok/[email protected]
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-gcp

      - name: Install dependencies
        run: poetry install --no-interaction -E clickhouse --with providers -E parquet --with sentry-sdk --with pipeline

      - name: create secrets.toml
        run: pwd && echo "$DLT_SECRETS_TOML" > tests/.dlt/secrets.toml

      - run: |
          poetry run pytest tests/load -m "essential"
        name: Run essential tests Linux
        if: ${{ ! (contains(github.event.pull_request.labels.*.name, 'ci full') || github.event_name == 'schedule')}}

      - run: |
          poetry run pytest tests/load
        name: Run all tests Linux
        if: ${{ contains(github.event.pull_request.labels.*.name, 'ci full') || github.event_name == 'schedule'}}
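The last two steps of the workflow carry complementary `if:` conditions, so exactly one of them runs: the full `tests/load` suite on scheduled runs or when the pull request has the `ci full` label, and only the tests marked `essential` otherwise. A minimal Python sketch of that selection logic follows. The function name `select_pytest_command` is hypothetical and exists only for illustration, and the exact string match on the label is a simplification of GitHub's `contains()`.

```python
from typing import Iterable


def select_pytest_command(labels: Iterable[str], event_name: str) -> str:
    """Mirror the `if:` expressions of the two final workflow steps.

    `labels` stands in for github.event.pull_request.labels.*.name and
    `event_name` for github.event_name (illustrative parameters only).
    """
    run_full = "ci full" in set(labels) or event_name == "schedule"
    if run_full:
        # "Run all tests Linux"
        return "poetry run pytest tests/load"
    # "Run essential tests Linux"
    return 'poetry run pytest tests/load -m "essential"'


if __name__ == "__main__":
    print(select_pytest_command([], "pull_request"))
    print(select_pytest_command(["ci full"], "pull_request"))
    print(select_pytest_command([], "schedule"))
```

Because the second condition is the logical negation of the first, no event can trigger both steps or neither.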
@@ -0,0 +1,53 @@
import sys

from dlt.common.pendulum import pendulum
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE
from dlt.common.data_writers.escape import (
    escape_clickhouse_identifier,
    escape_clickhouse_literal,
    format_clickhouse_datetime_literal,
)
from dlt.common.destination import DestinationCapabilitiesContext


def capabilities() -> DestinationCapabilitiesContext:
    caps = DestinationCapabilitiesContext()
    caps.preferred_loader_file_format = "jsonl"
    caps.supported_loader_file_formats = ["parquet", "jsonl"]
    caps.preferred_staging_file_format = "jsonl"
    caps.supported_staging_file_formats = ["parquet", "jsonl"]

    caps.format_datetime_literal = format_clickhouse_datetime_literal
    caps.escape_identifier = escape_clickhouse_identifier
    caps.escape_literal = escape_clickhouse_literal

    # https://stackoverflow.com/questions/68358686/what-is-the-maximum-length-of-a-column-in-clickhouse-can-it-be-modified
    caps.max_identifier_length = 255
    caps.max_column_identifier_length = 255
    caps.scd2_high_timestamp = pendulum.datetime(2299, 12, 31)  # this is the max datetime...

    # ClickHouse has no max `String` type length.
    caps.max_text_data_type_length = sys.maxsize

    caps.schema_supports_numeric_precision = True
    # Use 'Decimal128' with these defaults.
    # https://clickhouse.com/docs/en/sql-reference/data-types/decimal
    caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
    # Use 'Decimal256' with these defaults.
    caps.wei_precision = (76, 0)
    caps.timestamp_precision = 6

    # https://clickhouse.com/docs/en/operations/settings/settings#max_query_size
    caps.is_max_query_length_in_bytes = True
    caps.max_query_length = 262144

    # ClickHouse has limited support for transactional semantics, especially for `ReplicatedMergeTree`,
    # the default ClickHouse Cloud engine. It does, however, provide atomicity for individual DDL operations like `ALTER TABLE`.
    # https://clickhouse-driver.readthedocs.io/en/latest/dbapi.html#clickhouse_driver.dbapi.connection.Connection.commit
    # https://clickhouse.com/docs/en/guides/developer/transactional#transactions-commit-and-rollback
    caps.supports_transactions = False
    caps.supports_ddl_transactions = False

    caps.supports_truncate_command = True

    return caps
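Two of the capabilities above are easier to read with a concrete check. The sketch below is not part of the PR; it assumes the precision ranges listed in the ClickHouse Decimal documentation (1-9 for `Decimal32`, 10-18 for `Decimal64`, 19-38 for `Decimal128`, 39-76 for `Decimal256`) and shows why a query limit counted in bytes differs from one counted in characters. The helper names `clickhouse_decimal_alias` and `fits_query_limit` are hypothetical.

```python
MAX_QUERY_LENGTH_BYTES = 262144  # value assigned to caps.max_query_length in the diff


def clickhouse_decimal_alias(precision: int) -> str:
    """Return the ClickHouse Decimal alias wide enough for `precision` digits."""
    if not 1 <= precision <= 76:
        raise ValueError(f"precision must be between 1 and 76, got {precision}")
    if precision <= 9:
        return "Decimal32"
    if precision <= 18:
        return "Decimal64"
    if precision <= 38:
        return "Decimal128"
    return "Decimal256"


def fits_query_limit(sql: str, limit: int = MAX_QUERY_LENGTH_BYTES) -> bool:
    """Check the UTF-8 encoded size, since the limit is expressed in bytes."""
    return len(sql.encode("utf-8")) <= limit


if __name__ == "__main__":
    # wei_precision is (76, 0), which only the widest alias can hold.
    print(clickhouse_decimal_alias(76))
    # 200000 two-byte characters stay under the limit by character count,
    # but exceed it once encoded.
    print(fits_query_limit("é" * 200000))
```

A precision of 38 lands in the `Decimal128` range, which is consistent with the "Use 'Decimal128'" comment if `DEFAULT_NUMERIC_PRECISION` is 38 (an assumption, since the constant's value is not shown in this diff).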
ClickHouse's max timestamp is in 2299, so I needed to make this configurable per destination. I'm not convinced the capabilities are the right place for it; if you have a better idea, let me know.
We need the same timestamp in each destination. My question to @jorritsandbrink: why not allow nulls, so that the active record is marked as having no end date?
This is how I did it in my initial implementation; that would make it a bit easier.
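The two options discussed in this thread can be compared in a short sketch. It is not code from the PR: it uses naive standard-library datetimes rather than pendulum, assumes the upper bound of 2299-12-31 from the ClickHouse `DateTime64` documentation, and every name in it (`CLICKHOUSE_MAX_TS`, `fits_clickhouse`, `is_active_high_ts`, `is_active_nullable`) is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed upper bound of ClickHouse DateTime64 (second resolution for simplicity).
CLICKHOUSE_MAX_TS = datetime(2299, 12, 31, 23, 59, 59)
# Sentinel chosen in the diff for caps.scd2_high_timestamp.
HIGH_TS = datetime(2299, 12, 31)


def fits_clickhouse(ts: datetime) -> bool:
    """True when `ts` does not exceed the assumed DateTime64 upper bound."""
    return ts <= CLICKHOUSE_MAX_TS


def is_active_high_ts(valid_to: datetime, high_ts: datetime = HIGH_TS) -> bool:
    """Option 1: the active record carries a far-future sentinel end date."""
    return valid_to == high_ts


def is_active_nullable(valid_to: Optional[datetime]) -> bool:
    """Option 2: the active record simply has no end date."""
    return valid_to is None


if __name__ == "__main__":
    # A sentinel such as 9999-12-31 (purely illustrative) would not fit.
    print(fits_clickhouse(datetime(9999, 12, 31)))
    print(fits_clickhouse(HIGH_TS))
    print(is_active_high_ts(HIGH_TS), is_active_nullable(None))
```

With the sentinel approach the value has to fit the narrowest destination, which is what forces a per-destination setting; with a nullable end date the question of a common maximum does not arise.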