Clickhouse #1097

Merged Apr 26, 2024 (165 commits)
Changes from 145 commits

Commits
2d72134
Merge branch 'devel' of github.com:dlt-hub/dlt into 1055-implement-cl…
Pipboyguy Mar 6, 2024
1f28952
Add clickhouse driver dependency #1055
Pipboyguy Mar 8, 2024
dc5d4b0
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 9, 2024
d736dee
Preliminary wireframe #1055
Pipboyguy Mar 9, 2024
0407ab8
Update preliminary Clickhouse configurations #1055
Pipboyguy Mar 11, 2024
d503c14
Format #1055
Pipboyguy Mar 11, 2024
9b3ef87
Merge
Pipboyguy Mar 12, 2024
ed218e5
Finalize wireframing #1055
Pipboyguy Mar 12, 2024
81091c7
Wireframe ClickhouseSqlClient #1055
Pipboyguy Mar 12, 2024
7b7edff
Refactor Clickhouse SqlClient wireframing and update capabilities #1055
Pipboyguy Mar 12, 2024
94ca2a9
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 13, 2024
5f55bab
Update error messages and transaction capability in Clickhouse
Pipboyguy Mar 13, 2024
c9fb02e
Update Clickhouse configuration and factory
Pipboyguy Mar 13, 2024
f53ca52
Update
Pipboyguy Mar 14, 2024
0df32f5
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 14, 2024
c1f106b
Update identifier escaping logic #1055
Pipboyguy Mar 14, 2024
29ed8f6
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 14, 2024
6fc750f
Refactor ClickhouseSqlClient for better error handling #1055
Pipboyguy Mar 15, 2024
14d39e4
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 16, 2024
dbad9b1
Refine clickhouse destination and basic adapter #1055
Pipboyguy Mar 16, 2024
16df851
Finish ClickhouseClient
Pipboyguy Mar 16, 2024
c134cad
Add escape_clickhouse_literal function for Clickhouse
Pipboyguy Mar 14, 2024
2fa644c
Add "insert_values" to supported loader file formats
Pipboyguy Mar 17, 2024
0af45b9
Add `wei` to "from_db_type"
Pipboyguy Mar 17, 2024
de66efa
Improve Clickhouse loader code and update comments #1055
Pipboyguy Mar 17, 2024
7e0d508
Preliminary CH tests and utility module
Pipboyguy Mar 18, 2024
57ceeee
Refactor Clickhouse utilities and update tests
Pipboyguy Mar 18, 2024
6cdf086
Refactor URL conversion and staging for Clickhouse
Pipboyguy Mar 18, 2024
6606f5e
Tests don't pass, but they are there #1055
Pipboyguy Mar 18, 2024
383290d
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 19, 2024
45bafa7
Fix poetry collision #1055
Pipboyguy Mar 19, 2024
29b5a07
Swap engine/primary key clause ordering #1055
Pipboyguy Mar 19, 2024
c9b1ae4
Pass basic tests #1055
Pipboyguy Mar 19, 2024
1e96a78
Add Clickhouse tests and improve its configuration handling
Pipboyguy Mar 19, 2024
2087b37
Resove merge conflict
Pipboyguy Mar 20, 2024
ab50287
Format and Lint
Pipboyguy Mar 20, 2024
5d768fa
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 22, 2024
9d905ec
Merge branch 'devel' into 1055-implement-clickhouse-destination
Pipboyguy Mar 23, 2024
86685fd
Improve DRYness #1055
Pipboyguy Mar 23, 2024
e702d8a
Remove old comment #1055
Pipboyguy Mar 23, 2024
83b3812
Update pyproject.toml #1055
Pipboyguy Mar 23, 2024
d26183b
Fix secure connection settings #1055
Pipboyguy Mar 25, 2024
e596ae3
Minor config parsing amendments #1055
Pipboyguy Mar 25, 2024
e970a31
Fix ssl connection (correct port) #1055
Pipboyguy Mar 25, 2024
69a672c
Reuse Athena destination pyformat converter #1055
Pipboyguy Mar 26, 2024
0cbb657
Filesystem Auth issues #1055
Pipboyguy Mar 26, 2024
bccab86
Fix incorrect arguments in render_object_storage_table_function #1055
Pipboyguy Mar 27, 2024
c56f1a4
Pass all providers append tests #1055
Pipboyguy Mar 30, 2024
74e5a6d
Add merge test #1055
Pipboyguy Mar 30, 2024
ff4e214
Resolved driver parameter substitution issues #1055
Pipboyguy Apr 1, 2024
5a31ca4
Merge Disposition #1055
Pipboyguy Apr 2, 2024
75a61eb
Fall back to append disposition for merge #1055
Pipboyguy Apr 2, 2024
a4ccd2b
Clickhouse CI #1055
Pipboyguy Apr 2, 2024
200dce5
CI and merge #1055
Pipboyguy Apr 2, 2024
7f730c1
Update active destinations to ClickHouse
Pipboyguy Apr 2, 2024
dc47c12
Expand Clickhouse dependencies in pyproject.toml
Pipboyguy Apr 2, 2024
34646ad
Update lock file #1055
Pipboyguy Apr 2, 2024
8bc3c21
Revert back to merge implementation #1055
Pipboyguy Apr 3, 2024
99e82ff
Add default sql test #1055
Pipboyguy Apr 3, 2024
6bded9a
Support jsonlines
Pipboyguy Apr 3, 2024
eca4d2d
Revert non-applicable changes #1055
Pipboyguy Apr 3, 2024
092a524
Fix 'from_db_type' #1055
Pipboyguy Apr 4, 2024
79d9b80
Remove unused tests #1055
Pipboyguy Apr 4, 2024
c5d8709
No staging test case #1055
Pipboyguy Apr 4, 2024
594accc
Minor changes
Pipboyguy Apr 4, 2024
5d8a228
Refactor Clickhouse loader
Pipboyguy Apr 4, 2024
d454d8f
WIP
Pipboyguy Apr 4, 2024
fdd052a
Remove from standard sql tests
Pipboyguy Apr 5, 2024
939bd35
Remove unnecessary compression #1055
Pipboyguy Apr 5, 2024
b554693
Dataset prefix and dataset-table seperator #1055
Pipboyguy Apr 5, 2024
8215f44
Remove DATASET_PREFIX from sql_client.py
Pipboyguy Apr 5, 2024
881a0b9
Add clickhouse connect as local fallback #1055
Pipboyguy Apr 5, 2024
a4904a8
Set settings on local #1055
Pipboyguy Apr 5, 2024
89bab97
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 5, 2024
b44bea3
Update lock
Pipboyguy Apr 5, 2024
d6309b3
Revert some files back to devel
Pipboyguy Apr 6, 2024
dfc274d
Remove redundant merge logic #1055
Pipboyguy Apr 6, 2024
9d0e1ba
Spelling fix
Pipboyguy Apr 6, 2024
9f41edd
Don't synthesise CH credentials __init__. #1055
Pipboyguy Apr 6, 2024
584ebe7
Optimize import and type hinting in Clickhouse factory #1055
Pipboyguy Apr 6, 2024
853c353
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 8, 2024
1c69a26
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 8, 2024
32d3f62
Revert back to temp table
Pipboyguy Apr 8, 2024
6cb4ee8
Refactor Clickhouse to ClickHouse for consistency
Pipboyguy Apr 8, 2024
5004b5a
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 8, 2024
a646118
Support compression codec for azure and local #1055
Pipboyguy Apr 8, 2024
350cae3
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 8, 2024
6fee65e
Merge
Pipboyguy Apr 8, 2024
95caba7
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 8, 2024
7c0ac80
Fix table-name separator config resolution #1055
Pipboyguy Apr 9, 2024
411ff77
Format
Pipboyguy Apr 9, 2024
7fe0a91
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 9, 2024
e30ce6d
Set compression parameter for local #1055
Pipboyguy Apr 9, 2024
3c89b3b
Set compression method to 'auto' for s3 table function #1055
Pipboyguy Apr 9, 2024
df7fcf2
Typing
Pipboyguy Apr 9, 2024
743cc05
Typing
Pipboyguy Apr 9, 2024
76a6bcb
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 10, 2024
5d327f3
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 10, 2024
3f20381
Initial draft doc
Pipboyguy Apr 10, 2024
463ca1d
auto compression for parquet, detects compression of local files
rudolfix Apr 10, 2024
e7e5925
fixes has_dataset, recognizes more exceptions
rudolfix Apr 10, 2024
871aa4a
fixes some tests
rudolfix Apr 10, 2024
8ed4919
aligns clickhouse config with dataclasses
rudolfix Apr 10, 2024
b13942b
Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin…
Pipboyguy Apr 11, 2024
dce1967
Remove empty dataset default #1055
Pipboyguy Apr 11, 2024
68ac747
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 12, 2024
d1c7cde
Update clickhouse configuration and docs sidebar
Pipboyguy Apr 12, 2024
66fbc0e
Clickhouse docs #1055
Pipboyguy Apr 13, 2024
1ab55a1
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 14, 2024
5ff769c
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 15, 2024
077455d
Don't use Jinja #1055
Pipboyguy Apr 15, 2024
8d5bb3e
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 16, 2024
767def4
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 16, 2024
65bf250
udpate clickhouse workflow file
sh-rp Apr 16, 2024
e61f681
add missing secrets to clickhouse workflow
sh-rp Apr 16, 2024
974d150
Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin…
Pipboyguy Apr 16, 2024
027d52a
Add test for clickhouse config settings #1055
Pipboyguy Apr 16, 2024
ecfe173
Set experimental session in DSN #1055
Pipboyguy Apr 16, 2024
9cd4f7f
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 17, 2024
6ca11c8
Update data mapping and test for ClickHouse
Pipboyguy Apr 17, 2024
0cf5f63
Revert previous
Pipboyguy Apr 17, 2024
e5500ce
Fix table aliasing issue
Pipboyguy Apr 17, 2024
4edeec8
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 17, 2024
8830f3e
remove additional clickhouse destinations from test setup
sh-rp Apr 18, 2024
91c257d
fix lockfile
sh-rp Apr 18, 2024
9bda629
fix merging
sh-rp Apr 18, 2024
f9e2920
slightly clean up clickhouse load job
sh-rp Apr 18, 2024
23800b8
Merge branch 'refs/heads/devel' into 1055-implement-clickhouse-destin…
Pipboyguy Apr 18, 2024
ebe0289
fix merge job a bit more
sh-rp Apr 18, 2024
6fd243e
Refactor key table clause generation in ClickHouse
Pipboyguy Apr 18, 2024
dcb6655
Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin…
Pipboyguy Apr 18, 2024
766ecd2
fixes a bunch of tests
sh-rp Apr 19, 2024
dc3cb76
simplify clickhouse load job a bit
sh-rp Apr 19, 2024
9ae6979
add sentinel table for dataset existence check
sh-rp Apr 19, 2024
0a26798
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 19, 2024
56c2b9f
post merge lockfile update
sh-rp Apr 19, 2024
1525e73
add support for scd2
sh-rp Apr 19, 2024
15855d7
add correct high_ts for clickhouse
sh-rp Apr 19, 2024
544a0a4
remove corelated query from scd2 implementation
sh-rp Apr 22, 2024
cb9f35b
fix merge sql for clickhouse
sh-rp Apr 22, 2024
4929bd1
fix merge tests
sh-rp Apr 22, 2024
6badbcc
some further fixes
sh-rp Apr 22, 2024
7950642
fix athena tests
sh-rp Apr 22, 2024
c98b8b2
disable dbt for now
sh-rp Apr 22, 2024
89f30d0
smaller changes
sh-rp Apr 23, 2024
02dfb08
add clickhouse adapter tests, update small part of docs and correct i…
sh-rp Apr 23, 2024
b06bd9b
update scd2 sql based on jorrits suggestions
sh-rp Apr 23, 2024
1208ccd
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 23, 2024
0ca6c36
change merge change test to make it pass
sh-rp Apr 23, 2024
290faa1
use text for json in clickhouse
sh-rp Apr 23, 2024
9c8a8b2
remove some unrelated unneeded stuff
sh-rp Apr 23, 2024
65c9cec
update docs a bit
sh-rp Apr 23, 2024
04f357a
fix json to string tests and implementation
sh-rp Apr 23, 2024
d2231ef
move gcp access credentials into proper config
sh-rp Apr 24, 2024
989d93c
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 24, 2024
78e4c56
Add GCS Clickhouse staging docs #1055
Pipboyguy Apr 24, 2024
e77a160
Merge remote-tracking branch 'origin/1055-implement-clickhouse-destin…
Pipboyguy Apr 24, 2024
1c8e4ef
Add `http_port` to docs #1055
Pipboyguy Apr 24, 2024
8e04b81
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 25, 2024
a8887c4
fix import after merge
sh-rp Apr 25, 2024
2c21e19
small changes to the docs
sh-rp Apr 25, 2024
7627945
tolerate rounding errors when loading from jsonl
sh-rp Apr 25, 2024
4d5db26
Merge branch 'devel' into 1055-implement-clickhouse-destination
sh-rp Apr 25, 2024
13f4b1c
post devel merge fix
sh-rp Apr 25, 2024
244ed77
remove unneeded stuff from scd2 merge
sh-rp Apr 26, 2024
78 changes: 78 additions & 0 deletions .github/workflows/test_destination_clickhouse.yml
@@ -0,0 +1,78 @@

name: test | clickhouse

on:
  pull_request:
    branches:
      - master
      - devel
  workflow_dispatch:
  schedule:
    - cron: '0 2 * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  RUNTIME__SENTRY_DSN: https://[email protected]/4504819859914752
  RUNTIME__LOG_LEVEL: ERROR
  DLT_SECRETS_TOML: ${{ secrets.DLT_SECRETS_TOML }}

  ACTIVE_DESTINATIONS: "[\"clickhouse\"]"
  ALL_FILESYSTEM_DRIVERS: "[\"memory\"]"

jobs:
  get_docs_changes:
    name: docs changes
    uses: ./.github/workflows/get_docs_changes.yml
    if: ${{ !github.event.pull_request.head.repo.fork || contains(github.event.pull_request.labels.*.name, 'ci from fork')}}

  run_loader:
    name: test | clickhouse tests
    needs: get_docs_changes
    if: needs.get_docs_changes.outputs.changes_outside_docs == 'true'
    defaults:
      run:
        shell: bash
    runs-on: "ubuntu-latest"

    steps:

      - name: Check out
        uses: actions/checkout@master

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"

      - name: Install Poetry
        uses: snok/[email protected]
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-gcp

      - name: Install dependencies
        run: poetry install --no-interaction -E clickhouse --with providers -E parquet --with sentry-sdk --with pipeline

      - name: create secrets.toml
        run: pwd && echo "$DLT_SECRETS_TOML" > tests/.dlt/secrets.toml

      - run: |
          poetry run pytest tests/load -m "essential"
        name: Run essential tests Linux
        if: ${{ ! (contains(github.event.pull_request.labels.*.name, 'ci full') || github.event_name == 'schedule')}}

      - run: |
          poetry run pytest tests/load
        name: Run all tests Linux
        if: ${{ contains(github.event.pull_request.labels.*.name, 'ci full') || github.event_name == 'schedule'}}
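The two test steps in this workflow carry complementary `if:` conditions, so exactly one of them runs per trigger. A minimal Python sketch of that selection logic follows; `select_pytest_command` is a hypothetical helper written for illustration and is not part of the PR.

```python
from typing import List


def select_pytest_command(labels: List[str], event_name: str) -> str:
    """Mirror the two mutually exclusive `if:` conditions of the test steps.

    `labels` stands in for github.event.pull_request.labels.*.name and
    `event_name` for github.event_name (both names are illustrative).
    """
    # The full suite runs on the nightly schedule or when the PR is labelled 'ci full'.
    run_full = "ci full" in labels or event_name == "schedule"
    if run_full:
        return "poetry run pytest tests/load"
    # Otherwise only tests marked "essential" are selected.
    return 'poetry run pytest tests/load -m "essential"'


if __name__ == "__main__":
    print(select_pytest_command([], "pull_request"))
    print(select_pytest_command(["ci full"], "pull_request"))
    print(select_pytest_command([], "schedule"))
```

Because the second condition is the exact negation of the first, a pull request without the `ci full` label gets the shorter essential run, while the scheduled job always exercises the whole `tests/load` suite.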
2 changes: 1 addition & 1 deletion dlt/common/configuration/specs/base_configuration.py
@@ -197,7 +197,7 @@ def default_factory(att_value=att_value):  # type: ignore[no-untyped-def]
         synth_init = init and ((not base_params or base_params.init) and has_default_init)
         if synth_init != init and has_default_init:
             warnings.warn(
-                f"__init__ method will not be generated on {cls.__name__} because bas class didn't"
+                f"__init__ method will not be generated on {cls.__name__} because base class didn't"
                 " synthesize __init__. Please correct `init` flag in confispec decorator. You are"
                 " probably receiving incorrect __init__ signature for type checking"
             )
51 changes: 48 additions & 3 deletions dlt/common/data_writers/escape.py
@@ -150,10 +150,47 @@ def escape_databricks_literal(v: Any) -> Any:
         return _escape_extended(json.dumps(v), prefix="'", escape_dict=DATABRICKS_ESCAPE_DICT)
     if isinstance(v, bytes):
         return f"X'{v.hex()}'"
-    if v is None:
-        return "NULL"
+    return "NULL" if v is None else str(v)

-    return str(v)
+
+# https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/sql-reference/syntax.md#string
+CLICKHOUSE_ESCAPE_DICT = {
+    "'": "''",
+    "\\": "\\\\",
+    "\n": "\\n",
+    "\t": "\\t",
+    "\b": "\\b",
+    "\f": "\\f",
+    "\r": "\\r",
+    "\0": "\\0",
+    "\a": "\\a",
+    "\v": "\\v",
+}
+
+CLICKHOUSE_ESCAPE_RE = _make_sql_escape_re(CLICKHOUSE_ESCAPE_DICT)
+
+
+def escape_clickhouse_literal(v: Any) -> Any:
+    if isinstance(v, str):
+        return _escape_extended(
+            v, prefix="'", escape_dict=CLICKHOUSE_ESCAPE_DICT, escape_re=CLICKHOUSE_ESCAPE_RE
+        )
+    if isinstance(v, (datetime, date, time)):
+        return f"'{v.isoformat()}'"
+    if isinstance(v, (list, dict)):
+        return _escape_extended(
+            json.dumps(v),
+            prefix="'",
+            escape_dict=CLICKHOUSE_ESCAPE_DICT,
+            escape_re=CLICKHOUSE_ESCAPE_RE,
+        )
+    if isinstance(v, bytes):
+        return f"'{v.hex()}'"
+    return "NULL" if v is None else str(v)
+
+
+def escape_clickhouse_identifier(v: str) -> str:
+    return "`" + v.replace("`", "``").replace("\\", "\\\\") + "`"


 def format_datetime_literal(v: pendulum.DateTime, precision: int = 6, no_tz: bool = False) -> str:
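To make the effect of the escape table concrete, here is a small self-contained sketch. It does not call dlt's `_escape_extended` or `_make_sql_escape_re`; instead it applies the same character mapping one character at a time, which is an assumption about the intended result rather than a copy of the library's implementation. The `sketch_*` names are hypothetical.

```python
# Same mapping as CLICKHOUSE_ESCAPE_DICT in the hunk above.
ESCAPES = {
    "'": "''",
    "\\": "\\\\",
    "\n": "\\n",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
    "\r": "\\r",
    "\0": "\\0",
    "\a": "\\a",
    "\v": "\\v",
}


def sketch_escape_literal(value: str) -> str:
    """Hypothetical stand-in for the string branch of escape_clickhouse_literal."""
    # Replace each special character by its escape sequence, then wrap in single quotes.
    return "'" + "".join(ESCAPES.get(ch, ch) for ch in value) + "'"


def sketch_escape_identifier(name: str) -> str:
    """Same replacements as escape_clickhouse_identifier in the hunk above."""
    return "`" + name.replace("`", "``").replace("\\", "\\\\") + "`"


if __name__ == "__main__":
    print(sketch_escape_literal("it's"))
    print(sketch_escape_literal("line1\nline2"))
    print(sketch_escape_identifier("my`col"))
```

Single quotes are doubled rather than backslash-escaped, while control characters become backslash sequences, so the resulting literal stays on one line.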
@@ -176,3 +213,11 @@ def format_bigquery_datetime_literal(
     """Returns BigQuery-adjusted datetime literal by prefixing required `TIMESTAMP` indicator."""
     # https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
     return "TIMESTAMP " + format_datetime_literal(v, precision, no_tz)
+
+
+def format_clickhouse_datetime_literal(
+    v: pendulum.DateTime, precision: int = 6, no_tz: bool = False
+) -> str:
+    """Returns clickhouse compatibel function"""
+    datetime = format_datetime_literal(v, precision, True)
+    return f"toDateTime64({datetime}, {precision}, '{v.tzinfo}')"
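The shape of the generated expression can be illustrated with the standard library alone. The sketch below assumes that the inner literal is a quoted, timezone-naive UTC string (the f-string above adds no quotes of its own, so they presumably come from `format_datetime_literal`) and that the value is expressed in UTC. It uses `datetime` instead of `pendulum`, fixes the precision at 6, and `sketch_clickhouse_datetime_literal` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def sketch_clickhouse_datetime_literal(value: datetime) -> str:
    """Build a toDateTime64(...) expression for a timezone-aware datetime.

    Assumption: the text part is rendered in UTC without an offset, and the
    third argument names the timezone that text should be interpreted in.
    """
    # Normalize to UTC and drop the tzinfo so isoformat() emits no offset.
    utc_naive = value.astimezone(timezone.utc).replace(tzinfo=None)
    text = utc_naive.isoformat(sep=" ", timespec="microseconds")
    return f"toDateTime64('{text}', 6, 'UTC')"


if __name__ == "__main__":
    aware = datetime(2024, 4, 26, 14, 30, tzinfo=timezone(timedelta(hours=2)))
    print(sketch_clickhouse_datetime_literal(aware))
```

Wrapping the literal in a function call rather than emitting a bare string lets the precision and timezone travel with the value.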
4 changes: 4 additions & 0 deletions dlt/common/destination/capabilities.py
@@ -21,6 +21,7 @@
 # sql - any sql statement
 TLoaderFileFormat = Literal["jsonl", "typed-jsonl", "insert_values", "parquet", "csv"]
 ALL_SUPPORTED_FILE_FORMATS: Set[TLoaderFileFormat] = set(get_args(TLoaderFileFormat))
+HIGH_TS = pendulum.datetime(9999, 12, 31)


 @configspec
@@ -53,6 +54,9 @@ class DestinationCapabilitiesContext(ContainerInjectableContext):
     insert_values_writer_type: str = "default"
     supports_multiple_statements: bool = True
     supports_clone_table: bool = False
+    scd2_high_timestamp: pendulum.DateTime = HIGH_TS

sh-rp (Collaborator), Apr 23, 2024:

The ClickHouse max timestamp is 2299, so I needed to make this configurable for each destination. I'm not satisfied that the capabilities are the right place; if you have a better idea for it, let me know.

Collaborator:

We need the same timestamp in each destination. My question to @jorritsandbrink: why not allow nulls, so that the active record is marked as having no end date?

Collaborator:

This is how I did it in my initial implementation; that would make it a bit easier.

+    """High timestamp used to indicate active records in `scd2` merge strategy."""
+
     """Destination supports CREATE TABLE ... CLONE ... statements"""
     max_table_nesting: Optional[int] = None  # destination can overwrite max table nesting
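The two options discussed in the thread, a per-destination sentinel timestamp versus a null end date, can be compared in a short sketch. The row layout and the `valid_to` column name are hypothetical and chosen only for illustration; the two timestamps are the values that appear in this PR.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Values taken from the diff: the shared default and the ClickHouse override.
DEFAULT_HIGH_TS = datetime(9999, 12, 31, tzinfo=timezone.utc)
CLICKHOUSE_HIGH_TS = datetime(2299, 12, 31, tzinfo=timezone.utc)

Row = Dict[str, Any]


def active_rows(rows: List[Row], high_ts: Optional[datetime]) -> List[Row]:
    """Return the rows that are still active under an scd2-style layout.

    With a sentinel, a row is active when its end date equals `high_ts`.
    Passing high_ts=None models the alternative raised in the thread:
    active rows carry no end date at all.
    """
    if high_ts is None:
        return [row for row in rows if row["valid_to"] is None]
    return [row for row in rows if row["valid_to"] == high_ts]


if __name__ == "__main__":
    retired = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sentinel_rows = [
        {"id": 1, "valid_to": retired},
        {"id": 2, "valid_to": CLICKHOUSE_HIGH_TS},
    ]
    print(active_rows(sentinel_rows, CLICKHOUSE_HIGH_TS))
```

With the sentinel approach the comparison value differs per destination, which is why the PR exposes it as a capability; with nulls the same predicate would work everywhere.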

1 change: 0 additions & 1 deletion dlt/common/time.py
@@ -133,7 +133,6 @@ def ensure_pendulum_time(value: Union[str, datetime.time]) -> pendulum.Time:
     Returns:
         A pendulum.Time object
     """
-
     if isinstance(value, datetime.time):
         if isinstance(value, pendulum.Time):
             return value
2 changes: 2 additions & 0 deletions dlt/destinations/__init__.py
@@ -14,6 +14,7 @@
 from dlt.destinations.impl.synapse.factory import synapse
 from dlt.destinations.impl.databricks.factory import databricks
 from dlt.destinations.impl.dremio.factory import dremio
+from dlt.destinations.impl.clickhouse.factory import clickhouse


 __all__ = [
@@ -32,5 +33,6 @@
     "synapse",
     "databricks",
     "dremio",
+    "clickhouse",
     "destination",
 ]
53 changes: 53 additions & 0 deletions dlt/destinations/impl/clickhouse/__init__.py
@@ -0,0 +1,53 @@
import sys

from dlt.common.pendulum import pendulum
from dlt.common.arithmetics import DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE
from dlt.common.data_writers.escape import (
    escape_clickhouse_identifier,
    escape_clickhouse_literal,
    format_clickhouse_datetime_literal,
)
from dlt.common.destination import DestinationCapabilitiesContext


def capabilities() -> DestinationCapabilitiesContext:
    caps = DestinationCapabilitiesContext()
    caps.preferred_loader_file_format = "jsonl"
    caps.supported_loader_file_formats = ["parquet", "jsonl"]
    caps.preferred_staging_file_format = "jsonl"
    caps.supported_staging_file_formats = ["parquet", "jsonl"]

    caps.format_datetime_literal = format_clickhouse_datetime_literal
    caps.escape_identifier = escape_clickhouse_identifier
    caps.escape_literal = escape_clickhouse_literal

    # https://stackoverflow.com/questions/68358686/what-is-the-maximum-length-of-a-column-in-clickhouse-can-it-be-modified
    caps.max_identifier_length = 255
    caps.max_column_identifier_length = 255
    caps.scd2_high_timestamp = pendulum.datetime(2299, 12, 31)  # this is the max datetime...

    # ClickHouse has no max `String` type length.
    caps.max_text_data_type_length = sys.maxsize

    caps.schema_supports_numeric_precision = True
    # Use 'Decimal128' with these defaults.
    # https://clickhouse.com/docs/en/sql-reference/data-types/decimal
    caps.decimal_precision = (DEFAULT_NUMERIC_PRECISION, DEFAULT_NUMERIC_SCALE)
    # Use 'Decimal256' with these defaults.
    caps.wei_precision = (76, 0)
    caps.timestamp_precision = 6

    # https://clickhouse.com/docs/en/operations/settings/settings#max_query_size
    caps.is_max_query_length_in_bytes = True
    caps.max_query_length = 262144

    # ClickHouse has limited support for transactional semantics, especially for `ReplicatedMergeTree`,
    # the default ClickHouse Cloud engine. It does, however, provide atomicity for individual DDL operations like `ALTER TABLE`.
    # https://clickhouse-driver.readthedocs.io/en/latest/dbapi.html#clickhouse_driver.dbapi.connection.Connection.commit
    # https://clickhouse.com/docs/en/guides/developer/transactional#transactions-commit-and-rollback
    caps.supports_transactions = False
    caps.supports_ddl_transactions = False

    caps.supports_truncate_command = True

    return caps
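The capabilities above declare the query limit in bytes (`is_max_query_length_in_bytes = True`, `max_query_length = 262144`), which matters because a multi-byte UTF-8 character counts more than once. The sketch below shows one way a caller could respect such a limit; `fits_in_one_query` and `batch_statements` are hypothetical helpers and not how dlt itself splits queries.

```python
from typing import List

# Limit taken from caps.max_query_length in the diff above.
MAX_QUERY_BYTES = 262144


def fits_in_one_query(sql: str, limit: int = MAX_QUERY_BYTES) -> bool:
    """Compare the UTF-8 encoded size, not the character count, with the limit."""
    return len(sql.encode("utf-8")) <= limit


def batch_statements(statements: List[str], limit: int = MAX_QUERY_BYTES) -> List[List[str]]:
    """Greedily group statements so each batch stays within `limit` bytes.

    Each statement is charged one extra byte for a separator.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for statement in statements:
        size = len(statement.encode("utf-8")) + 1
        if size > limit:
            raise ValueError("a single statement exceeds the byte limit")
        # Start a new batch when adding this statement would overflow the current one.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(statement)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(fits_in_one_query("SELECT 1"))
    print(batch_statements(["aaaa", "bbbb", "cc"], limit=10))
```

Counting bytes instead of characters keeps the check conservative for text that contains non-ASCII data.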