
Commit

update docs a bit
sh-rp committed Apr 23, 2024
1 parent 9c8a8b2 commit 65c9cec
Showing 1 changed file with 17 additions and 6 deletions.
23 changes: 17 additions & 6 deletions docs/website/docs/dlt-ecosystem/destinations/clickhouse.md
@@ -14,11 +14,6 @@ keywords: [ clickhouse, destination, data warehouse ]
```sh
pip install dlt[clickhouse]
```

## Dev Todos for docs
* Clickhouse uses string for time
* bytes are converted to base64 strings when using jsonl and regular strings when using parquet
* JSON / complex fields are currently experimental; they are not supported when loading from parquet, and nested structures will be changed when loading from jsonl

## Setup Guide

### 1. Initialize the dlt project
@@ -93,11 +88,27 @@ Data is loaded into ClickHouse using the most efficient method depending on the
- For files in remote storage like S3, Google Cloud Storage, or Azure Blob Storage, ClickHouse table functions like `s3`, `gcs` and `azureBlobStorage` are used to read the files and insert the data
into tables.
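
As a quick illustration of the staged variant, here is a minimal sketch of a pipeline that writes files to remote storage before ClickHouse reads them with its table functions; the pipeline name, dataset name, and the choice of the `filesystem` staging destination are illustrative assumptions, not values from this page:

```py
import dlt

# A minimal sketch, assuming the generic dlt staging setup: files are first
# written to a filesystem/object-store staging destination, and ClickHouse
# then reads them with table functions such as s3, gcs, or azureBlobStorage.
pipeline = dlt.pipeline(
    pipeline_name="example_pipeline",  # illustrative name
    destination="clickhouse",
    staging="filesystem",              # staging bucket is configured separately
    dataset_name="example_dataset",    # illustrative name
)
```

The bucket URL and credentials would live in `secrets.toml` or environment variables, as with any dlt filesystem staging setup.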

## Datasets

ClickHouse does not support multiple datasets in one database, while `dlt` relies on datasets for several reasons. To make ClickHouse work with `dlt`, tables generated by `dlt` in your ClickHouse database are prefixed with the dataset name, separated by the configurable `dataset_table_separator`. Additionally, a special sentinel table that does not contain any data is created, so `dlt` knows which virtual datasets already exist in a ClickHouse destination.
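
As an illustration of this naming scheme, here is a sketch; it assumes the default separator is `___`, which can be changed via `dataset_table_separator`:

```py
# A sketch of how dlt composes physical table names in ClickHouse; the
# default separator is assumed to be "___" and is configurable.
dataset_name = "my_dataset"
dataset_table_separator = "___"

def physical_table_name(table_name: str) -> str:
    # Virtual datasets are emulated by prefixing each table with the
    # dataset name plus the separator.
    return f"{dataset_name}{dataset_table_separator}{table_name}"

print(physical_table_name("my_table"))  # -> my_dataset___my_table
```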

## Supported file formats

- [jsonl](../file-formats/jsonl.md) is the preferred format for both direct loading and staging.
- [parquet](../file-formats/parquet.md) is supported for both direct loading and staging.

The `clickhouse` destination has a few specific deviations from the default SQL destinations:

1. ClickHouse has an experimental `object` datatype, but we have found it to be a bit unpredictable, so the dlt ClickHouse destination loads the complex datatype to a `text` column. If you need this feature, please get in touch in our Slack community and we will consider adding it.
2. ClickHouse does not support the `time` datatype. Time will be loaded to a `text` column.
3. ClickHouse does not support the `binary` datatype. Binary will be loaded to a `text` column: when loading from `jsonl`, this will be a base64 string; when loading from `parquet`, this will be the `binary` object converted to `text` (see the sketch after this list).
4. ClickHouse accepts adding non-nullable columns to a populated table.
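
As an illustration of the `jsonl` behavior in point 3, a minimal sketch (not the destination's actual write path) of how raw bytes end up as a base64 string, assuming standard base64 encoding:

```py
import base64
import json

# A sketch of deviation 3: raw bytes cannot be represented in JSON, so a
# jsonl load stores a base64-encoded string in the text column.
raw = b"\x00\x01binary payload"
row = {"payload": base64.b64encode(raw).decode("ascii")}
print(json.dumps(row))  # {"payload": "AAFiaW5hcnkgcGF5bG9hZA=="}
```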

## Supported column hints

ClickHouse supports the following [column hints](https://dlthub.com/docs/general-usage/schema#tables-and-columns):
@@ -149,7 +160,7 @@ pipeline = dlt.pipeline(

### dbt support

Integration with [dbt](../transformations/dbt/dbt.md) is supported.
Integration with [dbt](../transformations/dbt/dbt.md) is generally supported via `dbt-clickhouse`, but it is not tested by us at this time.

### Syncing of `dlt` state

