Commit

updates weaviate doc
rudolfix committed Sep 12, 2023
1 parent 851a6e5 commit d2d7927
Showing 5 changed files with 20 additions and 43 deletions.
21 changes: 16 additions & 5 deletions docs/website/docs/dlt-ecosystem/destinations/weaviate.md
Original file line number Diff line number Diff line change
Expand Up @@ -230,12 +230,23 @@ Here's a summary of the naming normalization approach:
Reserved property names like `id` or `additional` are prefixed with underscores for differentiation. Therefore, `id` becomes `__id` and `_id` is rendered as `___id`.
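A minimal sketch of that prefixing rule (a hypothetical helper for illustration, not dlt's actual normalizer code):

```python
RESERVED = {"id", "additional"}

def prefix_reserved(name: str) -> str:
    # Compare against Weaviate's reserved property names, ignoring
    # any leading underscores the input already carries.
    if name.lstrip("_") in RESERVED:
        # Two extra underscores differentiate the property:
        # "id" -> "__id", "_id" -> "___id"
        return "__" + name
    return name

print(prefix_reserved("id"))    # __id
print(prefix_reserved("_id"))   # ___id
print(prefix_reserved("name"))  # name
```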

### Case insensitive naming convention
The default naming convention described above preserves the casing of the properties (except for the first letter, which is lowercased). This generates nice classes in Weaviate, but it also requires that your input data has no property names that clash when compared case insensitively (i.e. `caseName` == `casename`). In such a case, the Weaviate destination will fail to create classes and report a conflict.

You can configure an alternative naming convention that lowercases all properties. Clashing properties will then be merged and the classes created. Still, if you have a document with clashing properties like:
```json
{"camelCase": 1, "CamelCase": 2}
```
it will be normalized to:
```json
{"camelcase": 2}
```
so your best course of action is to clean up the data before loading and keep the default naming convention. Nevertheless, you can configure the alternative convention in `config.toml`:
```toml
[schema]
naming="dlt.destinations.weaviate.naming"
```
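To see why cleanup is preferable, here is a sketch (an illustrative snippet, not dlt's implementation) of how lowercasing merges clashing keys, with the last occurrence silently winning:

```python
def lowercase_keys(doc: dict) -> dict:
    # Lowercasing every key merges properties that differ only by case;
    # the last value seen overwrites the earlier ones without warning.
    out = {}
    for key, value in doc.items():
        out[key.lower()] = value
    return out

print(lowercase_keys({"camelCase": 1, "CamelCase": 2}))  # {'camelcase': 2}
```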

## Additional destination options

Expand Down Expand Up @@ -282,4 +293,4 @@ Currently Weaviate destination does not support dbt.

### Syncing of `dlt` state

The Weaviate destination supports syncing of the `dlt` state.
4 changes: 4 additions & 0 deletions tests/conftest.py
Expand Up @@ -68,3 +68,7 @@ def _create_pipeline_instance_id(self) -> str:
# disable snowflake logging
for log in ["snowflake.connector.cursor", "snowflake.connector.connection"]:
logging.getLogger(log).setLevel("ERROR")

# disable azure logging
for log in ["azure.core.pipeline.policies.http_logging_policy"]:
logging.getLogger(log).setLevel("ERROR")
19 changes: 0 additions & 19 deletions tests/load/bigquery/test_bigquery_table_builder.py
Expand Up @@ -106,22 +106,3 @@ def test_double_partition_exception(gcp_client: BigQueryClient) -> None:
gcp_client._get_table_update_sql("event_test_table", mod_update, False)
assert excc.value.columns == ["`col4`", "`col5`"]


def test_partition_alter_table_exception(gcp_client: BigQueryClient) -> None:
mod_update = deepcopy(TABLE_UPDATE)
# timestamp
mod_update[3]["partition"] = True
# double partition
with pytest.raises(DestinationSchemaWillNotUpdate) as excc:
gcp_client._get_table_update_sql("event_test_table", mod_update, True)
assert excc.value.columns == ["`col4`"]


def test_cluster_alter_table_exception(gcp_client: BigQueryClient) -> None:
mod_update = deepcopy(TABLE_UPDATE)
# timestamp
mod_update[3]["cluster"] = True
# double cluster
with pytest.raises(DestinationSchemaWillNotUpdate) as excc:
gcp_client._get_table_update_sql("event_test_table", mod_update, True)
assert excc.value.columns == ["`col4`"]
9 changes: 0 additions & 9 deletions tests/load/redshift/test_redshift_table_builder.py
Expand Up @@ -90,12 +90,3 @@ def test_create_table_with_hints(client: RedshiftClient) -> None:
# no hints
assert '"col3" boolean NOT NULL' in sql
assert '"col4" timestamp with time zone NOT NULL' in sql


def test_hint_alter_table_exception(client: RedshiftClient) -> None:
mod_update = deepcopy(TABLE_UPDATE)
# timestamp
mod_update[3]["sort"] = True
with pytest.raises(DestinationSchemaWillNotUpdate) as excc:
client._get_table_update_sql("event_test_table", mod_update, True)
assert excc.value.columns == ['"col4"']
10 changes: 0 additions & 10 deletions tests/load/snowflake/test_snowflake_table_builder.py
Expand Up @@ -90,13 +90,3 @@ def test_create_table_with_partition_and_cluster(snowflake_client: SnowflakeClie

# clustering must be the last
assert sql.endswith('CLUSTER BY ("COL2","COL5")')


def test_cluster_alter_table_exception(snowflake_client: SnowflakeClient) -> None:
mod_update = deepcopy(TABLE_UPDATE)
# timestamp
mod_update[3]["cluster"] = True
# double cluster
with pytest.raises(DestinationSchemaWillNotUpdate) as excc:
snowflake_client._get_table_update_sql("event_test_table", mod_update, True)
assert excc.value.columns == ['"COL4"']
