Extend custom destination #1107

sh-rp · 2024-03-18T17:07:27Z

Description

This PR introduces two new settings for the custom destination decorator:

max_nesting_level to control the normalizer
skip_dlt_columns_and_tables to skip internal columns and tables

…tion

update readme

netlify · 2024-03-18T17:07:42Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`8bb0e30`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/65fa2338c2655a0008f9d633

sh-rp · 2024-03-18T17:09:23Z

dlt/pipeline/pipeline.py

@@ -456,7 +456,18 @@ def normalize(
            return None

        # make sure destination capabilities are available
-        self._get_destination_capabilities()
+        caps = self._get_destination_capabilities()


@rudolfix I need some guidance on where to inject or override the max_nesting_level coming from a destination. I realize this is very likely not the right place, but I am not sure where and how to do it. Should I get the capabilities context in the relationalnormalizer and not persist this setting to the schema at all, or what is the best way?

the only thing you need to do is to fix NormalizersConfiguration

    def on_resolved(self) -> None:
        # get naming from capabilities if not present
        if self.naming is None:
            if self.destination_capabilities:
                self.naming = self.destination_capabilities.naming_convention

detect the type of the json normalizer and apply the settings to it like the below. you can override existing settings. I think capabilities (if not None) should have precedence over the source settings.

what happens later:
when new schema is created this setting will be used
when schema is loaded - it will not but when we call update_normalize - it will. and we do that in normalizer (in schema.clone). so it should work!

since this is set on the nested json normalizer settings, I had to change a bit more, but not much. I hope it is ok to change the type from mapping to dict in there.
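A minimal sketch of the override discussed in this thread, using simplified stand-ins rather than the real dlt classes. `Capabilities`, `NormalizersConfig`, the `max_nesting_level` field on the capabilities object, and the `max_nesting` config key are hypothetical names chosen for illustration; only `on_resolved`, `naming`, `destination_capabilities`, `naming_convention` and the `json_normalizer["config"]` layout are taken from the discussion.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Capabilities:
    # Hypothetical stand-in for the destination capabilities context.
    naming_convention: Optional[str] = None
    max_nesting_level: Optional[int] = None


@dataclass
class NormalizersConfig:
    # Hypothetical, simplified stand-in for NormalizersConfiguration.
    naming: Optional[str] = None
    json_normalizer: Optional[Dict[str, Any]] = None
    destination_capabilities: Optional[Capabilities] = None

    def on_resolved(self) -> None:
        caps = self.destination_capabilities
        if caps is None:
            return
        # Naming is only taken from the capabilities when not set explicitly.
        if self.naming is None:
            self.naming = caps.naming_convention
        # Nesting: capabilities, when present, win over the source settings.
        if caps.max_nesting_level is not None:
            self.json_normalizer = self.json_normalizer or {}
            config = self.json_normalizer.setdefault("config", {})
            # The "max_nesting" key name is an assumption of this sketch.
            config["max_nesting"] = caps.max_nesting_level


# Source asked for 5 levels, the destination capabilities ask for 0.
cfg = NormalizersConfig(
    json_normalizer={"config": {"max_nesting": 5}},
    destination_capabilities=Capabilities("direct", 0),
)
cfg.on_resolved()
```

With this shape an explicit `naming` value from the source is kept, while the nesting level from the capabilities replaces whatever the source configured, which matches the precedence suggested above.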

rudolfix

this is good! see the trick to apply max nesting to schema

rudolfix · 2024-03-18T19:43:19Z

dlt/destinations/decorators.py

@@ -23,6 +23,8 @@ def destination(
    batch_size: int = 10,
    name: str = None,
    naming_convention: str = "direct",


good! please add this to our docs: the default settings are such that data arrives at the sink with identifiers unchanged, un-nested, and with dlt identifiers removed, which makes it a good fit for pushing data to queues and REST APIs
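A rough sketch of the queue use case mentioned here. The function has the shape of a custom destination callable (a batch of items plus the table schema), but the decorator is left out so the snippet runs without dlt installed. `push_to_queue`, `outbox` and the message layout are hypothetical, and the standard-library `queue.Queue` stands in for a real message queue client.

```python
import json
import queue
from typing import Any, Dict, List

# Stand-in for a real message queue client (an assumption of this sketch).
outbox: queue.Queue = queue.Queue()


def push_to_queue(items: List[Dict[str, Any]], table: Dict[str, Any]) -> None:
    """Forward one batch of rows to the queue, one JSON message per row."""
    for item in items:
        # Rows are forwarded exactly as they were received.
        outbox.put(json.dumps({"table": table["name"], "data": item}))


# Direct call with a hand-made batch, roughly how the loader would call it.
push_to_queue(
    [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}],
    {"name": "events"},
)
```

In a pipeline the same function would be wrapped with the custom destination decorator, with batch_size controlling how many items arrive per call.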

rudolfix · 2024-03-18T19:44:33Z

dlt/destinations/impl/destination/destination.py

@@ -27,6 +30,8 @@
    TDestinationCallable,
 )

+INTERNAL_MARKER = "_dlt"


you must use schema._dlt_tables_prefix (which may be normalized) to detect dlt identifiers. you may add such a method to schema (but calling a method will be slower)
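A small sketch of prefix-based filtering along these lines. The prefix is passed in as a parameter instead of being a module-level constant, on the assumption that real code would read it from the schema. `drop_internal_items` and the dictionary layout of `example_tables` are hypothetical and only illustrate the idea.

```python
from typing import Any, Dict


def drop_internal_items(
    tables: Dict[str, Dict[str, Any]], dlt_prefix: str
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `tables` without internal tables and columns."""
    cleaned: Dict[str, Dict[str, Any]] = {}
    for table_name, table in tables.items():
        if table_name.startswith(dlt_prefix):
            continue  # internal table, skip it entirely
        columns = {
            name: column
            for name, column in table.get("columns", {}).items()
            if not name.startswith(dlt_prefix)
        }
        cleaned[table_name] = {**table, "columns": columns}
    return cleaned


# In real code the prefix would be read from the schema, not hard-coded.
example_tables = {
    "_dlt_loads": {"columns": {"load_id": {}}},
    "items": {"columns": {"id": {}, "_dlt_id": {}, "_dlt_load_id": {}}},
}
visible = drop_internal_items(example_tables, dlt_prefix="_dlt")
```

Because the prefix is an argument, the same function keeps working if the naming convention normalizes the prefix to something other than the literal `_dlt`.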

rudolfix · 2024-03-18T20:00:49Z

dlt/pipeline/pipeline.py

@@ -456,7 +456,18 @@ def normalize(
            return None

        # make sure destination capabilities are available
-        self._get_destination_capabilities()
+        caps = self._get_destination_capabilities()


the only thing you need to do is to fix NormalizersConfiguration

    def on_resolved(self) -> None:
        # get naming from capabilities if not present
        if self.naming is None:
            if self.destination_capabilities:
                self.naming = self.destination_capabilities.naming_convention

detect the type of the json normalizer and apply the settings to it like the below. you can override existing settings. I think capabilities (if not None) should have precedence over the source settings.

what happens later:
when new schema is created this setting will be used
when schema is loaded - it will not but when we call update_normalize - it will. and we do that in normalizer (in schema.clone). so it should work!

propagate the max_nesting_level the correct way from the destination caps

rudolfix

I have a few additions to docs and cross references. Let's merge this first. LGTM!

rudolfix · 2024-03-19T14:50:11Z

dlt/common/normalizers/configuration.py

+        ):
+            self.json_normalizer = self.json_normalizer or {}
+            self.json_normalizer.setdefault("config", {})
+            self.json_normalizer["config"][


this is the best we can do now. if we have more normalizers with incompatible configs then we'll need to look for something better

sh-rp added 3 commits March 18, 2024 15:47

rename tests file

469b225

add setting to skip dlt internal tables and columns in custom destina…

fc15c94

…tion

add nesting level setting to custom destination

890c0e0

update readme

sh-rp requested a review from rudolfix March 18, 2024 17:07

sh-rp commented Mar 18, 2024

rudolfix requested changes Mar 18, 2024

sh-rp added 4 commits March 18, 2024 23:21

use correct internal dlt schema item marker

4278853

propagate the max_nesting_level the correct way from the destination caps

add example for custom destination bigquery

1e276c5

fix embedded snippet checker output

cee1e90

add custom destination example to docs

7e02f82

sh-rp force-pushed the d#/custom_destination_enhancements branch 2 times, most recently from d3c7d89 to 040ed32 Compare March 19, 2024 14:08

update custom destination example

f6e7e6f

sh-rp force-pushed the d#/custom_destination_enhancements branch from 040ed32 to f6e7e6f Compare March 19, 2024 14:09

sh-rp requested a review from rudolfix March 19, 2024 14:10

sh-rp added 4 commits March 19, 2024 15:38

pin flake8-encodings to fork

ed5956b

fix snippet marker

6467063

ignore google imports

5f888ad

Merge branch 'devel' into d#/custom_destination_enhancements

cd14cbe

rudolfix marked this pull request as ready for review March 19, 2024 16:18

rudolfix previously approved these changes Mar 19, 2024

Docs: fix custom destination (#1113)

897bf8b

* removed sink mentions, fixed code snippets
* rename title
* trigger tests
* trigger tests 2
* revert changes
* small edits

sh-rp dismissed rudolfix’s stale review via 897bf8b March 19, 2024 22:05

sh-rp added 5 commits March 19, 2024 23:57

pin databind.json python package

5a5645f

pin databind core

e663e30

add bigquery extra for snippets tests

3739c9a

updates to the readme

b3fe660

rename function for nesting level test

8bb0e30

sh-rp merged commit 713aa31 into devel Mar 20, 2024
43 of 53 checks passed

sh-rp deleted the d#/custom_destination_enhancements branch March 20, 2024 07:09
