Apply hints for nested tables #2165

steinitzu · 2024-12-19T17:54:14Z

Description

Draft of nested table hints implementation:

apply_hints(path=['a', 'b', 'c'], columns=...)

It works so far, but there are still some bugs, and tests are needed.

Related Issues

Resolves Simplify schema modification of child tables #1647

Additional Context

netlify · 2024-12-19T17:55:03Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`e51bd25`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/679a7aff51c87d000948a3af

rudolfix

Please see my suggestion on how to deal with the naming convention. The docs requirements are in the ticket.

rudolfix · 2025-01-29T10:04:17Z

dlt/extract/hints.py

+            full_path = (root_table_name,) + path
+            table = instance.compute_table_schema(item, meta)
+            if not table.get("name"):
+                table["name"] = "__".join(full_path)  # TODO: naming convention


compute_table_chain must take a NamingConvention instance, which has a method to join a path, so that we do not need to hardcode the "__".

Overall, it is a weakness of dlt that it relies on such a separator and stores only normalized names in the schema. We lose a little bit of lineage information, but right now we can't really avoid that without a big rewrite.
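
The suggestion above could look roughly like the sketch below. It does not use dlt's real classes: `PathNaming`, `DoubleUnderscoreNaming`, and `nested_table_name` are hypothetical stand-ins, and `make_path` is assumed here to be the method that joins path fragments. The only point is that the separator lives in the naming convention rather than in the caller.

```python
from typing import Protocol, Sequence


class PathNaming(Protocol):
    """Hypothetical minimal interface: anything that can join path fragments."""

    def make_path(self, *identifiers: str) -> str: ...


class DoubleUnderscoreNaming:
    """Hypothetical stand-in for a naming convention that joins with "__"."""

    PATH_SEPARATOR = "__"

    def make_path(self, *identifiers: str) -> str:
        # Skip empty fragments so the result never has a dangling separator.
        return self.PATH_SEPARATOR.join(i for i in identifiers if i)


def nested_table_name(
    naming: PathNaming, root_table_name: str, path: Sequence[str]
) -> str:
    """Build a nested table name without hardcoding the separator."""
    return naming.make_path(root_table_name, *path)


name = nested_table_name(
    DoubleUnderscoreNaming(), "nested_data", ("outer1", "innerfoo")
)
print(name)
```

Swapping in a different naming object would change the separator without touching `nested_table_name`.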

zilto · 2025-01-29T21:19:02Z

The current implementation adds the tables to the schema (as tested), but it doesn't affect how the data is loaded.

For example, the hints will appear in

pipeline.default_schema.tables.keys()
# ignoring the dlt tables
# 'nested_data', 'override_child_outer1', 'override_child_outer1_innerfoo','nested_data__outer1', 'nested_data__outer1__innerfoo'

However, the normalizer row counts show no ingested data for the hinted tables:

pipeline.last_trace.last_normalize_info.row_counts
# 'nested_data': 2, 'nested_data__outer1': 2, 'nested_data__outer1__innerfoo': 2

I believe changes need to be made to Extractor._write_to_dynamic_table() and _get_dynamic_table_name() to push data to the right table. (Extractor._write_to_static_table() should rely on the explicitly provided table name).

The extractor would need to hold some mapping, but it could be more appropriate to move the logic to dlt.common.normalizers.json.helpers or to a Schema method?
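
To make the mismatch above easy to spot, here is a small standalone check. The table names and row counts are copied from this comment; `tables_without_rows` is a hypothetical helper, not part of dlt. In a real run the two inputs would come from `pipeline.default_schema.tables.keys()` and `pipeline.last_trace.last_normalize_info.row_counts`.

```python
from typing import Dict, Iterable, List

# Values copied from the comment above (dlt internal tables omitted).
schema_tables = {
    "nested_data",
    "override_child_outer1",
    "override_child_outer1_innerfoo",
    "nested_data__outer1",
    "nested_data__outer1__innerfoo",
}
row_counts = {
    "nested_data": 2,
    "nested_data__outer1": 2,
    "nested_data__outer1__innerfoo": 2,
}


def tables_without_rows(
    tables: Iterable[str], counts: Dict[str, int]
) -> List[str]:
    """Return schema tables for which the normalizer reported no rows."""
    return sorted(t for t in tables if counts.get(t, 0) == 0)


empty_tables = tables_without_rows(schema_tables, row_counts)
print(empty_tables)
```

With the values above, the two `override_child_*` tables are the ones reported as empty.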

rudolfix · 2025-01-30T09:46:06Z

The relational normalizer follows its own logic for creating nested tables and column names, and that logic is driven only by the data. There's no mechanism to rename them, except for the root table name, which the user must set.

dlt is data first, not schema first. This is counterintuitive if you choose to start your work with the schema rather than the data.
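
As a rough illustration of "names come only from the data", the sketch below derives nested table names purely from the keys of list-valued fields. It is a heavily simplified stand-in, not dlt's relational normalizer: `iter_nested_rows` is a hypothetical helper, the "__" separator is hardcoded for brevity, and the sample rows are made up for this example.

```python
from collections import Counter
from typing import Any, Dict, Iterator, Tuple


def iter_nested_rows(
    table: str, row: Dict[str, Any], sep: str = "__"
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (table_name, flat_row) pairs; lists become child tables."""
    flat: Dict[str, Any] = {}
    children = []
    for key, value in row.items():
        if isinstance(value, list):
            children.append((key, value))
        else:
            flat[key] = value
    yield table, flat
    for key, items in children:
        for item in items:
            # Wrap scalars so every child row is a dict.
            child = item if isinstance(item, dict) else {"value": item}
            # The child table name is derived from the parent name and the key.
            yield from iter_nested_rows(f"{table}{sep}{key}", child, sep)


# Made-up sample data, shaped to match the table names in this thread.
data = [
    {"id": 1, "outer1": [{"name": "a", "innerfoo": [{"v": 1}]}]},
    {"id": 2, "outer1": [{"name": "b", "innerfoo": [{"v": 2}]}]},
]

counts = Counter(
    name for row in data for name, _ in iter_nested_rows("nested_data", row)
)
print(dict(counts))
```

Nothing in this flow consults a user-supplied table name for the children, which is why a custom name on a nested hint has no effect on where rows land.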

I assume that in the example you are giving, you used a custom table name for the nested table. If this is not the case, ping me on Slack; maybe there's a bug somewhere.

In the ticket above, there's a note:

You still may allow users to specify table_name on the nested hint. If you do so, you'll need to modify the normalizer so it maps paths to those names. IMO this is for another ticket and bigger overhaul of the schema
prevent following to be set on nested table:
parent_table_name: TTableHintTemplate[str] = None,
incremental: TIncrementalConfig = None,

So I'd say we block setting the table name on nested hints (blocking the parent name and incremental also makes sense).
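
One way such a block could be expressed is sketched below. `validate_nested_hints` and `BLOCKED_NESTED_HINTS` are hypothetical names, and the hints are modeled as a plain dict rather than dlt's actual hint types; the blocked keys are taken from the ticket note quoted above.

```python
from typing import Any, Dict, Sequence

# Hint names from the ticket note above, plus the table name itself.
BLOCKED_NESTED_HINTS = ("table_name", "parent_table_name", "incremental")


def validate_nested_hints(path: Sequence[str], hints: Dict[str, Any]) -> None:
    """Hypothetical check: reject hints that only make sense on a root table."""
    if not path:
        return  # An empty path means the root table; nothing is blocked.
    blocked = [n for n in BLOCKED_NESTED_HINTS if hints.get(n) is not None]
    if blocked:
        raise ValueError(
            f"Hints {blocked} cannot be set on nested table at path {list(path)}"
        )


# Column hints on a nested path pass the check.
validate_nested_hints(("outer1",), {"columns": {"id": {"data_type": "bigint"}}})

# A custom table name on a nested path is rejected.
try:
    validate_nested_hints(("outer1",), {"table_name": "override_child_outer1"})
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Failing early like this would surface the problem when hints are applied instead of leaving empty tables in the schema.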

steinitzu added 3 commits December 18, 2024 09:43

unifies ResourceHints typed dict

1ea2af2

Apply nested hints and compute table chain from nested hints

66a54af

Arrow fix

39ac90f

steinitzu added 5 commits December 20, 2024 10:17

Handle TableNameMeta

3459544

Fix name hint

309c3d7

Arrow fix, all tests/extract running

743816f

Nested hints tests, handle table name overrides

d0a83b2

lint

ee28048

rudolfix mentioned this pull request Jan 8, 2025

allows to define hints for nested tables #1855

Closed

rudolfix assigned zilto Jan 28, 2025

zilto added 2 commits January 28, 2025 18:52

required Optional type annotation added

12a6a4b

Adding this type annotation fixed 69 failing tests. The missing Optional impacted the dlt.common.validation.validate_dict().validate_prop() functions to parse the RESTAPIConfig object

updated pokemon source to pokeapi==2.7.0

3c129f9

zilto marked this pull request as ready for review January 29, 2025 01:42

rudolfix requested changes Jan 29, 2025

zilto added 2 commits January 29, 2025 11:31

refactored tests; fixed syntaxerror

7f499f0

use naming convention when resolving nested tables

e51bd25

zilto force-pushed the define-hints-nested-tables branch from bef7a3f to e51bd25 Compare January 29, 2025 19:01
