Apply hints for nested tables #2165
base: devel
✅ Deploy Preview for dlt-hub-docs canceled.
Adding this type annotation fixed 69 failing tests. Without the `Optional`, `dlt.common.validation.validate_dict()` (through its inner `validate_prop()`) could not parse the `RESTAPIConfig` object.
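To illustrate why a missing `Optional` makes validation fail, here is a minimal sketch of a TypedDict validator. It is not dlt's `validate_dict()`; `ClientConfig`, `BrokenClientConfig`, `accepts`, and `validate_typed_dict` are hypothetical names, and only plain types, `Any`, and `Optional`/`Union` are handled.

```python
from typing import Any, Optional, TypedDict, Union, get_args, get_origin, get_type_hints


class ClientConfig(TypedDict, total=False):
    # hypothetical stand-in for part of a config such as RESTAPIConfig
    base_url: str
    paginator: Optional[str]  # None is a valid value


class BrokenClientConfig(TypedDict, total=False):
    base_url: str
    paginator: str  # missing Optional: a None value is rejected


def accepts(expected: Any, value: Any) -> bool:
    """Check a value against a (simplified) type annotation."""
    if expected is Any:
        return True
    if get_origin(expected) is Union:
        # Optional[X] is Union[X, None]: accept if any member accepts the value
        return any(accepts(arg, value) for arg in get_args(expected))
    if expected is type(None):
        return value is None
    return isinstance(value, expected)


def validate_typed_dict(spec: Any, doc: dict) -> None:
    """Raise ValueError if doc has unknown keys or values of the wrong type."""
    hints = get_type_hints(spec)
    for key, value in doc.items():
        if key not in hints:
            raise ValueError(f"unknown key {key!r}")
        if not accepts(hints[key], value):
            raise ValueError(f"{key!r}: {value!r} does not match {hints[key]}")
```

With this sketch, `validate_typed_dict(ClientConfig, {"paginator": None})` passes, while the same document raises `ValueError` against `BrokenClientConfig`, which is the kind of failure the added annotation avoids.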
Please see my suggestion on how to deal with the naming convention. The docs requirements are in the ticket.
`dlt/extract/hints.py` (outdated)

    full_path = (root_table_name,) + path
    table = instance.compute_table_schema(item, meta)
    if not table.get("name"):
        table["name"] = "__".join(full_path)  # TODO: naming convention
`compute_table_chain` must take a `NamingConvention` instance, which has a method to join a path, so we do not need to hardcode the `"__"`.
Overall, this is a weakness of dlt: it relies on such a separator and stores only normalized names in the schema. We lose a little lineage information, but right now we can't avoid that without a big rewrite.
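A minimal sketch of the suggestion: the caller passes a naming convention object and lets it join the path instead of hardcoding `"__"`. The `make_path` method name, the `SnakeCaseNaming`/`DotNaming` classes, and `nested_table_name` are hypothetical stand-ins; dlt's own `NamingConvention` is assumed to expose a comparable path-joining method, but its actual API should be checked.

```python
from typing import Protocol, Sequence


class NamingConvention(Protocol):
    # hypothetical minimal interface: only the path-joining method is needed here
    def make_path(self, *identifiers: str) -> str: ...


class SnakeCaseNaming:
    """Hypothetical convention that joins path segments with a double underscore."""

    path_separator = "__"

    def make_path(self, *identifiers: str) -> str:
        # skip empty segments so they do not produce doubled separators
        return self.path_separator.join(i for i in identifiers if i)


class DotNaming(SnakeCaseNaming):
    """Hypothetical convention with a different separator."""

    path_separator = "."


def nested_table_name(
    naming: NamingConvention, root_table_name: str, path: Sequence[str]
) -> str:
    # the naming convention, not the caller, decides how the path is joined
    return naming.make_path(root_table_name, *path)
```

For example, `nested_table_name(SnakeCaseNaming(), "nested_data", ("outer1", "innerfoo"))` gives `"nested_data__outer1__innerfoo"`, and swapping in `DotNaming()` changes the separator without touching the calling code.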
The branch was updated from bef7a3f to e51bd25.
The current implementation adds the tables to the schema (as tested), but it doesn't affect how the data is loaded. For example, the hinted tables appear in `pipeline.default_schema.tables.keys()`:

    # ignoring the dlt tables
    # 'nested_data', 'override_child_outer1', 'override_child_outer1_innerfoo', 'nested_data__outer1', 'nested_data__outer1__innerfoo'

whereas the normalizer row counts in `pipeline.last_trace.last_normalize_info.row_counts` show no ingested data for the hinted tables:

    # 'nested_data': 2, 'nested_data__outer1': 2, 'nested_data__outer1__innerfoo': 2

I believe changes need to be made to … The extractor would need to hold some mapping, but it could be more appropriate to move the logic to …
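The mismatch described above can be checked mechanically. This is a minimal sketch: `tables_without_rows` is a hypothetical helper, and it assumes dlt's internal tables all start with `_dlt`. With a real pipeline, its inputs would be `pipeline.default_schema.tables.keys()` and `pipeline.last_trace.last_normalize_info.row_counts`; here they are the values quoted in the comment.

```python
from typing import Dict, Iterable, List


def tables_without_rows(schema_tables: Iterable[str], row_counts: Dict[str, int]) -> List[str]:
    """Return schema tables that received no rows, skipping dlt's internal _dlt* tables."""
    return sorted(
        name
        for name in schema_tables
        if not name.startswith("_dlt") and row_counts.get(name, 0) == 0
    )


# values taken from the comment above
schema_tables = [
    "nested_data",
    "override_child_outer1",
    "override_child_outer1_innerfoo",
    "nested_data__outer1",
    "nested_data__outer1__innerfoo",
]
row_counts = {"nested_data": 2, "nested_data__outer1": 2, "nested_data__outer1__innerfoo": 2}

print(tables_without_rows(schema_tables, row_counts))
```

This prints `['override_child_outer1', 'override_child_outer1_innerfoo']`: the hinted tables exist only in the schema, while the rows still land in the tables whose names the normalizer derived from the data.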
The relational normalizer follows its own logic for creating nested table and column names; they come only from the data. There's no mechanism to rename them, except for the root table name, which the user must set.
I assume that in the example you are giving, you used a custom table name for a nested table. If this is not the case, ping me on Slack; maybe there's a bug somewhere. In the ticket above, there's a note:
So I'd say we block setting the table name on nested hints (blocking parent name and incremental also makes sense).
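A minimal sketch of the proposed restriction, assuming hints arrive as a dict together with the nested path (an empty path meaning the root table). `check_nested_hints` and `BLOCKED_NESTED_HINTS` are hypothetical; the hint keys are assumed to mirror the `table_name`, `parent_table_name`, and `incremental` arguments of `apply_hints`.

```python
from typing import Any, Dict, Sequence

# hypothetical set of hints the review suggests rejecting on nested tables
BLOCKED_NESTED_HINTS = ("table_name", "parent_table_name", "incremental")


def check_nested_hints(path: Sequence[str], hints: Dict[str, Any]) -> None:
    """Raise ValueError if a root-only hint is set for a nested table path."""
    if not path:
        return  # root table: every hint is allowed
    blocked = [name for name in BLOCKED_NESTED_HINTS if hints.get(name) is not None]
    if blocked:
        raise ValueError(
            f"hints {blocked} cannot be set on nested table path {'.'.join(path)!r}"
        )
```

Rejecting these hints early keeps the schema consistent with the relational normalizer, which derives nested table names from the data rather than from user-supplied names.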
Description
Draft of the nested table hints implementation. It works so far, but there are still some bugs, and tests are needed.