-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schema contract #594
schema contract #594
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
0daed01
to
3bbb8b1
Compare
3bbb8b1
to
edad4ad
Compare
dlt/normalize/normalize.py
Outdated
# coerce row of values into schema table, generating partial table with new columns if any | ||
row, partial_table = schema.coerce_row(table_name, parent_table, row) | ||
# check update | ||
row, partial_table = schema.check_schema_update(table_name, row, partial_table, config.schema_update_mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't call it if partial table is None! this is 99.99% of cases and this code is time critical
dlt/normalize/normalize.py
Outdated
@@ -196,7 +203,7 @@ def map_parallel(self, schema: Schema, load_id: str, files: Sequence[str]) -> TM | |||
workers = self.pool._processes # type: ignore | |||
chunk_files = self.group_worker_files(files, workers) | |||
schema_dict: TStoredSchema = schema.to_dict() | |||
config_tuple = (self.normalize_storage.config, self.load_storage.config, self.config.destination_capabilities, schema_dict) | |||
config_tuple = (self.config, self.normalize_storage.config, self.load_storage.config, self.config.destination_capabilities, schema_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just FYI: this goes through process boundary and will be pickled. pay attention
dlt/common/schema/schema.py
Outdated
@@ -174,7 +177,32 @@ def coerce_row(self, table_name: str, parent_table: str, row: StrAny) -> Tuple[D | |||
updated_table_partial["columns"][new_col_name] = new_col_def | |||
|
|||
return new_row, updated_table_partial | |||
|
|||
|
|||
def check_schema_update(self, table_name: str, row: DictStrAny, partial_table: TPartialTableSchema, schema_update_mode: TSchemaUpdateMode) -> Tuple[DictStrAny, TPartialTableSchema]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a solid implementation. but I'd like to describe it in the ticket (we can resue it for the docs) and maybe modify the requirements.
5e327c2
to
fc6f083
Compare
dlt/common/validation.py
Outdated
t = extract_optional_type(t) | ||
|
||
if is_literal_type(t): | ||
# TODO: support for union types? | ||
if pk == "schema_evolution_settings": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a look at this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - and that should be trivial
4f575fd
to
5659827
Compare
# Conflicts: # dlt/common/schema/schema.py
2481107
to
8bccfe5
Compare
15cada5
to
05f4c1a
Compare
05f4c1a
to
6308369
Compare
…cts pue to skip decoding
…ata filtering still missing
@sh-rp changes from previous version
|
issue #135
TODO: