fix bigquery autodetect #2035
The proposed quick fix is to enable merge and staging replace for BigQuery auto schema detection by cloning the table schema from the staging table to the final table in BigQuery. Merges with schema updates will still fail, though: either we somehow freeze the schema for these tables, or we add a very prominent note in the docs about this. Maybe we can also amend the exception message for a failure like this with more info about why it might have failed for BigQuery tables with autodetected schema.
this fixes the case without schema evolution. it is a good first attempt (btw. I think autodetect schema cannot really evolve existing tables so we are fine!
https://www.reddit.com/r/bigquery/comments/1c02azc/autodetecting_updated_schema_when/?rdt=38122)
dlt/destinations/sql_jobs.py
Outdated
# NOTE: we also need to create all child tables
# NOTE: this will not work if the schema of the staging table changes in the next run..
# in some cases we need to create final tables here
sql.append(f"CREATE TABLE IF NOT EXISTS {root_table_name} LIKE {staging_root_table_name};")
this code should be executed at the end of each copy job - and only if autodetect schema is enabled. this way we can just run our merge job as usual.
we could also migrate columns from staging table to destination tables but maybe we can start with just cloning
I have prepended it to the bigquery merge jobs for autodetect tables now. The other solution would be much more complicated, with two consecutive follow-up jobs. I can do it that way too, but I think my solution is fairly clean...
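A minimal sketch of the "prepend" approach discussed in this thread, assuming the merge job produces a list of SQL strings. The helper name `prepend_clone_statement` and the `autodetect_schema` flag are hypothetical and not taken from the dlt codebase; only the `CREATE TABLE IF NOT EXISTS ... LIKE ...` statement comes from the diff above.

```python
from typing import List


def prepend_clone_statement(
    merge_sql: List[str],
    root_table_name: str,
    staging_root_table_name: str,
    autodetect_schema: bool,
) -> List[str]:
    """Hypothetical helper: put a clone-from-staging statement before the merge SQL.

    The final table is created with the staging table's schema only if it does
    not exist yet, so an already existing final table is left untouched.
    """
    if not autodetect_schema:
        # Without autodetect the final table is created from the known schema,
        # so the merge statements are returned unchanged.
        return list(merge_sql)
    clone = (
        f"CREATE TABLE IF NOT EXISTS {root_table_name} "
        f"LIKE {staging_root_table_name};"
    )
    return [clone, *merge_sql]
```

Because the statement is `IF NOT EXISTS`, it only helps on the first run; as noted in the code comment above, it does nothing when the staging schema changes later.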
@@ -273,6 +274,20 @@ def _make_database_exception(cls, ex: Exception) -> Exception:
        # anything else is transient
        return DatabaseTransientException(ex)

    def truncate_tables(self, *tables: str) -> None:
        """NOTE: We only truncate tables that exist, for auto-detect schema we don't know which tables exist"""
        statements: List[str] = ["DECLARE table_exists BOOL;"]
should we do it only when autodetect schema is on?
done
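For illustration, here is one way the conditional truncate could be built, as a sketch under stated assumptions rather than the code in this PR. The function name `build_truncate_script`, the `dataset` and `autodetect_schema` parameters, and the `INFORMATION_SCHEMA.TABLES` lookup are assumptions; only the leading `DECLARE table_exists BOOL;` statement appears in the diff above. Identifiers are interpolated without escaping, so this is not safe for untrusted names.

```python
from typing import List


def build_truncate_script(
    dataset: str, tables: List[str], autodetect_schema: bool
) -> List[str]:
    """Hypothetical helper: build BigQuery statements that truncate tables.

    With autodetect enabled we do not know which tables exist, so each
    TRUNCATE is guarded by an existence check in a BigQuery script.
    """
    if not autodetect_schema:
        # Tables are known to exist, so truncate them directly.
        return [f"TRUNCATE TABLE `{dataset}.{t}`;" for t in tables]

    statements: List[str] = ["DECLARE table_exists BOOL;"]
    for t in tables:
        # Assumed existence check against the dataset's INFORMATION_SCHEMA.
        statements.append(
            "SET table_exists = (SELECT COUNT(*) > 0 "
            f"FROM `{dataset}`.INFORMATION_SCHEMA.TABLES "
            f"WHERE table_name = '{t}');"
        )
        statements.append(
            f"IF table_exists THEN TRUNCATE TABLE `{dataset}.{t}`; END IF;"
        )
    return statements
```

Branching on the flag keeps the plain `TRUNCATE TABLE` path unchanged for tables created from a known schema, which is what the review question above asks for.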
Description
When using autodetect_schema on bigquery, merge and replace with "insert-from-staging" will fail, because dlt is:
Problems: The suggested approach in this ticket will not work if the schema of the table changes, because BigQuery has no command to update the schema of an existing table to match the schema of another existing table. The staging table will therefore auto-evolve, but it is not easy to have the main table follow without inserting data into it directly.
Possible solutions: