refactor: Optimise metadata discovery for large databases #528
Conversation
c641747 to 9109819
Thanks for the PR @jamesmeneghello! I left some comments.
Fixed those and the issues from the CI run.
Thanks @jamesmeneghello!
```python
inspected = sa.inspect(engine)
for schema_name in self.get_schema_names(engine, inspected):
    # Use get_multi_* data here instead of pulling per-table
    table_data = inspected.get_multi_columns(schema=schema_name)
```
This ignores views, thus the regression. See https://github.com/meltano/sdk/pull/2793/files#r1867006461.
Overrides the SDK functions to instead use the `get_multi_*` functions from SQLAlchemy's `Inspector`. On our database of ~120 tables, this reduces the discovery runtime from 10-12 minutes to about 30 seconds.