refactor: Optimise metadata discovery for large databases #528

Merged
merged 2 commits into from
Nov 5, 2024

jamesmeneghello
Contributor

@jamesmeneghello jamesmeneghello commented Nov 4, 2024

Overrides the SDK functions to instead use the get_multi_* functions from SQLAlchemy Inspector. On our database of ~120 tables, this reduces the discovery runtime from 10-12 minutes to about 30 seconds.

@jamesmeneghello jamesmeneghello force-pushed the main branch 2 times, most recently from c641747 to 9109819 Compare November 4, 2024 04:21
Member

@edgarrmondragon edgarrmondragon left a comment

Thanks for the PR @jamesmeneghello! I left some comments.

tap_postgres/client.py (outdated, resolved)
tap_postgres/client.py (outdated, resolved)
tap_postgres/client.py (resolved)
@edgarrmondragon edgarrmondragon changed the title Optimise metadata discovery for large databases refactor: Optimise metadata discovery for large databases Nov 4, 2024
@edgarrmondragon edgarrmondragon self-assigned this Nov 4, 2024
@edgarrmondragon edgarrmondragon self-requested a review November 4, 2024 18:18
@edgarrmondragon edgarrmondragon added the enhancement New feature or request label Nov 4, 2024
@jamesmeneghello
Contributor Author

Fixed those and the issues from the CI run.

Member

@edgarrmondragon edgarrmondragon left a comment

@edgarrmondragon edgarrmondragon merged commit 9bb40d1 into MeltanoLabs:main Nov 5, 2024
12 checks passed
edgarrmondragon added a commit that referenced this pull request Nov 6, 2024
Related:

- Reverts #528
- Closes #535

This reverts commit 9bb40d1.
edgarrmondragon added a commit that referenced this pull request Nov 7, 2024
@edgarrmondragon
Member

edgarrmondragon commented Nov 7, 2024

This seems to have broken at least stream maps (#535), so I reverted it in #536 until I can investigate and come up with a patch.

inspected = sa.inspect(engine)
for schema_name in self.get_schema_names(engine, inspected):
    # Use get_multi_* data here instead of pulling per-table
    table_data = inspected.get_multi_columns(schema=schema_name)
Member

This ignores views, thus the regression. See https://github.com/meltano/sdk/pull/2793/files#r1867006461.

Development

Successfully merging this pull request may close these issues.

Discovery Performance
2 participants