Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache IsExtensionRegistered #115

Merged
merged 3 commits into from
Aug 13, 2024
Merged

Cache IsExtensionRegistered #115

merged 3 commits into from
Aug 13, 2024

Conversation

JelteF
Copy link
Collaborator

@JelteF JelteF commented Aug 13, 2024

Doing a btree lookup in pg_extension for every query is not free, and an extension introducing one can (based on my experience) impact performance for single row lookups significantly.

This introduces a cache which tracks if the extension is registered. I plan to use this cache in later PRs for more things, such as oids of our DuckDB functions.

This cache is invalidated whenever an invalidation message is sent for the "duckdb" schema, which just so happens to be whenever CREATE/DROP EXTENSION pg_duckdb is executed. These invalidation messages are the mechanism that Postgres uses to let other backends know that some DDL has occured. We'll probably want to do something else for ALTER EXTENSION pg_duckdb UPDATE, but for now this is good enough (since there's no upgrades yet). It would have been best if we could receive invalidation messages for rows in pg_extension. But sadly this is not possible. I submitted a patch to upstream Postgres to hopefully allow for this in the future (but that will only be in PG18+):
https://www.postgresql.org/message-id/flat/CAGECzQTWm9sex719Hptbq4j56hBGUti7J9OWjeMobQ1ccRok9w%40mail.gmail.com

Doing a btree lookup in `pg_extension` for every query is not free, and
an extension introducing one can (based on my experience) impact
performance for single row lookups significantly.

This introduces a cache which tracks if the extension is registered. I
plan to use this cache in later PRs for more things, such as oids of our
DuckDB functions.

This cache is invalidated whenever an invalidation message is sent for
the "duckdb" schema, which just so happens to be whenever `CREATE/DROP
EXTENSION pg_duckdb` is executed. These invalidation messages are the
mechanism that Postgres uses to let other backends know that some DDL
has occured. We'll probably want to do something else for `ALTER
EXTENSION pg_duckdb UPDATE`, but for now this is good enough (since
there's no upgrades yet). It would have been best if we receive
invalidation messages for rows in `pg_extension`. But sadly this is not
possible. I submitted a patch to upstream Postgres to hopefully allow
for this in the future (but that will only be in PG18+):
https://www.postgresql.org/message-id/flat/CAGECzQTWm9sex719Hptbq4j56hBGUti7J9OWjeMobQ1ccRok9w%40mail.gmail.com
@hlinnaka
Copy link
Collaborator

Hmm, why do we check whether the extension has been installed? I think if the extension is loaded in shared_preload_libraries, it should work without running the CREATE EXTENSION command. Looking at the install script, there isn't anything critical there AFAICS.

Instead of checking whether the extension is installed, you could look up the tables and functions that the install script installs. There are existing syscaches for those.

@JelteF
Copy link
Collaborator Author

JelteF commented Aug 13, 2024

Hmm, why do we check whether the extension has been installed? I think if the extension is loaded in shared_preload_libraries, it should work without running the CREATE EXTENSION command.

While currently not strictly necessary indeed. In the near future it's probably going to be needed (or at least significantly harder to support both situations).

Also, it seems quite nice from a user perspective that people can disable the extension completely by running DROP EXTENSION. Without actually having to change shared_preload_libraries and restart their postgres server (which they might only want to do later during non-peak hours).

Instead of checking whether the extension is installed, you could look up the tables and functions that the install script installs. There are existing syscaches for those.

That's indeed possible. But if the function or table exists, it doesn't mean that it's created by the extension. Another user/extension could have created it.

@wuputah wuputah self-requested a review August 13, 2024 16:59
@JelteF JelteF merged commit 177b12b into main Aug 13, 2024
2 checks passed
@JelteF JelteF deleted the cache-extension-installation branch August 13, 2024 17:00
@hlinnaka
Copy link
Collaborator

hlinnaka commented Aug 13, 2024

Also, it seems quite nice from a user perspective that people can disable the extension completely by running DROP EXTENSION. Without actually having to change shared_preload_libraries and restart their postgres server (which they might only want to do later during non-peak hours).

The duckdb.execution GUC is the primary way of enabling/disabling it, right?

@JelteF
Copy link
Collaborator Author

JelteF commented Aug 14, 2024

The duckdb.execution GUC is the primary way of enabling/disabling it, right?

Currently yes, but I'm trying to add some automatic detection of when this execution is needed. Specifically when one of the functions used is a duckdb function like read_parquet.

Also, just for completeness I tried removing the IsExtensionRegistered check from the code just now. And that causes qeuries to fail when setting duckdb.execution=true with an error like this:

ERROR:  3F000: schema "duckdb" does not exist
LOCATION:  get_namespace_oid, namespace.c:3093

The reason for that is that the extension relies on the duckdb schema and its tables to exist, for loading extensions and secrets. Ofcourse we could change the code to assume no extensions/secrets need to be loaded if the respective tables don't exist. But all of such checks add more complexity than they are worth imho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants