fix(models.core:Database): Allocate a SQLA engine once per process per URL. #27899

pedro-r-marques · 2024-04-04T12:41:12Z

SUMMARY

As per issue #27897, the SQLAlchemy documentation recommends:

"""
The typical usage of create_engine() is once per particular database URL, held globally for the lifetime of a single application process. A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion. The Engine is not synonymous to the DBAPI connect() function, which represents just one connection resource - the Engine is most efficient when created just once at the module level of an application, not per-object or per-function call.
"""

The superset.models.core::Database class currently allocates a new Engine object whenever a Database object is instantiated, which happens twice(?) per 'api/v1/chart/data' API access. This is not the intended usage of the SQLAlchemy API, and it prevents mechanisms such as connection pooling from working correctly.
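A minimal sketch of the "one engine per URL per process" idea follows. It is not the code in this PR: `EngineCache` and `fake_create_engine` are hypothetical names, and the engine factory is injected so the example runs without a database. In real use the factory would presumably be `sqlalchemy.create_engine`.

```python
import threading
from typing import Any, Callable, Dict, List, Tuple


class EngineCache:
    """Process-wide cache: one engine per URL, rebuilt only if params change."""

    def __init__(self, factory: Callable[..., Any]) -> None:
        # Assumption: `factory` has the shape of sqlalchemy.create_engine,
        # i.e. factory(url, **params) returns an engine-like object.
        self._factory = factory
        self._engines: Dict[str, Tuple[Any, Dict[str, Any]]] = {}
        self._lock = threading.Lock()

    def get(self, url: str, **params: Any) -> Any:
        # The lock keeps two threads from creating two engines for one URL.
        with self._lock:
            cached = self._engines.get(url)
            if cached is not None and cached[1] == params:
                return cached[0]  # reuse the engine and its connection pool
            # Note: a replaced engine is not disposed here; a real
            # implementation would need to decide when that is safe.
            engine = self._factory(url, **params)
            self._engines[url] = (engine, dict(params))
            return engine


# Stand-in factory that records how often an "engine" is created.
calls: List[str] = []


def fake_create_engine(url: str, **params: Any) -> object:
    calls.append(url)
    return object()


cache = EngineCache(fake_create_engine)
a = cache.get("duckdb:///example.db", pool_size=1)
b = cache.get("duckdb:///example.db", pool_size=1)
```

Here `a` and `b` are the same object, so any pool attached to the engine is shared by both callers instead of being recreated on each request.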

Connection pooling is an important feature for controlling access to databases. For instance, when using 'duckdb', it is recommended to allow only a small number of concurrent requests (or a single one) and to rely instead on the inherent parallelism of the database engine. Other engines will have other ideal settings.

TESTING INSTRUCTIONS

Configure a connection pool using the DB_CONNECTION_MUTATOR hook.
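As an illustration of what such a configuration might look like, here is a sketch for `superset_config.py`. It assumes the hook is called with `(uri, params, username, security_manager, source)`, returns `(uri, params)`, and that `params` is later passed to `create_engine`. It also assumes `uri` is a SQLAlchemy URL object exposing `drivername`. The pool sizes are example values, not recommendations from this PR.

```python
# superset_config.py (sketch, not taken from this PR)
from sqlalchemy.pool import QueuePool


def DB_CONNECTION_MUTATOR(uri, params, username, security_manager, source):
    # Assumption: `uri.drivername` identifies the backend, e.g. "duckdb".
    if uri.drivername.startswith("duckdb"):
        # Allow a single pooled connection and no overflow connections,
        # so concurrent requests queue up instead of opening new ones.
        params["poolclass"] = QueuePool
        params["pool_size"] = 1
        params["max_overflow"] = 0
    return uri, params
```

With a setup like this, the pool limits only take effect if the same Engine is reused across requests, which is what the change in this PR is meant to ensure.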

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

github-actions

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

betodealmeida

This is great, @pedro-r-marques, thanks for the PR!

There's been some discussion in the past regarding improving the engines and connection pooling (#8574), and your PR seems to be the first step in that direction.

betodealmeida · 2024-04-04T14:34:36Z

superset/models/core.py

+            tpl = cls._sqla_engines.get(sqlalchemy_url)
+            if tpl is not None:
+                engine, cparams = tpl
+                if cparams == params:


Can we have params be part of the key as well, in some deterministic serialized form? A few databases that support user impersonation will add the user info to params, and in that case the engine manager would not be very useful.

Transform potentially nested dict into a tuple.
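One way to turn a potentially nested `params` dict into a deterministic, hashable key is sketched below. This is an illustration rather than the code in the commit: `freeze` is a hypothetical helper, and it assumes every leaf value (strings, numbers, classes such as a pool class) is itself hashable.

```python
from typing import Any, Hashable


def freeze(value: Any) -> Hashable:
    """Recursively convert dicts and lists into sorted or ordered tuples."""
    if isinstance(value, dict):
        # Sort by the repr of the key so insertion order does not matter
        # and mixed key types never get compared directly.
        items = ((k, freeze(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: repr(kv[0])))
    if isinstance(value, (list, tuple)):
        # Sequence order is meaningful, so it is preserved.
        return tuple(freeze(v) for v in value)
    # Assumption: anything else is already hashable.
    return value


p1 = {"connect_args": {"user": "alice", "opts": [1, 2]}, "pool_size": 1}
p2 = {"pool_size": 1, "connect_args": {"opts": [1, 2], "user": "alice"}}

# The same URL with equal params maps to the same cache key.
key1 = ("duckdb:///example.db", freeze(p1))
key2 = ("duckdb:///example.db", freeze(p2))
```

With impersonation, params that differ only in the user produce different keys, so each user gets a separate cached engine rather than evicting the previous one.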

pedro-r-marques · 2024-04-08T15:30:46Z

The CI failures above:

- Translations was a CI error (network failure).
- The pre-commit check wants to reformat lines of code in a file that I touched, but those lines have nothing to do with the PR. What is the policy for the project? Should I add the change to make 'black' happy, or ignore it since it is unrelated to the intended change?

eschutho · 2024-04-10T22:53:08Z

@pedro-r-marques I restarted CI and fixed the formatting issue. We'll see if it passes now.

pedro-r-marques · 2024-04-12T13:37:26Z

@eschutho The formatting issue is fixed. Unfortunately, the test-mysql job seems to be failing in test_chart_data_async. I've attempted to reproduce the failure multiple times in my setup (using the same MySQL version and Redis cache configuration as the CI) with no luck.

Any hints?

From the failure code, I'm under the impression that the test expects result code 202 (which would mean that the job has been submitted to a Celery worker?) and is instead getting a 200 (meaning the result is being served from cache?).

I've tried running the test both at the version of the code in the PR and at HEAD. It always passes for me.

ShubhamDalmia · 2024-07-08T06:27:54Z

Any updates on this?

villebro · 2024-07-08T09:26:49Z

> From the failure code, I'm under the impression that the test expects result code 202 (which would mean that the job has been submitted to a Celery worker?) and is instead getting a 200 (meaning the result is being served from cache?).

@pedro-r-marques that particular test is hopelessly flaky (we will likely need to remove it, as fixing it has proved difficult for the MySQL integration test). Can you rebase the PR to fix the conflict?

pull-request-size bot added the size/S label Apr 4, 2024

github-actions bot reviewed Apr 4, 2024

Allocate a SQLA engine once per process per URL.

32bb4d1

pedro-r-marques force-pushed the create_engine branch from aaf2b35 to 32bb4d1 Compare April 4, 2024 14:08

pull-request-size bot added size/M and removed size/S labels Apr 4, 2024

rusackas requested review from villebro, john-bodley and betodealmeida April 4, 2024 14:22

betodealmeida reviewed Apr 4, 2024

Use params as key.

85fe8b0

Transform potentially nested dict into a tuple.

michael-s-molina linked an issue Apr 5, 2024 that may be closed by this pull request

sqlalchemy engine should be created once per process #27897

Open

3 tasks

pedro-r-marques changed the title ~~fix(models.core::Database) Allocate a SQLA engine once per process per URL.~~ fix(models.core:Database): Allocate a SQLA engine once per process per URL. Apr 8, 2024

pedro-r-marques added 2 commits April 8, 2024 10:07

Fix type annotations.

f0e5cce

Flush the engine cache when running unittests.

725c75a

eschutho reviewed Apr 10, 2024

fix CI

16dc0f3
