-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(models.core:Database): Allocate a SQLA engine once per process per URL. #27899
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaf2b35
to
32bb4d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, @pedro-r-marques, thanks for the PR!
There's been some discussion in the past regarding improving the engines and connection pooling (#8574), and your PR seems to be the first step in that direction.
superset/models/core.py
Outdated
tpl = cls._sqla_engines.get(sqlalchemy_url) | ||
if tpl is not None: | ||
engine, cparams = tpl | ||
if cparams == params: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have params
be part of the key as well, in some deterministic serialized form? A few databases that support user impersonation will add the user info to params
, and in that case the engine manager would not be very useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
Transform potentially nested dict into a tuple.
The CI failures above:
|
@pedro-r-marques I restarted CI and fixed the formatting issue. We'll see if it passes now. |
@eschutho The formatting issue is fixed. Unfortunatly the test-mysql test seems to be failing in Any hints ? From the failure code, I'm under the impression that the test is expecting result code 202 [ which would mean that the job has been submitted to a cellery worker ? ] and is instead getting a 200 [ meaning the result is being served from cache ? ] I've tried running the test both at the version of the code in the PR as well as at HEAD. It always passes for me. |
Any updates on this? |
@pedro-r-marques the particular test is hopelessly flaky (we will likely need to remove it, as fixing it has proved to be difficult for the MySQL integration test). Can you rebase the PR to fix the conflict? |
SUMMARY
As per issue #27897, SqlAlchemy recommends that
"""
The typical usage of create_engine() is once per particular database URL, held globally for the lifetime of a single application process. A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion. The Engine is not synonymous to the DBAPI connect() function, which represents just one connection resource - the Engine is most efficient when created just once at the module level of an application, not per-object or per-function call.
"""
The superset.models.core::Database class is currently allocating Engine objects whenever a Database object is instantiated which happens twice(?) per 'api/v1/chart/data' API access. This is not the intended usage of the SQL Alchemy API and causes mechanisms such as connection pooling not to work correctly.
Connection pooling is an important feature in order to control access to databases. For instance when using 'duckdb', it is recommended that one uses a small number of concurrent requests (or a single concurrent request) and instead take advantage of the inherent parallelism in the database engine. Other engines will have other ideal settings.
TESTING INSTRUCTIONS
Configure a connection pool using the
DB_CONNECTION_MUTATOR
hook.ADDITIONAL INFORMATION