Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Postgres driver #407

Closed
c0c0n3 opened this issue Dec 3, 2020 · 8 comments · Fixed by #586
Closed

Upgrade Postgres driver #407

c0c0n3 opened this issue Dec 3, 2020 · 8 comments · Fixed by #586

Comments

@c0c0n3
Copy link
Member

c0c0n3 commented Dec 3, 2020

Is your feature request related to a problem? Please describe.
Starting from version 1.16.6, pg8000 has dropped the PGJsonb class. This breaks the Timescale translator so we pinned the pg8000 version in the Pipfile to 1.16.5---the last version to ship with the PGJsonb class. Also, pg8000 doesn't come with connection pooling.

Describe the solution you'd like
See how difficult it'd be to switch to Psycopg. It shouldn't be hard actually. Notice Psycopg has built-in connection pooling.

Describe alternatives you've considered
Stick with pg8000, just replace PGJsonb with json.dump. Quick solution, but then we'd still need a solution for connection pooling...

Additional context
See also: crate/crate-python#397

@chicco785
Copy link
Contributor

@chicco785
Copy link
Contributor

@c0c0n3 c0c0n3 mentioned this issue Dec 3, 2020
8 tasks
@amotl
Copy link

amotl commented Dec 3, 2020

Dear @c0c0n3 and @chicco785,

on this matter, and maybe also related to crate/crate-python#397 as well as crate/sqlalchemy-cratedb#90, I would like to encourage you to use psycopg2 instead of pg8000. At crate/crate-operator#117 (comment), I've investigated the situation about connection pooling:

On the psycopg2 side, this has been discussed within psycopg/psycopg2#113 and psycopg/psycopg2#561. On aiopg, aio-libs/aiopg#491 might be related.

In order to mitigate that, efforts on psycopg2 coming from psycopg/psycopg2#565 and psycopg/psycopg2#974 apparently have been wrapped into psycopg2-pool, see also liberapay/postgres.py#80.

If you don't want to depend on PostgreSQL headers for building psycopg2, I found psycopg2-binary to be very useful in the past.

With kind regards,
Andreas.

P.S.: When looking at implementations for asyncio and friends, asyncpg.create_pool() might be of interest.

@c0c0n3
Copy link
Member Author

c0c0n3 commented Dec 4, 2020

@amotl excellent, thank you so much for this, very valuable info indeed.

Do you know by any chance what the Crate client does w/r/t connection pooling? We're using the HTTP client, so perhaps connection pooling gets delegated to urllib3? Well, that's what I thought, but I've never actually verified it. Also, is this the best client to use? Since Crate supports the Postgres wire protocol, in principle we might be able to use Psycopg? Or?

@amotl
Copy link

amotl commented Dec 4, 2020

Dear @c0c0n3,

Do you know by any chance what the CrateDB client does w/r/t connection pooling? We're using the HTTP client, so perhaps connection pooling gets delegated to urllib3?

Exactly. I've shared some thoughts about this at crate/sqlalchemy-cratedb#90, coming from a question by @chicco785.

Also, is this the best client to use? Since CrateDB supports the PostgreSQL wire protocol, in principle we might be able to use psycopg2?

That's true. CrateDB support for the PostgreSQL wire protocol is there [1] and we are continuously working on improving on the SQL syntax side and other details in this regard which happen on top of the wire protocol.

With kind regards,
Andreas.

[1] https://crate.io/docs/crate/reference/en/4.3/interfaces/postgres.html

@amotl
Copy link

amotl commented Dec 4, 2020

P.S.: As I've learned you might be looking into SQLAlchemy: Because of the document-oriented features of CrateDB and probably other details, the integration with SQLAlchemy needs a CrateDB SQLAlchemy dialect. I've just created crate/sqlalchemy-cratedb#100 for exploring the aspect of using it together with psycopg2.

Regarding your comments at:
https://github.com/smartsdk/ngsi-timeseries-api/blob/607f16d6109c9b0db5e5193de9345a7dcbb4d9f4/src/translators/sql_translator.py#L105-L114

Performance

An ORM will always impose some overhead. So, for raw insert performance, one might always choose to do direct inserts. However, even when using the SQLAlchemy engine for connecting to the database [1], you can skip the ORM layer by directly executing SQL statements, as outlined at [2].

Apart from that, as I've stated at crate/crate-python#391 (comment), the upcoming SQLAlchemy 1.4 will apparently improve significantly on the performance aspects:

Geospatial type support

While @chicco785 referenced crate/sqlalchemy-cratedb#118 at #315 (comment) the other day, crate-python[sqlalchemy]>=0.25.0 supports the geo_point and geo_shape data types [3,4] within its SQLAlchemy dialect already, thanks to crate/crate-python#360 by @autophagy.

[1] https://docs.sqlalchemy.org/en/13/core/engines.html
[2] https://gist.github.com/amotl/575aae129beb17b120b848e7632365dc#file-cratedb-parallel-py-L56-L61
[3] https://crate.io/docs/crate/reference/en/4.3/general/ddl/data-types.html#geo-point
[4] https://crate.io/docs/crate/reference/en/4.3/general/ddl/data-types.html#geo-shape

@c0c0n3
Copy link
Member Author

c0c0n3 commented Dec 4, 2020

@amotl thank you so much for the update! sorry i missed crate/crate-python#391. you're giving us a lot of very valuable feedback and advice, we're really grateful for that. also im very excited about recent development on the sql alchemy front, good stuff!!

@github-actions
Copy link
Contributor

Stale issue message

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants