fix(python): Ensure that read_database takes advantage of Arrow return from a duckdb_engine connection when using a SQLAlchemy Selectable #19255

Merged 2 commits on Oct 16, 2024

@alexander-beedie (Collaborator) commented Oct 16, 2024

Closes #19221.

Improves read_database integration with DuckDB when using a SQLAlchemy-based duckdb_engine connection.

Previously we accessed the raw connection directly in order to take advantage of native Arrow return, but this didn't work when using a SQLAlchemy Selectable as the query.

Now we handle this case properly. It turns out that duckdb_engine supports transparent pass-through to the underlying raw connection (via __getattr__ on the Cursor), so we can take advantage of DuckDB's Arrow return in all cases, for maximum performance, without having to interact with the raw connection ourselves.
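For readers unfamiliar with this kind of pass-through, here is a minimal, self-contained sketch of the general pattern, assuming the wrapper delegates unknown attribute lookups through Python's `__getattr__` hook. The names `RawConnection`, `CursorWrapper` and `fetch_native` are hypothetical stand-ins invented for illustration; they are not the duckdb_engine or DuckDB API, and this is not the code changed in this PR.

```python
from __future__ import annotations


class RawConnection:
    """Hypothetical stand-in for a native driver connection (not the DuckDB API)."""

    def fetch_native(self) -> list[tuple[str, int]]:
        # Pretend this is a fast, driver-specific fetch method.
        return [("aa", 0), ("bb", 1)]


class CursorWrapper:
    """Hypothetical DBAPI-style cursor that wraps a raw connection."""

    def __init__(self, raw: RawConnection) -> None:
        self._raw = raw

    def fetchall(self) -> list[tuple[str, int]]:
        # Defined on the wrapper itself, so normal attribute lookup finds it.
        return list(self._raw.fetch_native())

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal lookup fails, so any
        # attribute the wrapper does not define is forwarded to the raw object.
        return getattr(self._raw, name)


cursor = CursorWrapper(RawConnection())
rows = cursor.fetch_native()  # not defined on CursorWrapper; forwarded to RawConnection
```

With a wrapper shaped like this, a caller holding only the cursor can reach driver-specific methods without unwrapping the raw connection first, which is the property the description above relies on.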

Example

Setup (assuming duckdb_engine installed):

import polars as pl
from sqlalchemy import MetaData, Table, create_engine, select

df_test_data = pl.DataFrame({
    "key": ["aa", "bb", None, "cc", "dd"], 
    "value": range(5),
})
engine = create_engine("duckdb:///:memory:")
metadata = MetaData()
table_name = "tbl"

with engine.connect() as conn:
    df_test_data.write_database(table_name, connection=conn)
    conn.commit()
    
metadata.reflect(engine)
table = Table(table_name, metadata)

Provide the read_database query as a SQLAlchemy Selectable object
(previously this raised an error; it now returns data to Polars via Arrow):

with engine.connect() as conn:
    stmt = select(table).where(table.c.key.is_not(None))
    df = pl.read_database(stmt, connection=conn)
    
    # shape: (4, 2)
    # ┌─────┬───────┐
    # │ key ┆ value │
    # │ --- ┆ ---   │
    # │ str ┆ i64   │
    # ╞═════╪═══════╡
    # │ aa  ┆ 0     │
    # │ bb  ┆ 1     │
    # │ cc  ┆ 3     │
    # │ dd  ┆ 4     │
    # └─────┴───────┘

@github-actions bot added the fix (Bug fix) and python (Related to Python Polars) labels Oct 16, 2024
@alexander-beedie added the A-io-database (Area: reading/writing to databases) label Oct 16, 2024
codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.02%. Comparing base (d89fdcd) to head (2e207ae).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| py-polars/polars/io/database/_executor.py | 0.00% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19255      +/-   ##
==========================================
- Coverage   80.04%   80.02%   -0.02%     
==========================================
  Files        1528     1528              
  Lines      209564   209566       +2     
  Branches     2415     2416       +1     
==========================================
- Hits       167741   167715      -26     
- Misses      41275    41302      +27     
- Partials      548      549       +1     


@ritchie46 merged commit 20ead46 into pola-rs:main Oct 16, 2024
22 checks passed
@alexander-beedie deleted the duckdb-arrow-selectable branch October 16, 2024 08:29

Successfully merging this pull request may close these issues.

read_database with sqlalchemy Selectable object does not work when duckdb is the backend database