fix(python): Ensure that `read_database` takes advantage of Arrow return from a `duckdb_engine` connection when using a SQLAlchemy `Selectable` #19255

alexander-beedie · 2024-10-16T06:40:08Z

Closes #19221.

Improves read_database integration with DuckDB, when using a SQLAlchemy-based duckdb_engine connection.

Previously we accessed the raw connection directly in order to take advantage of native Arrow return, but this didn't work when using a SQLAlchemy Selectable as the query.

This case is now handled properly. It turns out that duckdb_engine supports transparent pass-through to the underlying raw connection (via __getattr__ on the Cursor), so we can take advantage of DuckDB's Arrow return in all cases, for maximum performance, without having to interact with the raw connection ourselves.
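As a rough illustration of the pass-through idea described above, here is a minimal, stdlib-only sketch. It is not the duckdb_engine or Polars implementation: `RawCursor`, `CursorWrapper`, and `fetch_result` are hypothetical names, and the Arrow result is faked with a plain dict. It only shows how a wrapper that forwards unknown attributes lets a caller reach an Arrow-style fetch method without touching the raw object itself.

```python
class RawCursor:
    """Hypothetical stand-in for a driver-level cursor (not the real DuckDB API)."""

    def __init__(self):
        self.last_query = None

    def execute(self, query):
        self.last_query = query
        return self

    def fetch_arrow_table(self):
        # A real driver would return an Arrow table; a dict keeps this sketch stdlib-only.
        return {"format": "arrow", "query": self.last_query}


class CursorWrapper:
    """Hypothetical wrapper that forwards unknown attributes to the raw cursor."""

    def __init__(self, raw):
        self._raw = raw

    def execute(self, query):
        self._raw.execute(query)
        return self

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on the wrapper,
        # so anything the wrapper does not define is looked up on the raw cursor.
        return getattr(self._raw, name)


def fetch_result(cursor):
    """Prefer an Arrow-style fetch when the cursor exposes one, else fall back to rows."""
    fetch_arrow = getattr(cursor, "fetch_arrow_table", None)
    if callable(fetch_arrow):
        return fetch_arrow()
    return cursor.fetchall()


wrapped = CursorWrapper(RawCursor())
result = fetch_result(wrapped.execute("SELECT 1"))
print(result)
```

The caller only ever holds the wrapper, yet the Arrow-style method resolves through attribute forwarding. That is the property the fix relies on, under the assumptions stated above.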

Example

Setup (assuming duckdb_engine installed):

import polars as pl
from sqlalchemy import MetaData, Table, create_engine, select

df_test_data = pl.DataFrame({
    "key": ["aa", "bb", None, "cc", "dd"], 
    "value": range(5),
})
engine = create_engine("duckdb:///:memory:")
metadata = MetaData()
table_name = "tbl"

with engine.connect() as conn:
    df_test_data.write_database(table_name, connection=conn)
    conn.commit()
    
metadata.reflect(engine)
table = Table(table_name, metadata)

Provide read_database query as a SQLAlchemy Selectable object:
(previously this raised an error - now it returns data to Polars via Arrow)

with engine.connect() as conn:
    stmt = select(table).where(table.c.key.is_not(None))
    df = pl.read_database(stmt, connection=conn)
    
    # shape: (4, 2)
    # ┌─────┬───────┐
    # │ key ┆ value │
    # │ --- ┆ ---   │
    # │ str ┆ i64   │
    # ╞═════╪═══════╡
    # │ aa  ┆ 0     │
    # │ bb  ┆ 1     │
    # │ cc  ┆ 3     │
    # │ dd  ┆ 4     │
    # └─────┴───────┘


codecov · 2024-10-16T07:17:37Z

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.02%. Comparing base (d89fdcd) to head (2e207ae).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| py-polars/polars/io/database/_executor.py | 0.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19255      +/-   ##
==========================================
- Coverage   80.04%   80.02%   -0.02%     
==========================================
  Files        1528     1528              
  Lines      209564   209566       +2     
  Branches     2415     2416       +1     
==========================================
- Hits       167741   167715      -26     
- Misses      41275    41302      +27     
- Partials      548      549       +1


fix(python): Ensure that `read_database` takes advantage of Arrow return from a `duckdb_engine` connection when using a SQLAlchemy `Selectable`

89ca378

alexander-beedie requested review from ritchie46, c-peters, MarcoGorelli and reswqa as code owners October 16, 2024 06:40

github-actions bot added fix Bug fix python Related to Python Polars labels Oct 16, 2024

alexander-beedie added the A-io-database Area: reading/writing to databases label Oct 16, 2024

fix line-length (lint)

2e207ae

ritchie46 merged commit 20ead46 into pola-rs:main Oct 16, 2024
22 checks passed

alexander-beedie deleted the duckdb-arrow-selectable branch October 16, 2024 08:29

alexander-beedie mentioned this pull request Oct 17, 2024

fix(python): Make the SQLAlchemy connection check more robust #19270

Merged
