
[Bug]: CursorResult cannot be used to fetch rows of original query after connection is used for other executions #1087

Open
tboddyspargo opened this issue Sep 3, 2024 · 2 comments
Labels: bug (Something isn't working)

@tboddyspargo
What happened?

Situation: I execute a query that will eventually be used to read rows from a table. Then, using the same connection that produced the original CursorResult object, I execute some queries against INFORMATION_SCHEMA to get column information for that same table. Finally, I read the rows from the original cursor; unexpectedly, there are no longer any rows to be found. If I instead use the cursor to get the columns and types, all rows are returned as expected. However, that doesn't meet my needs because the DBAPI cursor types are insufficiently granular (see #10495).

I suspect that some buffer reader/generator on the CursorResult, or on its DBAPI cursor object, is being "clobbered" by the subsequent queries, so that the original cursor object can no longer be used to fetch the expected rows. This is possibly a side effect of the duckdb "cursor" object being a DuckDBPyConnection rather than an object unique to each execution. All other dialects I've tested show what I'm calling "expected behavior": the original CursorResult can still be used to fetch the intended rows even after other queries are executed on the same connection. Maybe they use a unique cursor for each execution?

import sqlalchemy as sa


def setup(conn):
    # Create table and insert data.
    if not conn.execute("SELECT 1 FROM information_schema.tables WHERE table_name = 'test_table'").fetchone():
        conn.execute("CREATE TABLE test_table (col1 INTEGER, col2 FLOAT)")
        conn.execute("INSERT INTO test_table (col1, col2) VALUES (1, 2.0), (3, 4.0), (5, 6.0)")


def metadata_from_query(conn, query: str) -> list:
    conn.execute(f"CREATE TEMPORARY VIEW temp_view AS {query}")
    # get column types from information schema
    res = conn.execute("SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'temp_view'")
    return res.fetchall()


def metadata_from_table(conn, table_name: str) -> list:
    res = conn.execute(
        f"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{table_name}'",
    )
    return res.fetchall()


def main():
    engine = sa.create_engine("duckdb:////tmp/test.duckdb")

    with engine.connect() as conn:
        setup(conn)
    sa_query = sa.select(sa.text("*")).select_from(sa.text("test_table"))

    # Test 1 (SUCCESS): Execute the query, access the cursor description, then fetch all rows.
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 3 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = res.cursor.description
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"

    # Test 2 (FAIL): Execute the query, create a temporary view (limit 0), query the view for column types, then fetch all rows from the original query
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 0 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = metadata_from_query(conn, str(sa_query.limit(0).compile(compile_kwargs={"literal_binds": True})))
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"

    # Test 3 (FAIL): Execute the query, query information_schema for column types, then fetch all rows from the original query
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 0 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = metadata_from_table(conn, "test_table")
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"


if __name__ == "__main__":
    main()

DuckDB Engine Version

0.13.1

DuckDB Version

1.0.0

SQLAlchemy Version

1.4.53

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tboddyspargo tboddyspargo added the bug Something isn't working label Sep 3, 2024
@Mause (Owner)

Mause commented Sep 4, 2024

DuckDB doesn't really have a concept of cursors at all, which is why CursorWrapper etc. exist in the duckdb_engine codebase. If you want to have multiple queries executing simultaneously, you'll need to open a connection to the database per query.

Does that help?

@tboddyspargo (Author)

The challenge for us is that this issue represents a meaningful difference from all other SQLAlchemy dialects that we've tested. It forces us to have special handling for duckdb in situations where we'd ideally have consistent behavior (trusting the dialects and DBAPI interfaces to behave similarly). I can certainly appreciate that different DB engines have different capabilities, making it difficult to achieve consistent behavior (and, as pointed out in duckdb/duckdb#13705 (comment), the DBAPI spec doesn't clarify expectations on this particular behavioral aspect).

Is there a way that ConnectionWrapper.execute() could return a copy of the DuckDBPyConnection that points only to the relevant result set? That way, the following two query results could be accessed and materialized separately, using the same SQLAlchemy "connection" object, without interfering with each other:

with sa.create_engine("duckdb:///:memory:").connect() as conn:
    res1 = conn.execute("SELECT 1 AS col1")
    res2 = conn.execute("SELECT 2 AS col2")
    a, b = res1.fetchall(), res2.fetchall()
    assert a == [(1,)] and b == [(2,)], f"{a=};{b=}"
