
[Bug]: CursorResult cannot be used to fetch rows of original query after connection is used for other executions #1087

Open
tboddyspargo opened this issue Sep 3, 2024 · 2 comments
Labels: bug (Something isn't working)

@tboddyspargo
What happened?

Situation: I execute a query that will eventually be used to read rows from a table. Then, using the same connection that produced the original CursorResult object, I execute some queries against INFORMATION_SCHEMA to get column information for that same table. Finally, I read the rows from the original cursor; unexpectedly, there are no longer any rows to be found. If I instead use the cursor to get the columns and types, all rows are returned as expected. However, that doesn't meet my needs because the DBAPI cursor types are insufficiently granular (see #10495).

I suspect that some buffer reader/generator on the CursorResult, or on its DBAPI cursor object, is being "clobbered" by the subsequent queries, so that the original cursor object can no longer be used to fetch the expected rows. This is possibly a side effect of the duckdb "cursor" object being a DuckDBPyConnection rather than an object unique to each execution. All other dialects I've tested show what I'm calling "expected behavior": the original CursorResult can still be used to fetch the intended rows even after other queries are executed on the same connection. Maybe they use a unique cursor for each execution?

import sqlalchemy as sa


def setup(conn):
    # Create table and insert data.
    if not conn.execute("SELECT 1 FROM information_schema.tables WHERE table_name = 'test_table'").fetchone():
        conn.execute("CREATE TABLE test_table (col1 INTEGER, col2 FLOAT)")
        conn.execute("INSERT INTO test_table (col1, col2) VALUES (1, 2.0), (3, 4.0), (5, 6.0)")


def metadata_from_query(conn, query: str) -> list:
    conn.execute(f"CREATE TEMPORARY VIEW temp_view AS {query}")
    # get column types from information schema
    res = conn.execute("SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'temp_view'")
    return res.fetchall()


def metadata_from_table(conn, table_name: str) -> list:
    res = conn.execute(
        f"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{table_name}'",
    )
    return res.fetchall()


def main():
    engine = sa.create_engine("duckdb:////tmp/test.duckdb")

    with engine.connect() as conn:
        setup(conn)
    sa_query = sa.select(sa.text("*")).select_from(sa.text("test_table"))

    # Test 1 (SUCCESS): Execute the query, access the cursor description, then fetch all rows.
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 3 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = res.cursor.description
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"

    # Test 2 (FAIL): Execute the query, create a temporary view (limit 0), query the view for column types, then fetch all rows from the original query
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 0 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = metadata_from_query(conn, str(sa_query.limit(0).compile(compile_kwargs={"literal_binds": True})))
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"

    # Test 3 (FAIL): Execute the query, query information_schema for column types, then fetch all rows from the original query
    # EXPECTED: fetching rows from the original CursorResult should return 3 rows.
    # ACTUAL: fetching rows from the original CursorResult returns 0 rows.
    with engine.connect() as conn:
        res = conn.execute(sa_query)
        metadata = metadata_from_table(conn, "test_table")
        print(metadata)
        rows = res.fetchall()
        print(rows)
        assert len(rows) == 3, f"Expected 3 rows, got {len(rows)}"


if __name__ == "__main__":
    main()

DuckDB Engine Version

0.13.1

DuckDB Version

1.0.0

SQLAlchemy Version

1.4.53

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tboddyspargo tboddyspargo added the bug Something isn't working label Sep 3, 2024
@Mause (Owner)

Mause commented Sep 4, 2024

DuckDB doesn't really have a concept of cursors at all, which is why CursorWrapper etc. exist in the duckdb_engine codebase. If you want to have multiple queries executing simultaneously, you'll need to open a connection to the database per query.

Does that help?

@tboddyspargo (Author)

The challenge for us is that this issue represents a meaningful difference from all other SQLAlchemy dialects that we've tested. It forces us to have special handling for duckdb in situations where we'd ideally have consistent behavior (trusting the dialects and DBAPI interfaces to behave similarly). I can certainly appreciate that different DB engines have different capabilities, making it difficult to achieve consistent behavior (and, as pointed out in duckdb/duckdb#13705 (comment), the DBAPI spec doesn't clarify expectations on this particular behavioral aspect).

Is there a way that ConnectionWrapper.execute() could return a copy of the DuckDBPyConnection that points only to the relevant result set? That way, the following two query results could be accessed and materialized separately, using the same SQLAlchemy "connection" object, without interfering with each other:

with sa.create_engine("duckdb:///:memory:").connect() as conn:
    res1 = conn.execute("SELECT 1 AS col1")
    res2 = conn.execute("SELECT 2 AS col2")
    a, b = res1.fetchall(), res2.fetchall()
    assert a == [(1,)] and b == [(2,)], f"{a=};{b=}"
