support adbc APIs in DB-API package #2015

Open
tswast opened this issue Sep 6, 2024 · 3 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Sep 6, 2024

Is your feature request related to a problem? Please describe.

Getting data out of the DB-API can be slow for large results, especially when the desired format is Arrow, as is the case in polars (pola-rs/polars#18547, pola-rs/polars#17326).

Describe the solution you'd like

I'd love it if our DB-API provided the custom Python methods described in https://arrow.apache.org/adbc/current/python/quickstart.html, such as fetch_arrow_table(), adbc_ingest(), adbc_get_info(), and adbc_get_table_schema(). See: https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Connection and https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Cursor
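For illustration, here is a rough sketch of how calling code might use one of these methods. It assumes `conn` is a DB-API connection that follows the ADBC extensions (for example, one from adbc_driver_manager.dbapi); the helper name `query_to_arrow` is made up for this example and is not part of any library.

```python
from typing import Any


def query_to_arrow(conn: Any, sql: str) -> Any:
    """Run `sql` and return the whole result as an Arrow table.

    Assumption: `conn` is a DB-API connection whose cursors provide the
    ADBC extension method fetch_arrow_table() in addition to PEP 249.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # ADBC extension: return the result set in Arrow format
        # instead of fetching row tuples with fetchall().
        return cur.fetch_arrow_table()
    finally:
        # Always release the cursor, even if the query fails.
        cur.close()
```

The point of the request is that the result stays columnar end to end, so a consumer such as polars would not need to rebuild columns from Python row tuples.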

Describe alternatives you've considered

A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.

Additional context

I'm not expecting fast movement here, as the ADBC standard/community still feels like it's in its early days, but I figured it was worth filing for visibility given that polars is integrated with it.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Sep 6, 2024
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Sep 6, 2024
@henryharbeck

A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.

I'm unsure if you're aware, but it looks like BigQuery does have ADBC drivers implemented in C# and Go. Python wheels were included for the first time in the recent release.

There is still a fair bit left to be desired (including a page in the docs), with a to-do list outlined here.

Also, according to the writing new drivers page:

Currently, new drivers can be written in C#, C/C++, Go, and Java. A driver written in C/C++ or Go can be used from either of those languages, as well as C#, Python, R, and Ruby. (C# can experimentally export drivers to the same set of languages as well.)

I haven't installed or tested anything - just observations. Just figured I would give a heads up on it.

Cheers

@tswast
Contributor Author

tswast commented Jan 3, 2025

I was not aware. That's awesome news.

@henryharbeck

Here is the PyPI package - https://pypi.org/project/adbc-driver-bigquery/

After some tinkering with the low-level API, I've also figured out how to read data directly into Polars without PyArrow or the BigQuery client libraries.
This is thanks to the Arrow PyCapsule Interface, implemented by Polars and the ADBC driver manager (specifically on the ArrowArrayStreamHandle returned by stmt.execute_query()).

import adbc_driver_bigquery
import adbc_driver_manager
import polars as pl

PROJECT = "my-project"
db_kwargs = {adbc_driver_bigquery.DatabaseOptions.PROJECT_ID.value: PROJECT}
query = f"SELECT * FROM `{PROJECT}.testing.test` LIMIT 10"

with (
    adbc_driver_bigquery.connect(db_kwargs) as db,
    adbc_driver_manager.AdbcConnection(db) as conn,
    adbc_driver_manager.AdbcStatement(conn) as stmt,
):
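    stmt.set_sql_query(query)
    # execute_query() returns an ArrowArrayStreamHandle and a row count.
    stream_handle, rows = stmt.execute_query()

    # Polars reads the handle through the Arrow PyCapsule Interface.
    df = pl.DataFrame(stream_handle)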
    print(df)
    print(f"{rows = }")

# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘
# rows = 1

$ pip list
Package              Version
-------------------- -------
adbc-driver-bigquery 1.3.0
adbc-driver-manager  1.3.0
importlib_resources  6.5.2
pip                  24.3.1
polars               1.19.0
setuptools           65.5.0
typing_extensions    4.12.2

The DB-API layer (which is also where adbc_ingest lives) currently requires PyArrow. I think I'll ask the ADBC maintainers whether this requirement can be relaxed or removed. adbc_ingest also supports the Arrow PyCapsule Interface, so it could also be possible to write to BQ with ADBC and without PyArrow.
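As a rough sketch of what the write path might look like: the helper below assumes `cursor` is an ADBC DB-API cursor and that `data` is something adbc_ingest accepts (a PyArrow table, or an object exposing the Arrow PyCapsule Interface such as a Polars DataFrame). The name `append_rows` is hypothetical, and I have not confirmed that the BigQuery driver supports ingestion, so treat this as the shape of the call rather than a tested recipe.

```python
from typing import Any


def append_rows(cursor: Any, table_name: str, data: Any) -> int:
    """Append Arrow-compatible `data` to `table_name` and return the row count.

    Assumptions: `cursor` is an ADBC DB-API cursor, and the underlying
    driver implements bulk ingestion (unverified for BigQuery).
    """
    # mode="append" inserts into an existing table; ADBC also defines
    # "create", "replace", and "create_append".
    return cursor.adbc_ingest(table_name, data, mode="append")
```

Note that while the DB-API layer itself depends on PyArrow, PyArrow still has to be installed for this to run, even if `data` is a Polars DataFrame.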

Perhaps the BigQuery client / BigQuery Storage client could also implement the Arrow PyCapsule Interface 😉; there is a growing list of libraries that do.
Could this be possible for methods that already return results in the Arrow memory format (like google.cloud.bigquery.table.RowIterator.to_arrow)?
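To make the suggestion concrete, here is a minimal sketch of what implementing the interface involves. `ArrowStreamResult` is a hypothetical wrapper, not an existing BigQuery class; it assumes the wrapped object already exposes `__arrow_c_stream__` (recent PyArrow tables do) and simply delegates to it.

```python
from typing import Any, Optional


class ArrowStreamResult:
    """Hypothetical result wrapper exposing the Arrow PyCapsule Interface."""

    def __init__(self, arrow_data: Any) -> None:
        # Assumption: `arrow_data` implements __arrow_c_stream__, e.g. a
        # pyarrow.Table such as the one returned by RowIterator.to_arrow().
        self._arrow_data = arrow_data

    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object:
        # Hand consumers a PyCapsule wrapping an ArrowArrayStream by
        # delegating to the wrapped object.
        return self._arrow_data.__arrow_c_stream__(requested_schema)
```

A consumer that understands the protocol could then accept such an object directly, along the lines of pl.DataFrame(ArrowStreamResult(rows.to_arrow())). This particular sketch still goes through PyArrow; a native implementation in the client would be needed to avoid that dependency.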
