support adbc APIs in DB-API package #2015
I'm not sure if you're aware, but it looks like BigQuery does have ADBC drivers, implemented in C# and Go. Python wheels were included for the first time in the most recent release. There is still a fair bit to be desired (including a page in the docs), with a to-do list outlined here. Also, according to the "Writing New Drivers" page:
I haven't installed or tested anything; these are just observations. I figured I would give you a heads-up. Cheers
I was not aware. That's awesome news.
Here is the PyPI package: https://pypi.org/project/adbc-driver-bigquery/

After some tinkering with the low-level API, I've also figured out how to read data directly into Polars without PyArrow or the BigQuery client libraries:

```python
import adbc_driver_bigquery
import adbc_driver_manager
import polars as pl

PROJECT = "my-project"
db_kwargs = {adbc_driver_bigquery.DatabaseOptions.PROJECT_ID.value: PROJECT}
query = f"SELECT * FROM `{PROJECT}.testing.test` LIMIT 10"

# Parenthesized context managers require Python 3.10+.
with (
    adbc_driver_bigquery.connect(db_kwargs) as db,
    adbc_driver_manager.AdbcConnection(db) as conn,
    adbc_driver_manager.AdbcStatement(conn) as stmt,
):
    stmt.set_sql_query(query)
    # execute_query() returns a low-level Arrow stream handle plus a row count.
    stream_handle, rows = stmt.execute_query()
    # Consume the stream before the statement is closed.
    df = pl.DataFrame(stream_handle)

print(df)
print(f"{rows = }")
# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘
# rows = 1
```
The DBAPI (which is where …

Perhaps the BigQuery client / BigQuery Storage client could also implement the Arrow PyCapsule Interface 😉; there is a growing list
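In the snippet above, `pl.DataFrame(stream_handle)` presumably works because the stream handle returned by `execute_query()` implements the Arrow PyCapsule Interface: an object exposes `__arrow_c_stream__(requested_schema=None)`, which returns a PyCapsule wrapping a C `ArrowArrayStream`. Any library that adds this method becomes consumable the same way, which is the point of suggesting it for the BigQuery client libraries. The minimal duck-typing check below shows the consumer side of that protocol; `exports_arrow_stream` and `FakeStream` are hypothetical names, and the check does not validate the capsule itself.

```python
from typing import Any


def exports_arrow_stream(obj: Any) -> bool:
    """Return True if obj advertises the Arrow PyCapsule stream protocol.

    Only checks that a callable __arrow_c_stream__ attribute exists; it does
    not call it or inspect the capsule it would return.
    """
    return callable(getattr(obj, "__arrow_c_stream__", None))


class FakeStream:
    """Hypothetical stand-in for an object such as an ADBC stream handle."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real implementation returns a PyCapsule named "arrow_array_stream".
        raise NotImplementedError


print(exports_arrow_stream(FakeStream()))  # True
print(exports_arrow_stream([1, 2, 3]))     # False
```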
Is your feature request related to a problem? Please describe.
Getting data out of the DB-API can be slow for large results, especially if the desired format is Arrow, as is the case in Polars (pola-rs/polars#18547, pola-rs/polars#17326).
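To make the cost concrete: a PEP 249 cursor yields results as one Python tuple per row, whereas Arrow (and therefore Polars) stores data column by column, so a plain DB-API result has to be pivoted value by value before it can become an Arrow table. The sketch below shows that pivot, using the standard-library `sqlite3` module purely as a stand-in DB-API driver; `fetch_columns` is a hypothetical helper, not part of any library.

```python
import sqlite3
from typing import Any


def fetch_columns(cursor: sqlite3.Cursor, batch_size: int = 1024) -> dict[str, list[Any]]:
    """Drain a DB-API cursor into a column-oriented dict (hypothetical helper).

    Every value passes through a Python tuple and is then appended to a
    per-column list; this per-row, per-value work is the overhead a native
    Arrow fetch path avoids.
    """
    names = [col[0] for col in cursor.description]
    columns: dict[str, list[Any]] = {name: [] for name in names}
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            for name, value in zip(names, row):
                columns[name].append(value)
    return columns


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (foo INTEGER, bar TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
print(fetch_columns(conn.execute("SELECT foo, bar FROM test ORDER BY foo")))
# {'foo': [1, 2, 3], 'bar': ['a', 'b', 'c']}
conn.close()
```

A driver that offers `fetch_arrow_table()` can, in principle, skip this loop entirely, because its results already arrive as columnar Arrow buffers.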
Describe the solution you'd like
I'd love it if our DB-API provided the custom Python methods described in https://arrow.apache.org/adbc/current/python/quickstart.html, such as `fetch_arrow_table()`, `adbc_ingest()`, `adbc_get_info()`, and `adbc_get_table_schema()`. See: https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Connection and https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Cursor
Describe alternatives you've considered
A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be used automatically from the Python `adbc_driver_manager` package.
Additional context
I'm not expecting fast movement here, as the ADBC standard and community still feel like they're in their early days, but I figured it was worth filing for visibility, given that Polars integrates with ADBC.
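To show what the requested extension methods look like from the caller's side, here is a minimal sketch that bolts two ADBC-style methods onto the standard-library `sqlite3` connection via its `factory` parameter. `ExtendedConnection`, `ingest()`, and `get_table_schema()` are hypothetical names only loosely modeled on `adbc_ingest()` and `adbc_get_table_schema()`: they return plain Python objects rather than Arrow data, and support only a `"create"`/`"append"` subset of ingest modes.

```python
import sqlite3
from typing import Any, Sequence

# Hypothetical mapping from Python value types to SQLite declared types.
_SQLITE_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT", bytes: "BLOB"}


def _infer_type(values: Sequence[Any]) -> str:
    # Use the first non-NULL value; fall back to no declared type.
    for value in values:
        if value is not None:
            for py_type, sql_type in _SQLITE_TYPES.items():
                if isinstance(value, py_type):
                    return sql_type
            return ""
    return ""


class ExtendedConnection(sqlite3.Connection):
    """sqlite3 connection with two hypothetical ADBC-style extension methods.

    Identifiers are quoted naively; this sketch is not injection-safe.
    """

    def get_table_schema(self, table_name: str) -> list[tuple[str, str]]:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        rows = self.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        return [(row[1], row[2]) for row in rows]

    def ingest(self, table_name: str, columns: dict[str, Sequence[Any]], mode: str = "create") -> int:
        """Bulk-insert column-oriented data; return the number of rows written."""
        if not columns:
            raise ValueError("no columns to ingest")
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        names = list(columns)
        if mode == "create":
            col_defs = ", ".join(f'"{name}" {_infer_type(columns[name])}' for name in names)
            self.execute(f'CREATE TABLE "{table_name}" ({col_defs})')
        elif mode != "append":
            raise ValueError(f"unsupported mode: {mode!r}")
        # Pivot the columns back into row tuples, since DB-API inserts are row-based.
        rows = list(zip(*(columns[name] for name in names)))
        quoted = ", ".join(f'"{name}"' for name in names)
        placeholders = ", ".join("?" for _ in names)
        self.executemany(f'INSERT INTO "{table_name}" ({quoted}) VALUES ({placeholders})', rows)
        self.commit()
        return len(rows)


conn = sqlite3.connect(":memory:", factory=ExtendedConnection)
conn.ingest("test", {"foo": [1, 2], "bar": ["a", "b"]})
print(conn.get_table_schema("test"))  # [('foo', 'INTEGER'), ('bar', 'TEXT')]
```

The real ADBC methods exchange Arrow data end to end, so neither the row pivot inside `ingest()` nor a per-row fetch would be needed there.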