Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: BigQuery Create table with partition_by argument is not working #10022

Open
ruiyang2015 opened this issue Sep 4, 2024 · 7 comments
Open
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis

Comments

@ruiyang2015
Copy link

What happened?

I am trying to use the BigQuery backend to create a table with a partition by expression, ibis is raising exception.
here is a simple code to reproduce the error:

import ibis
c = ibis.bigquery.connect(...)
schema = ibis.schema([ ("c1", "date"), ("c2", "int")])
c.create_table('test_tb', partition_by='DATE_TRUNC(c1, MONTH)') # -> this line is throwing exception 

What version of ibis are you using?

9.4.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/ibis/backends/bigquery/__init__.py", line 1007, in create_table
    raise com.IbisError("One of the `schema` or `obj` parameter is required")
ibis.common.exceptions.IbisError: One of the `schema` or `obj` parameter is required
>>> c.create_table('test_tb', partition_by='DATE_TRUNC(c1, MONTH)', schema=schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/ibis/backends/bigquery/__init__.py", line 1099, in create_table
    self.raw_sql(sql)
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/ibis/backends/bigquery/__init__.py", line 710, in raw_sql
    return self.client.query_and_wait(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 3601, in query_and_wait
    return _job_helpers.query_and_wait(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/cloud/bigquery/_job_helpers.py", line 414, in query_and_wait
    return _wait_or_cancel(
           ^^^^^^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/cloud/bigquery/_job_helpers.py", line 562, in _wait_or_cancel
    return job.result(
           ^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 1676, in result
    while not is_job_done():
              ^^^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
             ^^^^^^^^
  File "/Users/ruiyang/Go/src/github.com/ascend-io/ascend-core/.venv/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py", line 1625, in is_job_done
    raise job_failed_exception
google.api_core.exceptions.BadRequest: 400 Unrecognized name: D at [1:111]; reason: invalidQuery, location: query, message: Unrecognized name: D at [1:111]


### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
@ruiyang2015 ruiyang2015 added the bug Incorrect behavior inside of ibis label Sep 4, 2024
@cpcloud
Copy link
Member

cpcloud commented Sep 9, 2024

It looks like we're not handling the single partition case here. Can you try putting the partition expression in a list, like ["DATE_TRUNC(c1, MONTH)"]?

Long term, we'll continue to support strings here, so the current implementation probably doesn't need to be adjusted except to handle the single-string input case.

We could perhaps discuss supporting deferreds here.

@cpcloud cpcloud added the bigquery The BigQuery backend label Sep 9, 2024
@ruiyang2015
Copy link
Author

It looks like we're not handling the single partition case here. Can you try putting the partition expression in a list, like ["DATE_TRUNC(c1, MONTH)"]?

Long term, we'll continue to support strings here, so the current implementation probably doesn't need to be adjusted except to handle the single-string input case.

We could perhaps discuss supporting deferreds here.

i tried to pass in as a list, also getting error, looks like this feature is not working.

@gforsyth
Copy link
Member

gforsyth commented Sep 9, 2024

Ok, yeah, this looks like something isn't getting formatted properly (I think) -- because this works to create an empty table:

[nav] In [6]: con.raw_sql(f"""CREATE TABLE
         ...:   {con.dataset}.newtable (transaction_id INT64, transaction_date DATE)
         ...: PARTITION BY
         ...:   DATE_TRUNC(transaction_date, MONTH)""")
Out[6]: <google.cloud.bigquery.table._EmptyRowIterator at 0x78220076a9b0>

@gforsyth
Copy link
Member

gforsyth commented Sep 9, 2024

Ok, this fails because we call sg.to_identifier on the inputs to partition_by which leaves us with this SQL:

CREATE TABLE `ibis-gbq`.`_92ff9aec474dffb9d08ce7a2d670d6e934152e1b`.`hi` (`c1` DATE, `c2` INT64) PARTITION BY (`DATE_TRUNC(c1, MONTH)`)

The above would work if we didn't have the function quoted, but unclear to me how we want to handle denoting which arguments to partition_by are identifiers and which aren't.

@cpcloud
Copy link
Member

cpcloud commented Oct 23, 2024

I haven't started working on this yet, but I should be able to get to it for the next release (10.0).

@everdark
Copy link

everdark commented Oct 23, 2024

I realized pass in list works indeed.
So for example:

conn.create_table("name", table, partition_by=["refresh_date"])

But may not work with function like DATE_TRUNC in the expression, as shown above.

@cpcloud
Copy link
Member

cpcloud commented Oct 23, 2024

Yep, the thing that doesn't work (reported in the OP) is passing a more complex expression for partitioning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis
Projects
Status: backlog
Development

No branches or pull requests

4 participants