Add support for CREATE TABLE AS #130
```
WARNING:  (DuckDB) Binder Error: Referenced table "t" not found!
Candidate tables: "s.t"
LINE 3: FROM ( SELECT t."?column?"
                      ^
 ?column? | ?column?
----------+----------
       21 |       21
       21 |       42
```
The previous DuckDB pushdown was actually returning the wrong results here. So the DuckDB failure that this introduces is better than the alternative imho, because now it warns and automatically falls back to Postgres instead of returning a wrong result.
Yeah, I noticed that as well; the postgres catalog extension solved the same problem.
```sql
CREATE TABLE t(a INT, b VARCHAR);
INSERT INTO t SELECT g % 1000, MD5(g::VARCHAR) FROM generate_series(1,20000) g;
```
The `SELECT` in the `CREATE MATERIALIZED VIEW` command now also goes through DuckDB. This is a good thing imho, because that means that we can use `read_csv` and `read_parquet` in those queries too.
However, on PG16 our type mapping results in the following error:
```
CREATE MATERIALIZED VIEW tv AS SELECT * FROM t WHERE a % 100 = 0;
+ERROR:  SELECT rule's target entry 2 has different type from column "b"
+DETAIL:  SELECT target entry has type text, but column has type character varying.
```
Surprisingly in PG17 this works just fine.
While it's suboptimal that we now need to use `VARCHAR`, I think this is an improvement overall. It probably also means that we need to improve our type mapping a bit and actually return `TEXT` instead of `VARCHAR` from queries where the original table uses `TEXT`.
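Until the type mapping is improved, one possible (untested) workaround sketch for the PG16 error would be to cast the mismatching column explicitly in the materialized view's `SELECT`, so that the view's column is created as `text` and matches what comes back from the DuckDB side. Whether this actually suffices depends on how the type mapping treats the cast; the table and column names are from the example above:

```sql
-- Untested workaround sketch for PG16: force the view column to text so
-- it matches the text type that the DuckDB-executed SELECT returns.
CREATE MATERIALIZED VIEW tv AS
SELECT a, b::TEXT AS b FROM t WHERE a % 100 = 0;
```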
```c
	return true;
	return false;

	/* We don't support modifying statements yet */
	if (query->commandType != CMD_SELECT) {
```
We already check this condition in the `DuckdbPlannerHook` if statement, but this is a better place to contain the `commandType == CMD_SELECT` check.
I moved a few more checks from the DuckDBPlannerHook to IsAllowedStatement now.
Thanks, this sounds like a good change to me 👍
The base branch was changed.
90f2282 to 7af2b52
Postgres contains code in the `src/backend/utils/adt/ruleutils.c` file to build a query string from a `Query *`. We also need to do this to be able to send a query string to DuckDB, and we started using this functionality in #130. Sadly this part of the Postgres code is not extendable at all, and we do need to make some small changes. So this vendors in the files, one for each supported postgres version. This commit does not make any changes yet.
This adds support for queries like:

```sql
CREATE TABLE mytable AS SELECT column00 from read_csv("my_data.csv");
```

The table will then be created in Postgres (not in DuckDB). The main change that supports this is that we now start "deparsing" the parsed `Query` type from Postgres to an actual query string, instead of sending the original string that Postgres received to DuckDB.

This also accidentally fixes some differences, such as the DuckDB `count(*)` aggregate creating a `count_star` column vs Postgres its `count` column. So it reduces the need for #36 a bit. The reason this happens is because it is deparsed very explicitly like `count(*) AS count`.

For similar reasons this also changes the behaviour of the following query to only return the `id` and `name` columns, instead of all the columns:

```sql
SELECT * FROM read_parquet(...) AS (id int, name text);
```

It's arguable whether that's a good thing or not: it's more verbose if you want those columns, but it's also less surprising. One especially surprising thing with the above query is that if you used a column name that was not in the parquet file, then your query would succeed, but you would not actually get that column back in the result (only the actual columns in the file).