Add support for CREATE TABLE AS #130

JelteF · 2024-08-16T12:38:52Z

This adds support for queries like:

CREATE TABLE mytable AS SELECT column00 from read_csv("my_data.csv") AS (column00 int)

The table will be then be created in Postgres (not in DuckDB). The main
change that supports this is that we now start "deparsing" the parsed
Query type from Postgres to an actual query string, instead of sending
the original string that Postgres received to DuckDB.

This also accidentally fixes some differences such as the DuckDB
count(*) aggregate creating a count_star vs Postgres its count
column. So it reduces the need for #36 a bit. The reason this happens is
because it is deparsed very explicitly like count(*) AS count.

For similar reasons this also changes the behaviour of the following
query to only return the id and name columns, instead of all the
columns. It's arguable whether that's a good thing or not: it's more
verbose if you want those columns but it's also less surprising.

SELECT * FROM read_parquet(...) AS (id int, name text);

One especially surprising thing with the above query is that
if you used a column name that was not in the parquet file,
then your query would succeed, but you would not actually
get that column back in the result (only the actual columns
in the file).

JelteF · 2024-08-16T12:41:17Z

test/regression/expected/views.out

+WARNING:  (DuckDB) Binder Error: Referenced table "t" not found!
+Candidate tables: "s.t"
+LINE 3:    FROM ( SELECT t."?column?"
+                         ^
 ?column? | ?column? 
 ----------+----------
-       21 |       21
+       21 |       42


The previous DuckDB pushdown was actually returning the wrong results here. So the DuckDB failure that this introduces is better than the alternative imho. Because now it warns and automatically falls back to Postgres, instead of returning a wrong result.

Yea I noticed that as well, the postgres catalog extension solved the same problem

JelteF · 2024-08-16T12:55:29Z

test/regression/sql/materialized_view.sql

+CREATE TABLE t(a INT, b VARCHAR);
+INSERT INTO t SELECT g % 1000, MD5(g::VARCHAR) FROM generate_series(1,20000) g;


The SELECT in the CREATE MATERIALIZED VIEW command now also goes through DuckDB. This is a good thing imho, because that means that we can use read_csv and read_parquet in those queries too.

However, on PG16 our type mapping results in the following error:

CREATE MATERIALIZED VIEW tv AS SELECT * FROM t WHERE a % 100 = 0; +ERROR: SELECT rule's target entry 2 has different type from column "b" +DETAIL: SELECT target entry has type text, but column has type character varying.

Surprisingly in PG17 this works just fine.

While suboptimal that we now need to use VARCHAR, I think this is an improvement overall. And probably this means that we need to improve our type mapping a bit and actually return TEXT instead of VARCHAR from queries where the original table uses TEXT.

mkaruza · 2024-08-16T13:42:06Z

src/pgduckdb_hooks.cpp

-		return true;
-	return false;
+	/* We don't support modifying statements yet */
+	if (query->commandType != CMD_SELECT) {


We already check this condition in DuckdbPlannerHook if statement but this is better place to contain this check commandType == SELECT.

I moved a few more checks from the DuckDBPlannerHook to IsAllowedStatement now.

Tishj

Thanks this sounds like a good change to me 👍

The base branch was changed.

This adds support for queries like: ``` CREATE TABLE mytable AS SELECT column00 from read_csv("my_data.csv"); ``` The table will be then be created in Postgres (not in DuckDB). The main change that supports this is that we now start "deparsing" the parsed `Query` type from Postgres to an actual query string, instead of sending the original string that Postgres received to DuckDB. This also accidentally fixes some differences such as the DuckDB `count(*)` aggregate creating a `count_star` vs Postgres its `count` column. So it reduces the need for #36 a bit. The reason this happens is because it is deparsed very explicitly like `count(*) AS count`. For similar reasons this also changes the behaviour of the following query to only return the `id` and `name` columns, instead of all the columns. It's arguable whether that's a good thing or not. ``` SELECT * FROM read_parquet(...) AS (id int, name text); ```

The issue in #119 was already fixed by #130, but this PR adds a test for it. Fixes #119

Postgres contains code in the `src/backend/utils/adt/ruleutils.c` file to build a query string from a `Query *`. We also need to do this to be able to send a query string to DuckDB, and we started using this functionality in #130. Sadly this part of the Postgres code is not extendable at all, and we do need to make some small changes. So this vendors in the files, one for each supported postgres version. This commit does not make any changes yet.

JelteF requested a review from Tishj August 16, 2024 12:38

JelteF commented Aug 16, 2024

View reviewed changes

mkaruza previously approved these changes Aug 16, 2024

View reviewed changes

Tishj previously approved these changes Aug 16, 2024

View reviewed changes

Base automatically changed from disallow-modifying-ctes to main August 16, 2024 14:01

An error occurred while trying to automatically change base from disallow-modifying-ctes to main August 16, 2024 14:01

JelteF added 5 commits August 16, 2024 16:01

Fix materialized view error on PG16

2c8ed3b

Remove duplicate CMD_SELECT check

16244b6

Move more checks to IsAllowedStatement

04ebbf3

Fix condition...

7af2b52

JelteF force-pushed the create-table-as branch from 90f2282 to 7af2b52 Compare August 16, 2024 14:01

mkaruza self-requested a review August 16, 2024 14:02

mkaruza approved these changes Aug 16, 2024

View reviewed changes

JelteF merged commit 8dfa127 into main Aug 16, 2024
3 checks passed

JelteF deleted the create-table-as branch August 16, 2024 14:03

This was referenced Aug 19, 2024

Add test for standard_conforming_string=off #135

Merged

standards_conforming_strings=off confuses duckdb parser #119

Closed

JelteF added a commit that referenced this pull request Aug 20, 2024

Add test for standard_conforming_string=off (#135)

e2cb31e

The issue in #119 was already fixed by #130, but this PR adds a test for it. Fixes #119

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for CREATE TABLE AS #130

Add support for CREATE TABLE AS #130

JelteF commented Aug 16, 2024 •

edited

Loading

JelteF Aug 16, 2024

Tishj Aug 16, 2024

JelteF Aug 16, 2024 •

edited

Loading

mkaruza Aug 16, 2024

JelteF Aug 16, 2024

Tishj left a comment

		CREATE TABLE t(a INT, b VARCHAR);
		INSERT INTO t SELECT g % 1000, MD5(g::VARCHAR) FROM generate_series(1,20000) g;

Add support for CREATE TABLE AS #130

Add support for CREATE TABLE AS #130

Conversation

JelteF commented Aug 16, 2024 • edited Loading

JelteF Aug 16, 2024

Choose a reason for hiding this comment

Tishj Aug 16, 2024

Choose a reason for hiding this comment

JelteF Aug 16, 2024 • edited Loading

Choose a reason for hiding this comment

mkaruza Aug 16, 2024

Choose a reason for hiding this comment

JelteF Aug 16, 2024

Choose a reason for hiding this comment

Tishj left a comment

Choose a reason for hiding this comment

JelteF commented Aug 16, 2024 •

edited

Loading

JelteF Aug 16, 2024 •

edited

Loading