[Feature] Dynamic Table. #725
Force-pushed from a2c9b3c to 63d3383.
Hi! Some initial review:
When trying to modify a dynamic table:
Maybe we could give a better error message here? We can check relisdynamic and report "cannot directly change dynamic table", with an errdetail such as "DETAIL: DYNAMIC TABLE data is automatically populated from its source query."
td here is a dynamic table. This all comes from the fact that a dynamic table's relkind is 'm' (materialized view). I'm not entirely sure this design is good. The implementation is undoubtedly simpler this way, but it is at least perplexing that changing a dynamic table requires materialized view SQL syntax. Another case:
Both succeed. Maybe we should create this relation as
In any case, we should add this ALTER pattern to the regression tests and document it in some form. I will take another look later.
Yes, it's a materialized view actually.
Dynamic Tables support all Materialized View operations, and they must. Compare:
create external table ext1;
drop external table ext1;
drop foreign table ext1;
Both will succeed, because an external table is actually a foreign table. We provide syntactic sugar for CREATE/DROP, but not for every command or every message. If users want to use Materialized View commands, that's no problem.
We follow Snowflake, customers want to have Snowflake
Please refer to the Snowflake documentation at https://docs.snowflake.com/en/user-guide/dynamic-tables-intro and our discussion #706 for more details.
Request for discussion: #706 (comment)
Oh, it seems you're a big fan of tab-completion. It's OK to add it.
Example:
CREATE READABLE EXTERNAL TABLE ext_r(id int)
LOCATION('demoprot://dynamic_table_text_file.txt')
FORMAT 'text';
CREATE EXTERNAL TABLE
\d
List of relations
Schema | Name | Type | Owner | Storage
--------+-------+---------------+---------+---------
public | ext_r | foreign table | gpadmin |
(1 row)
drop external table ext_r;
DROP FOREIGN TABLE
CREATE READABLE EXTERNAL TABLE ext_r(id int)
LOCATION('demoprot://dynamic_table_text_file.txt')
FORMAT 'text';
CREATE EXTERNAL TABLE
\d
List of relations
Schema | Name | Type | Owner | Storage
--------+-------+---------------+---------+---------
public | ext_r | foreign table | gpadmin |
(1 row)
drop foreign table ext_r;
DROP FOREIGN TABLE
In summary, we're doing for dynamic tables and materialized views something similar to what is already done for external tables and foreign tables.
I'm not suggesting this. What I propose is to change the CREATE DYNAMIC TABLE syntax to CREATE DYNAMIC MATERIALIZED VIEW, for convenience.
Force-pushed from 63d3383 to d289246.
How about we keep CREATE DYNAMIC TABLE? Snowflake already has this feature, and we follow the same term.
Will provide docs soon.
Force-pushed from d289246 to 25faf4a.
Done, see
With
Whoa, that would make the CBDB version grow quickly, since we have lots of code with catalog changes still to be open sourced. Ask @my-ship-it for help on this topic; it should be a separate discussion outside of this PR. Developers shouldn't need to consider what the version should be while writing code. A catalog_change label is needed when a PR changes the catalog. That helps the release team decide version numbers without having to check with authors about which PRs changed the catalog.
Yes, it is.
Force-pushed from 25faf4a to d9cd5e0.
When a mirror is promoted, should we launch the auto task for the Dynamic Table?
LGTM
Force-pushed from d9cd5e0 to ef1af11.
Synced with my colleague who is familiar with the pg_task extension, and confirmed that pg_task can work on mirrors.
Force-pushed from ef1af11 to 22de9dc.
Based on the feedback received, I will enable this.
Force-pushed from 5a2dab1 to 80bfc73.
Force-pushed from f1e78e4 to 7a13989.
Added commit: 7a13989
Force-pushed from 7a13989 to 314c09b.
Rebased and fixed: bumped catversion in dadeb84.
Dynamic Table is an auto-refreshing materialized view which can be constructed from base tables, external tables, materialized views and dynamic tables. It can be used to answer queries via AQUMV. Like normal tables in CBDB, dynamic tables can also have distribution keys.

The purpose of Dynamic Tables is to solve a problem often raised by customers who are big fans of a lakehouse architecture: how can we run queries on external tables as fast as on internal tables?

CREATE DYNAMIC TABLE:

CREATE DYNAMIC TABLE dt0 SCHEDULE '5 * * * *' AS
  SELECT a, b, sum(c) FROM t1 GROUP BY a, b WITH NO DATA
  DISTRIBUTED BY(b);
CREATE DYNAMIC TABLE

\d
                List of relations
 Schema | Name |     Type      |  Owner  | Storage
--------+------+---------------+---------+---------
 public | dt0  | dynamic table | gpadmin | heap
 public | t1   | table         | gpadmin | heap
(2 rows)

CREATE DYNAMIC TABLE xxx AS Query

The Query allows any valid SELECT SQL of Materialized Views: from single or multiple relations (base tables, materialized views, and dynamic tables as well), joins, subqueries, aggregation, GROUP BY and so on.

SCHEDULE:
A string used to schedule the background job which auto-refreshes the dynamic table. We follow the valid strings of the pg_cron extension, which supports Linux crontab syntax; refer to https://crontab.guru

 ┌───────────── min (0 - 59)
 │ ┌────────────── hour (0 - 23)
 │ │ ┌─────────────── day of month (1 - 31) or last day of the month ($)
 │ │ │ ┌──────────────── month (1 - 12)
 │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to
 │ │ │ │ │                  Saturday, or use names; 7 is also Sunday)
 │ │ │ │ │
 │ │ │ │ │
 * * * * *

You can also use '[1-59] seconds' to schedule a job based on an interval. The example creates a cron job refreshing the dynamic table at minute 5 of each hour.

For convenience, SCHEDULE is optional. If the user doesn't specify it, a default schedule is provided: at every 5th minute.

WITH NO DATA:
Same as for Materialized Views: an empty Dynamic Table is created if specified.

DISTRIBUTED BY:
Like normal tables in CBDB, Dynamic Tables support distribution keys, as materialized views do.

Refresh Dynamic Table

As seen in pg_task, we register a command to auto-refresh dynamic tables. However, if users want to do a REFRESH manually, the command REFRESH DYNAMIC TABLE is also supported.

REFRESH DYNAMIC TABLE dt0;
REFRESH DYNAMIC TABLE

Refresh WITH NO DATA

Same as for Materialized Views: refreshing with no data truncates the Dynamic Table and leaves it in an unpopulated state.

REFRESH DYNAMIC TABLE dt0 WITH NO DATA;
REFRESH DYNAMIC TABLE

Drop Dynamic Table

Dropping a Dynamic Table drops its scheduler job automatically.

DROP DYNAMIC TABLE dt0;
DROP DYNAMIC TABLE

Like Materialized Views, Dynamic Tables can be used to answer queries too. This is limited by AQUMV.

Authored-by: Zhang Mingli [email protected]
Doc for CREATE/DROP/REFRESH DYNAMIC TABLE. Authored-by: Zhang Mingli [email protected]
Use pg_dump to dump Dynamic Tables. Authored-by: Zhang Mingli [email protected]
Add tab-complete of CREATE/DROP/REFRESH DYNAMIC TABLE. Authored-by: Zhang Mingli [email protected]
Add a case of Dynamic Table speeding up queries on external tables in a lakehouse architecture. Instead of querying the external table, query the dynamic table, whose results are computed automatically. The example builds on dynamic tables (materialized views with an auto-refresh process), the ability of materialized views to include external tables, and AQUMV (Answer Query Using Materialized Views).

CREATE READABLE EXTERNAL TABLE ext_r(id int)
LOCATION('demoprot://dynamic_table_text_file.txt')
FORMAT 'text';

EXPLAIN(COSTS OFF, VERBOSE)
SELECT sum(id) FROM ext_r where id > 5;
                          QUERY PLAN
--------------------------------------------------------------
 Finalize Aggregate
   Output: sum(id)
   ->  Gather Motion 3:1  (slice1; segments: 3)
         Output: (PARTIAL sum(id))
         ->  Partial Aggregate
               Output: PARTIAL sum(id)
               ->  Foreign Scan on dynamic_table_schema.ext_r
                     Output: id
                     Filter: (ext_r.id > 5)

CREATE DYNAMIC TABLE dt_external AS
  SELECT * FROM ext_r where id > 5;
ANALYZE dt_external;

SET optimizer = OFF;
SET LOCAL enable_answer_query_using_materialized_views = ON;
SET LOCAL aqumv_allow_foreign_table = ON;

EXPLAIN(COSTS OFF, VERBOSE)
SELECT sum(id) FROM ext_r where id > 5;
                          QUERY PLAN
---------------------------------------------------------------
 Finalize Aggregate
   Output: sum(id)
   ->  Gather Motion 3:1  (slice1; segments: 3)
         Output: (PARTIAL sum(id))
         ->  Partial Aggregate
               Output: PARTIAL sum(id)
               ->  Seq Scan on dynamic_table_schema.dt_external
                     Output: id
 Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off'
 Optimizer: Postgres query optimizer
(10 rows)

Authored-by: Zhang Mingli [email protected]
Add a function to get the SCHEDULE info of the pg_task job that goes hand in hand with a Dynamic Table. Authored-by: Zhang Mingli [email protected]
Force-pushed from 7516c41 to 75fbfb4.
A Dynamic Table's SCHEDULE clause is stored in pg_task jobs. Add it when a Dynamic Table is dumped. Since the SCHEDULE clause is optional, omitting it when dumping a Dynamic Table would not raise an error; instead, a default SCHEDULE would be added with the value of the macro DYNAMIC_TABLE_DEFAULT_REFRESH_INTERVAL. Authored-by: Zhang Mingli [email protected]
Force-pushed from 75fbfb4 to b215893.
Update: two commits were added to fix the missing SCHEDULE clause when pg_dump dumps a DYNAMIC TABLE. b215893 adds the clause when a Dynamic Table is dumped (a Dynamic Table's SCHEDULE clause is stored in pg_task jobs). 17c9fed provides a function pg_get_dynamic_table_schedule() to get the SCHEDULE info for pg_dump.
Dynamic Table is an auto-refreshing materialized view which can be constructed from base tables, external tables, materialized views and dynamic tables.
It can be used to answer queries via AQUMV.
Like normal tables in CBDB, dynamic tables can also have distribution keys.
The purpose of Dynamic Tables is to solve a problem often raised by customers who are big fans of a lakehouse architecture: how can we run queries on external tables as fast as on internal tables?
See details in discussion #706 and more cases in tests.
Create Dynamic Table:
CREATE DYNAMIC TABLE xxx AS Query
The Query allows any valid SELECT SQL of Materialized Views: from single or multiple relations (base tables, materialized views, and dynamic tables as well), joins, subqueries, aggregation, GROUP BY and so on. However, if you want to use it to answer queries, that is limited by AQUMV: currently we allow SELECT from a single base table, aggregation on it, or aggregation SQL replace directly #705
SCHEDULE:
A string used to schedule the background job which auto-refreshes the dynamic table.
We follow the valid strings of the pg_cron extension, which supports Linux crontab syntax; refer to https://crontab.guru/ .
You can also use '[1-59] seconds' to schedule a job based on an interval.
The example creates a cron job refreshing the dynamic table at minute 5 of each hour.
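The example itself did not survive on this page. The commit message for this feature shows one that matches the description, reproduced here as a sketch. It assumes a base table t1 with columns a, b and c already exists. The second statement, with the hypothetical name dt_interval, only illustrates the interval form and is not taken from the PR.

```sql
-- Refresh at minute 5 of each hour (crontab-style schedule).
-- Assumes t1(a, b, c) already exists.
CREATE DYNAMIC TABLE dt0 SCHEDULE '5 * * * *' AS
  SELECT a, b, sum(c) FROM t1 GROUP BY a, b
  WITH NO DATA
  DISTRIBUTED BY (b);

-- Illustrative only: interval-based schedule, refreshing every 30 seconds.
-- dt_interval is a hypothetical name.
CREATE DYNAMIC TABLE dt_interval SCHEDULE '30 seconds' AS
  SELECT a, b, sum(c) FROM t1 GROUP BY a, b;
```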
Users don't need to manage the auto-refresh job themselves; to see the task, query the pg_task catalog:
And a function pg_get_dynamic_table_schedule is provided for users to see the SCHEDULE info easily:
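The queries that followed these two sentences are missing from this page. A minimal sketch of what they might look like is given here. The pg_task query selects all columns because the catalog's column names are not stated in this conversation. The argument passed to pg_get_dynamic_table_schedule is an assumption (the dynamic table's OID); check the function definition in the PR for the actual signature.

```sql
-- List scheduled jobs; the auto-refresh job of each dynamic table appears here.
SELECT * FROM pg_task;

-- Assumption: the function takes the OID of the dynamic table.
SELECT pg_get_dynamic_table_schedule('dt0'::regclass::oid);
```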
As in Snowflake, Dynamic Tables should always have an auto-refresh process.
However, for convenience, I made SCHEDULE optional. If the user doesn't specify it, a default schedule is provided: maybe every 5th minute (I'm not sure, but Snowflake seems to limit dynamic table auto-refresh to at most 5 minutes)?
WITH NO DATA:
Same as for Materialized Views: an empty Dynamic Table is created if specified.
DISTRIBUTED BY:
Like normal tables in CBDB, Dynamic Tables support distribution keys, as materialized views do.
Use \d+ to see the distribution keys and the Query SQL of a Dynamic Table.
Refresh Dynamic Table
As seen in pg_task, we register a command to auto-refresh dynamic tables.
However, if users want to do a REFRESH manually, the command REFRESH DYNAMIC TABLE is also supported.
REFRESH WITH NO DATA:
Same as for Materialized Views: refreshing with no data truncates the Dynamic Table and leaves it in an unpopulated state.
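Both forms appear in the commit message for this feature. They are repeated here as a short sketch, assuming the dynamic table dt0 from the earlier example exists.

```sql
-- Manual refresh: recompute the contents from the defining query.
REFRESH DYNAMIC TABLE dt0;

-- Truncate the dynamic table and mark it unpopulated.
REFRESH DYNAMIC TABLE dt0 WITH NO DATA;
```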
Drop Dynamic Table:
Dropping a Dynamic Table drops its scheduler job automatically.
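The commit message shows the statement as follows, again assuming dt0 exists. No separate cleanup of the pg_task job should be needed.

```sql
-- Drops dt0 together with its auto-refresh job.
DROP DYNAMIC TABLE dt0;
```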
Privileges
Same as Materialized Views in CBDB:
Use Dynamic Tables to answer query
Like Materialized Views, Dynamic Tables can be used to answer queries too:
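The original example is missing here. The sketch below is adapted from the lakehouse case in one of this PR's commit messages and assumes the external table ext_r and the dynamic table dt_external (defined as SELECT * FROM ext_r WHERE id > 5) already exist. The commit message uses SET LOCAL inside a transaction; plain SET is used here so the settings also take effect outside a transaction block.

```sql
-- AQUMV is implemented in the Postgres planner, so turn off the other optimizer.
SET optimizer = OFF;
SET enable_answer_query_using_materialized_views = ON;
-- Needed because the source relation is an external (foreign) table.
SET aqumv_allow_foreign_table = ON;

-- The query is written against ext_r; per the commit message, the plan
-- should show a Seq Scan on dt_external instead of a Foreign Scan on ext_r.
EXPLAIN (COSTS OFF, VERBOSE)
SELECT sum(id) FROM ext_r WHERE id > 5;
```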
Authored-by: Zhang Mingli [email protected]
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context