You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am writing functions in PL/Python in Greenplum (GP) with three main steps:
Using DuckDB to read GP tables.
Calculating a result table, "rst," in DuckDB, extracting the DDL of the table, and creating the table in GP (need to add distribution keys, compression, etc., which default CREATE does not include,use plpy.execute(ddl)).
tb="gto_skc_store_step_kpi_summary_duck"sql="""select sql from duckdb_tables() where table_name like 'tmp_rst' and schema_name = 'main';"""ddl=dd.execute(sql).fetchone()[0]
ddl=f"""drop table if exists tenant_{tenant}_adm.{tb};"""+ddl.replace('CREATE TABLE tmp_rst', f'CREATE TABLE tenant_{tenant}_adm.{tb}') \
.replace('DOUBLE', 'numeric').replace(';',"""WITH ( appendonly=true, orientation=column, compresslevel=6, compresstype=zlib)DISTRIBUTED BY (skc_sk);""")
plpy.notice(ddl)
plpy.execute(ddl)
Writing the "rst" result table to GP (using COPY or INSERT).
insert intogp.tenant_{tenant}_adm.{tb}
select*from tmp_rst
-- or
COPY gp.tenant_{tenant}_adm.gto_skc_store_step_kpi_summary_duckFROM'/home/gpadmin/{tb}.parquet';
The third step never completes, and checking pg_stat_activity shows a locked state.
However, if split it into two functions, the first completes the first two steps, and write the result to a Parquet file, and the second reads the Parquet file, writing to GP, finishing in just tens of seconds.
Why?
pg_stat_activity:
and pg_locks in following excel: pg_locks.xlsx
To Reproduce
importduckdbasddsql=f"""COPY gp.tenant_{tenant}_adm.gto_skc_store_step_kpi_summary_duck FROM '/home/gpadmin/{tb}.parquet';"""dd.execute(sql)
OS:
centos7
DuckDB Version:
0.10.1
DuckDB Client:
python3.9
Full Name:
wang cz
Affiliation:
Linezone
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have
The text was updated successfully, but these errors were encountered:
-- create extension plpython3u;
CREATE OR REPLACE FUNCTION test_ddl() RETURNS text AS $$
import duckdb
def exesql(sql):
plpy.notice(sql)
dd.execute(sql)
dd=duckdb.connect('gp.duckdb')
dd=duckdb.connect()
dbname="mdmaster_hggp7_dev"
tenant=dbname.split('_')[1]
dburl=f"dbname={dbname} user=gpadmin host=127.0.0.1 port=2345"
sql=f"load postgres;ATTACH '{dburl}' AS gp (TYPE postgres);"
exesql(sql)
sql="drop table if exists tmp_rst;create table tmp_rst(id int,name text);insert into tmp_rst values (1,'a');"
exesql(sql)
sql="""select sql from duckdb_tables() where table_name like 'tmp_rst' and schema_name = 'main';"""
sql="drop table if exists public.tmp_rst;"+(dd.execute(sql).fetchone()[0]).replace('tmp_rst','public.tmp_rst')
plpy.notice(sql)
plpy.execute(sql)
sql=f"detach gp;ATTACH '{dburl}' AS gp (TYPE postgres);"
#sql="CALL pg_clear_cache();"
#sql="call postgres_execute(sql)" #Table Function with name postgres_execute does not exist!
exesql(sql)
sql="insert into gp.public.tmp_rst select * from tmp_rst"
exesql(sql)
return 'done'
$$ LANGUAGE plpython3u;
select test_ddl();
It raise error, no table.
I guess duck got the gp tables at the ATTACH moment, it can't know the new table,
then can't exe insert sql,
but how to get the newest table from gp,
I test CALL pg_clear_cache() or re attatch , both NOT WORK!
What happens?
I am writing functions in PL/Python in Greenplum (GP) with three main steps:
The third step never completes, and checking pg_stat_activity shows a locked state.
![image](https://private-user-images.githubusercontent.com/25274931/321161996-e31aef4d-ab9f-4134-b508-2dd8276cb075.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MDQ1MTQsIm5iZiI6MTczOTQwNDIxNCwicGF0aCI6Ii8yNTI3NDkzMS8zMjExNjE5OTYtZTMxYWVmNGQtYWI5Zi00MTM0LWI1MDgtMmRkODI3NmNiMDc1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDIzNTAxNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWVjZjE5MGY4ZjEwMGQyYTI5MDNkZDkxMTIzZDcyOWIzNzEwOGE3NGYyZDYzMTk2ZGYwNGY3OWMzM2QxYzA1ZmYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.crfXwYgcyEZGVOnDG20DNDBHwH2H1XVcXq5AQZApWqA)
However, if split it into two functions, the first completes the first two steps, and write the result to a Parquet file, and the second reads the Parquet file, writing to GP, finishing in just tens of seconds.
Why?
pg_stat_activity:
and pg_locks in following excel:
pg_locks.xlsx
To Reproduce
OS:
centos7
DuckDB Version:
0.10.1
DuckDB Client:
python3.9
Full Name:
wang cz
Affiliation:
Linezone
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
The text was updated successfully, but these errors were encountered: