Cherry-pick commits from GPDB, for psycopg2 #571
base: main
Conversation
@reshke I believe this is a supporting PR that you may want to consider for this work: greenplum-db/gpdb-archive@6f9d85b
Fix the CI pls :)
Issue: when one or more segments are down or not in their preferred role, the port assignment for expanded nodes goes wrong. The reason is the port calculation in the functions get_max_primary_port() and get_max_mirror_port(). One suggested fix is to check that the cluster is in a good state as soon as we run gpexpand; this also avoids further issues that can be caused by nodes not being in their preferred role. Added feature tests for both scenarios: when a node is down and when nodes are not in their preferred role.
This reverts commit 9374dac019f02eeb2c1565b67c7e9de1fdc3afdf.
Issue: when one or more segments are down or not in their preferred role, the port assignment for expanded nodes goes wrong. The reason is the port calculation in the functions get_max_primary_port() and get_max_mirror_port(). One suggested fix is to check that the cluster is in a good state as soon as we run gpexpand; this also avoids further issues that can be caused by nodes not being in their preferred role. Added feature tests for both scenarios: when a node is down and when nodes are not in their preferred role.
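As a rough illustration of the health check described above, here is a minimal sketch in Python; the segment-pair structure and attribute names (`status`, `role`, `preferred_role`, `dbid`) are assumptions for illustration, not the actual gpexpand API:

```python
# Hypothetical pre-flight check for gpexpand: refuse to proceed unless every
# segment is up and acting in its preferred role. Attribute names are assumed.
def assert_cluster_balanced(segment_pairs):
    problems = []
    for pair in segment_pairs:
        for seg in (pair.primaryDB, pair.mirrorDB):
            if seg.status != 'u':  # 'u' = up, 'd' = down, as in gp_segment_configuration
                problems.append("segment dbid %s is down" % seg.dbid)
            elif seg.role != seg.preferred_role:
                problems.append("segment dbid %s is not in its preferred role" % seg.dbid)
    if problems:
        raise Exception("cluster is not in a balanced state: " + "; ".join(problems))
```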
…te (#13822)
Why: in the previous PR #13757 the plan was to check the health of the cluster before starting the expansion in gpexpand. While backporting to 6X we realised that a hard exit after the health check would create a problem for customers who, for some reason, are not able to bring the cluster back into a balanced state; this would block them from expanding.
Issue: we found the root cause to be that when the cluster is not in a balanced state, the max/min port calculation returns wrong port values, and as a result the system expands on the wrong port, which produces a multi-port error.
Fix: updated the functions to work in the imbalanced-cluster scenario. We collect all the ports for primaries and mirrors based on the preferred role of each segment; if the roles are switched, we take segmentPairs.mirrorDB for the primary port and segmentPairs.primaryDB for the mirror port, so the collected ports always follow the preferred role. Another change in the PR: instead of exiting with an error in the imbalanced scenario, we just log a warning. Added unit test cases for the positive/negative and exception scenarios of the newly added functions, and updated the feature test case to convert the error into a warning.
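A minimal sketch of the port collection by preferred role, assuming segment pairs shaped like the segmentPairs objects named above (every attribute name other than primaryDB/mirrorDB is illustrative):

```python
# Collect primary/mirror ports by *preferred* role, so a switched pair (a
# mirror currently acting as primary) still lands in the right bucket.
# Sketch only; attribute names beyond primaryDB/mirrorDB are assumptions.
def collect_max_ports_by_preferred_role(segment_pairs):
    primary_ports, mirror_ports = [], []
    for pair in segment_pairs:
        if pair.primaryDB.role == pair.primaryDB.preferred_role:
            primary_ports.append(pair.primaryDB.port)
            mirror_ports.append(pair.mirrorDB.port)
        else:
            # Roles are switched: mirrorDB holds the preferred-primary port
            # and primaryDB holds the preferred-mirror port.
            primary_ports.append(pair.mirrorDB.port)
            mirror_ports.append(pair.primaryDB.port)
    return max(primary_ports), max(mirror_ports)
```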
During gpexpand, all tables that need data redistribution are recorded, and to show progress the table size is also saved in the status_detail table. Previously, we directly invoked pg_relation_size to compute that value in the target list of a SQL statement involving only catalog tables. This led to very bad performance, because each table required a dispatch to compute its size, and the whole process ran serially. This commit rewrites the SQL that generates the status_detail information. The technique used to compute table sizes in MPP fashion is:
1. use gp_dist_random('pg_class') to get a forced-random RangeTable;
2. since pg_relation_size is volatile, wrap the computation in a subquery before the group-by, to prevent the optimizer from pulling up the subquery and to force pg_relation_size to be evaluated before the motion, then sum the results together;
3. join with the main SQL.
We add test cases in regress/subselect_gp and regress/db_size_functions to guard the above SQL technique. We have done a local test to show the performance gain. In a local 8*8, 64-segment cluster, we create 100 partition tables; each partition table contains 2 middle-level partitions, and each middle-level partition contains 365 leaf partitions. In this database, we compare the speed of the old populate SQL (which evaluates pg_relation_size on the QD via dispatching, one dispatch per tuple) and the new SQL (a single statement that only needs to dispatch once). This patch's SQL takes only about 3.8 sec, whereas the old SQL takes about 50 sec. Authored-by: Miao Chen <[email protected]> Authored-by: Zhenghua Lyu <[email protected]>
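A hedged sketch of the SQL shape described above, as it might be issued from the Python utility; this shows the general gp_dist_random technique, not the literal query from the commit:

```python
# The inner subquery evaluates the volatile pg_relation_size() on every
# segment (gp_dist_random forces a per-segment scan of pg_class) before the
# motion; the outer SUM() then adds the per-segment sizes back together.
TABLE_SIZE_SQL = """
SELECT table_oid, sum(size) AS table_size
FROM (
    SELECT oid AS table_oid, pg_relation_size(oid) AS size
    FROM gp_dist_random('pg_class')
) AS per_segment
GROUP BY table_oid
"""
```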
gprecoverseg and gpmovemirrors should show a warning, and gpexpand should raise an exception, if there is already a running instance of pg_basebackup for a segment that needs to be recovered/moved. Fix: create a list segments_with_running_basebackup of segment dbids for which pg_basebackup is running, and while creating the recovery triplets, check whether the segment to be recovered/moved/added (expansion segments) is already present in the segments_with_running_basebackup list. If so, give a warning or raise an exception and terminate the utility execution.
* Behaviour of gprecoverseg - a warning message is logged containing the list of segment contentIds with a running basebackup, and gprecoverseg proceeds with recovery for the rest of the segments.
* Behaviour of gpmovemirrors - a warning message is logged containing the list of segment contentIds with a running basebackup, and gpmovemirrors proceeds with movement for the rest of the mirrors.
* Behaviour of gpexpand - an exception is raised and execution stops immediately if segments to be added have a running pg_basebackup.
* Behaviour of gpaddmirrors - as gpaddmirrors does not use the force-overwrite flag with pg_basebackup to sync newly added mirrors with the primary, the check is not required here. Added a behave test case to ensure it fails when mirror directories are not empty.
Limitation of the above fix: to get the list of segments with a running pg_basebackup, we query the gp_stat_replication table. At present this table shows only the content ID of the source segment for any running pg_basebackup process, and no information about the target segment/DataDirectory. Added behave test cases for gprecoverseg and gpmovemirrors.
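A minimal sketch of the described check, assuming a psycopg2 connection; the application_name filter and the triplet attribute names are assumptions of this sketch:

```python
import logging

logger = logging.getLogger(__name__)

def segments_with_running_basebackup(conn):
    # gp_stat_replication exposes the source segment's content ID; matching
    # on application_name = 'pg_basebackup' is an assumption of this sketch.
    with conn.cursor() as cur:
        cur.execute("""
            SELECT gp_segment_id FROM gp_stat_replication
            WHERE application_name = 'pg_basebackup'
        """)
        return {row[0] for row in cur.fetchall()}

def filter_recovery_triplets(triplets, running, strict=False):
    # strict=True models gpexpand (hard stop); strict=False models
    # gprecoverseg/gpmovemirrors (warn and continue with the rest).
    blocked = sorted(t.content_id for t in triplets if t.content_id in running)
    if blocked:
        msg = "pg_basebackup already running for contents: %s" % blocked
        if strict:
            raise Exception(msg)
        logger.warning(msg)
    return [t for t in triplets if t.content_id not in running]
```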
A security vulnerability was found in the scp program shipped with the openssh-clients package; its CVSS score is 7.8 (https://access.redhat.com/security/cve/cve-2020-15778). The recommended action was to replace scp with rsync, so:
* Replaced scp with rsync in gpdb cm utilities and utility files.
* Renamed the gpscp utility to gpsync after replacing scp with rsync in the utility code.
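The mechanical shape of the change, sketched in Python (command construction only; the actual call sites in the utilities differ):

```python
import subprocess

def remote_copy(src, host, dest):
    # Before: subprocess.check_call(["scp", src, "%s:%s" % (host, dest)])
    # After: rsync over ssh performs the equivalent copy without scp's
    # remote-command-injection exposure (CVE-2020-15778).
    subprocess.check_call(["rsync", "-a", "-e", "ssh", src, "%s:%s" % (host, dest)])
```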
The table gpexpand.status_detail used by gpexpand should be distributed by "table_oid", to avoid the update workload for tens of thousands of tables in a segment being concentrated on a single segment, as happened under the prior distribution policy (distributed by dbname). Authored-by: Peng Han <[email protected]>
The table is distributed by the table_oid column, so it is placed as the 1st column for better performance.
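A sketch of the resulting DDL shape (the column list is abbreviated and assumed; only the distribution key and column order reflect the change described):

```python
# table_oid first and used as the distribution key, so status updates hash
# across all segments instead of piling onto the single segment that the
# one dbname value used to hash to. Column list here is illustrative.
STATUS_DETAIL_DDL = """
CREATE TABLE gpexpand.status_detail (
    table_oid  oid,    -- 1st column, distribution key
    dbname     text,
    status     text
    -- remaining columns omitted
) DISTRIBUTED BY (table_oid)
"""
```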
This is the last patch for replacing pygresql with psycopg2 in Greenplum. This patch mainly targets the gpMgmt tools. Benefits of replacing pygresql with psycopg2:
- Psycopg2 is actively maintained; with pygresql we have encountered bugs that haven't been fixed by the upstream yet, e.g., https://github.com/greenplum-db/gpdb/pull/13953.
- Psycopg2 is provided by Rocky Linux and Ubuntu. That is to say, we don't need to vendor it ourselves.
- Last but not least, we got a chance to clean up legacy code during the removal process, e.g., https://github.com/greenplum-db/gpdb/pull/15983.
After this patch, we need to do the following things:
- Add psycopg2 as a dependency of the rpm/deb package.
- Remove the pygresql source code tarball from the gpdb repo.
- Tidy up READMEs and requirements.txt files.
--------- Co-authored-by: Chen Mulong <[email protected]> Co-authored-by: Xiaoxiao He <[email protected]> Co-authored-by: zhrt123 <[email protected]> Co-authored-by: Piyush Chandwadkar <[email protected]> Co-authored-by: Praveen Kumar <[email protected]>
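For context, the generic shape of a psycopg2 call in a management script (placeholder connection parameters; not code from this PR):

```python
import psycopg2

# Generic psycopg2 pattern that replaces the old pygresql calls.
conn = psycopg2.connect(dbname="postgres", host="localhost", port=5432)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
finally:
    conn.close()
```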
The main goal is to upgrade the Python management scripts to use psycopg2.
In GPDB, this was done by b5920e061b4a347523899838f6027c947107306e
To achieve this, I have actually cherry-picked more commits from [1]; this was done to reduce the pain of conflict resolution.
This is a WIP PR to get initial feedback. I need to double-check my work (in case there are any conflict-resolution bugs), but for our deployment this works fine.
[1] https://github.com/greenplum-db/gpdb-archive/commits/main/gpMgmt/bin/gpexpand