
ADBDEV-4152: Optimize the Explicit Redistribute application logic #666

Merged
merged 85 commits into adb-6.x-dev from ADBDEV-4152 on Mar 21, 2024

Conversation

@bandetto (Member) commented Dec 26, 2023

Optimize the Explicit Redistribute application logic

Previously, if a plan contained any motions and a ModifyTable node, an Explicit
Redistribute Motion would be added under the ModifyTable to send tuples back to
a specific segment based on gp_segment_id.

We only really need an Explicit Redistribute Motion if the tuples that we are
going to modify were sent to a different segment by a Motion: DELETE and UPDATE
use ctid to find the tuples they need to modify, and a ctid is only meaningful
on its original segment.

This patch updates the logic: add the Explicit Redistribute Motion only if
there is a motion that redistributes the relation we are going to modify, and
that motion sits between the relation's scan and the ModifyTable.

UPDATE/DELETE operations are represented by a ModifyTable node. We extend
ApplyMotionState with a list of result relations retrieved from the ModifyTable
node and a motion counter; the counter accounts for the possibility of multiple
motions before the scan.
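As a rough illustration, the extended state could look like the sketch below.
This is a minimal sketch with hypothetical field names (the patch's actual
fields may differ), assuming GPDB's List type from nodes/pg_list.h:

#include "nodes/pg_list.h"	/* GPDB's List type */

/* Hypothetical sketch; field names are illustrative, not the patch's. */
typedef struct ApplyMotionState
{
	/* ... existing ApplyMotionState fields ... */
	List	*resultRelations;	/* RT indexes of relations the ModifyTable modifies */
	int		 nMotionsAboveScan;	/* motions seen since the scan of a result relation */
} ApplyMotionState;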

When we encounter a scan node, we look up its scanrelid (the relation's range
table index) in our list of result relations. If the list contains this
scanrelid, we have found the scan of the table that the ModifyTable is going to
modify, and we remember the current motion count. We ignore scans under
InitPlans, since no scan node under an InitPlan scans the same range table
entry as the ModifyTable.

When we encounter the Explicit Redistribute Motion itself, we decide based on
the counter. If the motion counter is not 0, we can't elide the Explicit
Redistribute Motion, since the tuples that are going to be modified were
previously sent to a different segment by a motion. Otherwise, if the motion
counter is 0, we can elide it.
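To make the rule concrete, here is a self-contained toy model of the decision
(plain C with simplified node types; an illustrative sketch under assumed
names, not GPDB code):

#include <stdbool.h>
#include <stdio.h>

typedef enum { T_Scan, T_Motion, T_ModifyTable } NodeTag;

typedef struct PlanNode
{
	NodeTag			 tag;
	int				 scanrelid;	/* valid for T_Scan: range table index */
	struct PlanNode *child;
} PlanNode;

/*
 * Walk the subtree under the Explicit Redistribute Motion bottom-up.
 * Returns true once the scan of resultRelid has been found; from that
 * point on, every Motion on the way up increments *motionsAboveScan.
 */
static bool
count_motions_above_scan(PlanNode *node, int resultRelid, int *motionsAboveScan)
{
	bool	found;

	if (node == NULL)
		return false;

	found = count_motions_above_scan(node->child, resultRelid, motionsAboveScan);

	if (node->tag == T_Scan && node->scanrelid == resultRelid)
	{
		*motionsAboveScan = 0;	/* found the target scan: start counting */
		found = true;
	}
	else if (node->tag == T_Motion && found)
		(*motionsAboveScan)++;	/* this motion moved the target tuples */

	return found;
}

int
main(void)
{
	/* Plan shape: ModifyTable -> Explicit Redistribute? -> Motion -> Scan(relid=1) */
	PlanNode	scan = {T_Scan, 1, NULL};
	PlanNode	motion = {T_Motion, 0, &scan};
	int			n = 0;

	count_motions_above_scan(&motion, 1, &n);

	/* n != 0: keep the Explicit Redistribute Motion; n == 0: elide it */
	printf("motions above the target scan: %d -> %s\n",
		   n, n == 0 ? "elide" : "keep");
	return 0;
}

Running it on the sketched plan prints "keep", because one Motion sits between
the target table's scan and the ModifyTable; drop the intermediate Motion node
and it prints "elide".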

The solution is based on a46400e.

@bandetto force-pushed the ADBDEV-4152 branch 2 times, most recently from 717f6fa to 887ba68 on December 26, 2023 08:17
@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61021

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925120

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925121

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61030

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925681

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925682

@BenderArenadata

Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925673

@BenderArenadata

Failed job Regression tests with Postgres on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925674

@BenderArenadata

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925675

@BenderArenadata

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925676

@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61104

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929114

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929115

@bandetto force-pushed the ADBDEV-4152 branch 2 times, most recently from 7567bb8 to d4e44fc on December 27, 2023 10:36
@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61113

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929684

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929685

@BenderArenadata

Failed job Regression tests with Postgres on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929677

@BenderArenadata

Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929676

@BenderArenadata

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929678

@BenderArenadata

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929679

@bandetto marked this pull request as ready for review on December 27, 2023 13:45
@BenderArenadata

Allure report https://allure-ee.adsw.io/launch/61149

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/931038

@BenderArenadata

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/931032

@BenderArenadata

Allure report https://allure.adsw.io/launch/66871

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1174233

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1174234

@RekGRpth

This comment was marked as resolved.

@BenderArenadata

Allure report https://allure.adsw.io/launch/66908

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175520

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175521

@BenderArenadata

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175515

@RekGRpth previously approved these changes Mar 19, 2024
@KnightMurloc

Why don't we turn off isChecking before entering an InitPlan anymore? And why was the InitPlan-related test deleted?

@RekGRpth (Member)

Why don't we turn off isChecking before entering an InitPlan anymore? And why was the InitPlan-related test deleted?

#666 (comment)

@KnightMurloc

Okay. It doesn't look like checking InitPlans will cause any problems, but it still doesn't make sense. What's wrong with temporarily turning off isChecking and not checking what we don't need?
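Roughly, the pattern being suggested (hypothetical names, just to illustrate):

	bool saved = ctx->isChecking;

	ctx->isChecking = false;	/* don't check anything inside the InitPlan */
	walk_initplan(node, ctx);
	ctx->isChecking = saved;	/* restore on the way out */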

Skip InitPlans, but don't restore a useless test.

This reverts commit bb559fd236bda017152005fa8eebfc0b58751695.
@BenderArenadata

Allure report https://allure.adsw.io/launch/66988

@BenderArenadata

Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180186

@BenderArenadata

Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180187

@BenderArenadata

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180181

@BenderArenadata

Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1184659

@RekGRpth (Member)

What happened to the tests again?
--- \/home\/gpadmin\/gpdb_src\/src\/test\/isolation2\/expected\/segwalrep\/fts_unblock_primary\.out	2024-03-20 13:33:20.298468416 +0000
+++ \/home\/gpadmin\/gpdb_src\/src\/test\/isolation2\/results\/segwalrep\/fts_unblock_primary\.out	2024-03-20 13:33:20.326467066 +0000
@@ -225,7 +249,30 @@
 (exited with code 0)
 2Uq: ... <quitting>
 2U: show gp_fts_mark_mirror_down_grace_period;
- gp_fts_mark_mirror_down_grace_period 
---------------------------------------
- 30s                                  
-(1 row)
+Process Process-6:
+Traceback (most recent call last):
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+    self.run()
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+    self._target(*self._args, **self._kwargs)
+  File "./sql_isolation_testcase.py", line 290, in session_process
+    self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+  File "./sql_isolation_testcase.py", line 363, in __init__
+    (hostname, port) = self.get_hostname_port(name, 'p')
+  File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+    con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+  File "./sql_isolation_testcase.py", line 412, in connectdb
+    passwd = given_passwd)
+InternalError: FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+HINT:  Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED:  [Errno 104] Connection reset by peer
+Traceback (most recent call last):
+  File "./sql_isolation_testcase.py", line 1087, in <module>
+    executor.process_isolation_file(sys.stdin, sys.stdout, options.initfile_prefix)
+  File "./sql_isolation_testcase.py", line 838, in process_isolation_file
+    process.stop()
+  File "./sql_isolation_testcase.py", line 335, in stop
+    self.pipe.send(("", False))
+IOError: [Errno 32] Broken pipe
======================================================================
--- \/home\/gpadmin\/gpdb_src\/src\/test\/isolation2\/expected\/segwalrep\/recoverseg_from_file\.out	2024-03-20 13:33:22.810347353 +0000
+++ \/home\/gpadmin\/gpdb_src\/src\/test\/isolation2\/results\/segwalrep\/recoverseg_from_file\.out	2024-03-20 13:33:22.834346197 +0000
@@ -9,79 +9,105 @@
 -- continous, the inconsistent issue will happen
 
 include: helpers/server_helpers.sql;
-CREATE
+Process Process-1:
+Traceback (most recent call last):
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+    self.run()
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+    self._target(*self._args, **self._kwargs)
+  File "./sql_isolation_testcase.py", line 290, in session_process
+    self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+  File "./sql_isolation_testcase.py", line 400, in __init__
+    self.con = self.connectdb(self.dbname)
+  File "./sql_isolation_testcase.py", line 412, in connectdb
+    passwd = given_passwd)
+InternalError: FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+HINT:  Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED:  [Errno 104] Connection reset by peer
 
 --
 -- generate_recover_config_file:
 --   generate config file used by recoverseg -i
 --
 create or replace function generate_recover_config_file(datadir text, port text) returns void as $$ import io import os myhost = os.uname()[1] inplaceConfig = myhost + '|' + port + '|' + datadir configStr = inplaceConfig + ' ' + inplaceConfig  f = open("/tmp/recover_config_file", "w") f.write(configStr) f.close() $$ language plpythonu;
-CREATE
+FAILED:  [Errno 32] Broken pipe
 
 SELECT dbid, role, preferred_role, content, mode, status FROM gp_segment_configuration order by dbid;
- dbid | role | preferred_role | content | mode | status 
-------+------+----------------+---------+------+--------
- 1    | p    | p              | -1      | n    | u      
- 2    | p    | p              | 0       | s    | u      
- 3    | p    | p              | 1       | s    | u      
- 4    | p    | p              | 2       | s    | u      
- 5    | m    | m              | 0       | s    | u      
- 6    | m    | m              | 1       | s    | u      
- 7    | m    | m              | 2       | s    | u      
- 8    | m    | m              | -1      | s    | u      
-(8 rows)
+FAILED:  [Errno 32] Broken pipe
 -- stop a primary in order to trigger a mirror promotion
 select pg_ctl((select datadir from gp_segment_configuration c where c.role='p' and c.content=1), 'stop');
- pg_ctl 
---------
- OK     
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- trigger failover
 select gp_request_fts_probe_scan();
- gp_request_fts_probe_scan 
----------------------------
- t                         
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- wait for content 1 (earlier mirror, now primary) to finish the promotion
 1U: select 1;
- ?column? 
-----------
- 1        
-(1 row)
+Process Process-2:
+Traceback (most recent call last):
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+    self.run()
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+    self._target(*self._args, **self._kwargs)
+  File "./sql_isolation_testcase.py", line 290, in session_process
+    self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+  File "./sql_isolation_testcase.py", line 363, in __init__
+    (hostname, port) = self.get_hostname_port(name, 'p')
+  File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+    con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+  File "./sql_isolation_testcase.py", line 412, in connectdb
+    passwd = given_passwd)
+InternalError: FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+HINT:  Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED:  [Errno 104] Connection reset by peer
 -- Quit this utility mode session, as need to start fresh one below
 1Uq: ... <quitting>
+FAILED:  [Errno 32] Broken pipe
 
 -- make the dbid in gp_segment_configuration not continuous
 -- dbid=2 corresponds to content 0 and role p, change it to dbid=9
 set allow_system_table_mods to true;
-SET
+FAILED:  [Errno 32] Broken pipe
 update gp_segment_configuration set dbid=9 where content=0 and role='p';
-UPDATE 1
+FAILED:  [Errno 32] Broken pipe
 
 -- trigger failover
 select gp_request_fts_probe_scan();
- gp_request_fts_probe_scan 
----------------------------
- t                         
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- wait for content 0 (earlier mirror, now primary) to finish the promotion
 0U: select 1;
- ?column? 
-----------
- 1        
-(1 row)
+Process Process-3:
+Traceback (most recent call last):
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+    self.run()
+  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+    self._target(*self._args, **self._kwargs)
+  File "./sql_isolation_testcase.py", line 290, in session_process
+    self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+  File "./sql_isolation_testcase.py", line 363, in __init__
+    (hostname, port) = self.get_hostname_port(name, 'p')
+  File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+    con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+  File "./sql_isolation_testcase.py", line 412, in connectdb
+    passwd = given_passwd)
+InternalError: FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+HINT:  Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED:  [Errno 104] Connection reset by peer
 -- Quit this utility mode session, as need to start fresh one below
 0Uq: ... <quitting>
+FAILED:  [Errno 32] Broken pipe
 
 -- generate recover config file
 select generate_recover_config_file( (select datadir from gp_segment_configuration c where c.role='m' and c.content=1), (select port from gp_segment_configuration c where c.role='m' and c.content=1)::text);
- generate_recover_config_file 
-------------------------------
-                              
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- recover from config file, only seg with content=1 will be recovered
 -- use udpifc to avoid motion hangs when running in ic_proxy envs due to change in dbid
@@ -87,28 +113,30 @@
 -- use udpifc to avoid motion hangs when running in ic_proxy envs due to change in dbid
 !\retcode PGOPTIONS='-c gp_interconnect_type=udpifc' gprecoverseg -a -i /tmp/recover_config_file;
 GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -a -i /tmp/recover_config_file
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT:  Process 112943 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
 GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
 
 -- after gprecoverseg -i, the down segemnt should be up
 -- in mirror mode
 select status from gp_segment_configuration where role='m' and content=1;
- status 
---------
- u      
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- recover should reuse the old dbid and not occupy dbid=2
 select dbid from gp_segment_configuration where dbid=2;
- dbid 
-------
-(0 rows)
+FAILED:  [Errno 32] Broken pipe
 
 update gp_segment_configuration set dbid=2 where dbid=9;
-UPDATE 1
+FAILED:  [Errno 32] Broken pipe
 
 set allow_system_table_mods to false;
-SET
+FAILED:  [Errno 32] Broken pipe
 
 -- we manually change dbid from 2 to 9, which causes the
 -- corresponding segment down as well, so recovery full
@@ -115,15 +143,19 @@
 -- at here
 !\retcode gprecoverseg -a;
 GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:21:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -a
+GP_IGNORE:20240320:13:33:21:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:22:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT:  Process 112971 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
 GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
 
 -- loop while segments come in sync
 select wait_until_all_segments_synchronized();
- wait_until_all_segments_synchronized 
---------------------------------------
- OK                                   
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- rebalance the cluster
 !\retcode gprecoverseg -ar;
@@ -128,29 +160,31 @@
 -- rebalance the cluster
 !\retcode gprecoverseg -ar;
 GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -ar
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL:  optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT:  Process 112999 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
 GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
 
 -- loop while segments come in sync
 select wait_until_all_segments_synchronized();
- wait_until_all_segments_synchronized 
---------------------------------------
- OK                                   
-(1 row)
+FAILED:  [Errno 32] Broken pipe
 
 -- recheck gp_segment_configuration after rebalance
 SELECT dbid, role, preferred_role, content, mode, status FROM gp_segment_configuration order by dbid;
- dbid | role | preferred_role | content | mode | status 
-------+------+----------------+---------+------+--------
- 1    | p    | p              | -1      | n    | u      
- 2    | p    | p              | 0       | s    | u      
- 3    | p    | p              | 1       | s    | u      
- 4    | p    | p              | 2       | s    | u      
- 5    | m    | m              | 0       | s    | u      
- 6    | m    | m              | 1       | s    | u      
- 7    | m    | m              | 2       | s    | u      
- 8    | m    | m              | -1      | s    | u      
-(8 rows)
+FAILED:  [Errno 32] Broken pipe
 
 -- remove the config file
 !\retcode rm /tmp/recover_config_file
+Traceback (most recent call last):
+  File "./sql_isolation_testcase.py", line 1087, in <module>
+    executor.process_isolation_file(sys.stdin, sys.stdout, options.initfile_prefix)
+  File "./sql_isolation_testcase.py", line 838, in process_isolation_file
+    process.stop()
+  File "./sql_isolation_testcase.py", line 335, in stop
+    self.pipe.send(("", False))
+IOError: [Errno 32] Broken pipe

@bandetto merged commit 8ddf4cc into adb-6.x-dev on Mar 21, 2024
5 checks passed
@bandetto deleted the ADBDEV-4152 branch on March 21, 2024 13:44
@Stolb27 mentioned this pull request on May 24, 2024