ADBDEV-4152: Optimize the Explicit Redistribute application logic #666
Conversation
Force-pushed from 717f6fa to 887ba68 (compare)
Allure report https://allure-ee.adsw.io/launch/61021
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925120
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925121
Force-pushed from 887ba68 to 5fbc4bd (compare)
Allure report https://allure-ee.adsw.io/launch/61030
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925681
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925682
Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925673
Failed job Regression tests with Postgres on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925674
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925675
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/925676
Allure report https://allure-ee.adsw.io/launch/61104
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929114
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929115
Force-pushed from 7567bb8 to d4e44fc (compare)
Allure report https://allure-ee.adsw.io/launch/61113
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929684
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929685
Failed job Regression tests with Postgres on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929677
Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929676
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929678
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/929679
Force-pushed from d4e44fc to 15ba1cc (compare)
Allure report https://allure-ee.adsw.io/launch/61149
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/931038
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/931032
Allure report https://allure.adsw.io/launch/66871
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1174233
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1174234
Allure report https://allure.adsw.io/launch/66908
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175520
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175521
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1175515
Why don't we turn off isChecking before entering InitPlan anymore? And why was the initPlan-related test deleted?

Okay, it doesn't look like there will be any problems from checking InitPlans, but it still doesn't make sense. What's wrong with temporarily turning off isChecking and not checking what we don't need?
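For reference, the pattern being suggested is just saving and restoring the flag around the InitPlan walk. A minimal sketch, with all names hypothetical rather than the actual gpdb code:

```c
#include <stdbool.h>

/* Hypothetical walker state; gpdb's real structure differs. */
typedef struct CheckState
{
	bool	isChecking;		/* currently looking for the scan of interest? */
} CheckState;

/* Stand-in for the recursive plan walker. */
static void
walk_plan(void *plan, CheckState *state)
{
	(void) plan;
	(void) state;
}

static void
walk_initplan(void *initPlan, CheckState *state)
{
	bool	saved = state->isChecking;

	state->isChecking = false;	/* nothing relevant lives under an InitPlan */
	walk_plan(initPlan, state);
	state->isChecking = saved;	/* restore for the rest of the tree */
}
```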
Skip InitPlans, but don't restore a useless test. This reverts commit bb559fd236bda017152005fa8eebfc0b58751695.
Allure report https://allure.adsw.io/launch/66988
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180186
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180187
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1180181
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1184659
What happened to the tests again?

--- /home/gpadmin/gpdb_src/src/test/isolation2/expected/segwalrep/fts_unblock_primary.out 2024-03-20 13:33:20.298468416 +0000
+++ /home/gpadmin/gpdb_src/src/test/isolation2/results/segwalrep/fts_unblock_primary.out 2024-03-20 13:33:20.326467066 +0000
@@ -225,7 +249,30 @@
(exited with code 0)
2Uq: ... <quitting>
2U: show gp_fts_mark_mirror_down_grace_period;
- gp_fts_mark_mirror_down_grace_period
---------------------------------------
- 30s
-(1 row)
+Process Process-6:
+Traceback (most recent call last):
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+ self.run()
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+ self._target(*self._args, **self._kwargs)
+ File "./sql_isolation_testcase.py", line 290, in session_process
+ self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+ File "./sql_isolation_testcase.py", line 363, in __init__
+ (hostname, port) = self.get_hostname_port(name, 'p')
+ File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+ con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+ File "./sql_isolation_testcase.py", line 412, in connectdb
+ passwd = given_passwd)
+InternalError: FATAL: optimizer failed to init (CSyncHashtable.h:110)
+HINT: Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED: [Errno 104] Connection reset by peer
+Traceback (most recent call last):
+ File "./sql_isolation_testcase.py", line 1087, in <module>
+ executor.process_isolation_file(sys.stdin, sys.stdout, options.initfile_prefix)
+ File "./sql_isolation_testcase.py", line 838, in process_isolation_file
+ process.stop()
+ File "./sql_isolation_testcase.py", line 335, in stop
+ self.pipe.send(("", False))
+IOError: [Errno 32] Broken pipe
======================================================================
--- /home/gpadmin/gpdb_src/src/test/isolation2/expected/segwalrep/recoverseg_from_file.out 2024-03-20 13:33:22.810347353 +0000
+++ /home/gpadmin/gpdb_src/src/test/isolation2/results/segwalrep/recoverseg_from_file.out 2024-03-20 13:33:22.834346197 +0000
@@ -9,79 +9,105 @@
-- continous, the inconsistent issue will happen
include: helpers/server_helpers.sql;
-CREATE
+Process Process-1:
+Traceback (most recent call last):
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+ self.run()
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+ self._target(*self._args, **self._kwargs)
+ File "./sql_isolation_testcase.py", line 290, in session_process
+ self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+ File "./sql_isolation_testcase.py", line 400, in __init__
+ self.con = self.connectdb(self.dbname)
+ File "./sql_isolation_testcase.py", line 412, in connectdb
+ passwd = given_passwd)
+InternalError: FATAL: optimizer failed to init (CSyncHashtable.h:110)
+HINT: Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED: [Errno 104] Connection reset by peer
--
-- generate_recover_config_file:
-- generate config file used by recoverseg -i
--
create or replace function generate_recover_config_file(datadir text, port text) returns void as $$ import io import os myhost = os.uname()[1] inplaceConfig = myhost + '|' + port + '|' + datadir configStr = inplaceConfig + ' ' + inplaceConfig f = open("/tmp/recover_config_file", "w") f.write(configStr) f.close() $$ language plpythonu;
-CREATE
+FAILED: [Errno 32] Broken pipe
SELECT dbid, role, preferred_role, content, mode, status FROM gp_segment_configuration order by dbid;
- dbid | role | preferred_role | content | mode | status
-------+------+----------------+---------+------+--------
- 1 | p | p | -1 | n | u
- 2 | p | p | 0 | s | u
- 3 | p | p | 1 | s | u
- 4 | p | p | 2 | s | u
- 5 | m | m | 0 | s | u
- 6 | m | m | 1 | s | u
- 7 | m | m | 2 | s | u
- 8 | m | m | -1 | s | u
-(8 rows)
+FAILED: [Errno 32] Broken pipe
-- stop a primary in order to trigger a mirror promotion
select pg_ctl((select datadir from gp_segment_configuration c where c.role='p' and c.content=1), 'stop');
- pg_ctl
---------
- OK
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- trigger failover
select gp_request_fts_probe_scan();
- gp_request_fts_probe_scan
----------------------------
- t
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- wait for content 1 (earlier mirror, now primary) to finish the promotion
1U: select 1;
- ?column?
-----------
- 1
-(1 row)
+Process Process-2:
+Traceback (most recent call last):
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+ self.run()
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+ self._target(*self._args, **self._kwargs)
+ File "./sql_isolation_testcase.py", line 290, in session_process
+ self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+ File "./sql_isolation_testcase.py", line 363, in __init__
+ (hostname, port) = self.get_hostname_port(name, 'p')
+ File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+ con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+ File "./sql_isolation_testcase.py", line 412, in connectdb
+ passwd = given_passwd)
+InternalError: FATAL: optimizer failed to init (CSyncHashtable.h:110)
+HINT: Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED: [Errno 104] Connection reset by peer
-- Quit this utility mode session, as need to start fresh one below
1Uq: ... <quitting>
+FAILED: [Errno 32] Broken pipe
-- make the dbid in gp_segment_configuration not continuous
-- dbid=2 corresponds to content 0 and role p, change it to dbid=9
set allow_system_table_mods to true;
-SET
+FAILED: [Errno 32] Broken pipe
update gp_segment_configuration set dbid=9 where content=0 and role='p';
-UPDATE 1
+FAILED: [Errno 32] Broken pipe
-- trigger failover
select gp_request_fts_probe_scan();
- gp_request_fts_probe_scan
----------------------------
- t
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- wait for content 0 (earlier mirror, now primary) to finish the promotion
0U: select 1;
- ?column?
-----------
- 1
-(1 row)
+Process Process-3:
+Traceback (most recent call last):
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
+ self.run()
+ File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
+ self._target(*self._args, **self._kwargs)
+ File "./sql_isolation_testcase.py", line 290, in session_process
+ self.mode, pipe, self.dbname, user=self.user, passwd=self.passwd)
+ File "./sql_isolation_testcase.py", line 363, in __init__
+ (hostname, port) = self.get_hostname_port(name, 'p')
+ File "./sql_isolation_testcase.py", line 442, in get_hostname_port
+ con = self.connectdb(self.dbname, given_opt="-c gp_session_role=utility")
+ File "./sql_isolation_testcase.py", line 412, in connectdb
+ passwd = given_passwd)
+InternalError: FATAL: optimizer failed to init (CSyncHashtable.h:110)
+HINT: Process xxx will wait for gp_debug_linger=120 seconds before termination.
+Note that its locks and other resources will not be released until then.
+
+FAILED: [Errno 104] Connection reset by peer
-- Quit this utility mode session, as need to start fresh one below
0Uq: ... <quitting>
+FAILED: [Errno 32] Broken pipe
-- generate recover config file
select generate_recover_config_file( (select datadir from gp_segment_configuration c where c.role='m' and c.content=1), (select port from gp_segment_configuration c where c.role='m' and c.content=1)::text);
- generate_recover_config_file
-------------------------------
-
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- recover from config file, only seg with content=1 will be recovered
-- use udpifc to avoid motion hangs when running in ic_proxy envs due to change in dbid
@@ -87,28 +113,30 @@
-- use udpifc to avoid motion hangs when running in ic_proxy envs due to change in dbid
!\retcode PGOPTIONS='-c gp_interconnect_type=udpifc' gprecoverseg -a -i /tmp/recover_config_file;
GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -a -i /tmp/recover_config_file
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:21:112918 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL: optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT: Process 112943 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
-- after gprecoverseg -i, the down segemnt should be up
-- in mirror mode
select status from gp_segment_configuration where role='m' and content=1;
- status
---------
- u
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- recover should reuse the old dbid and not occupy dbid=2
select dbid from gp_segment_configuration where dbid=2;
- dbid
-------
-(0 rows)
+FAILED: [Errno 32] Broken pipe
update gp_segment_configuration set dbid=2 where dbid=9;
-UPDATE 1
+FAILED: [Errno 32] Broken pipe
set allow_system_table_mods to false;
-SET
+FAILED: [Errno 32] Broken pipe
-- we manually change dbid from 2 to 9, which causes the
-- corresponding segment down as well, so recovery full
@@ -115,15 +143,19 @@
-- at here
!\retcode gprecoverseg -a;
GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:21:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -a
+GP_IGNORE:20240320:13:33:21:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:22:112946 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL: optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT: Process 112971 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
-- loop while segments come in sync
select wait_until_all_segments_synchronized();
- wait_until_all_segments_synchronized
---------------------------------------
- OK
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- rebalance the cluster
!\retcode gprecoverseg -ar;
@@ -128,29 +160,31 @@
-- rebalance the cluster
!\retcode gprecoverseg -ar;
GP_IGNORE:-- start_ignore
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-Starting gprecoverseg with args: -ar
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2_arenadata55+dev.86.gd00b0e3 build commit:d00b0e3598d4283d0a04d03d5bea01232053b4d0'
+GP_IGNORE:20240320:13:33:22:112974 gprecoverseg:9f66aa2cacc2:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='FATAL: optimizer failed to init (CSyncHashtable.h:110)
+GP_IGNORE:HINT: Process 112999 will wait for gp_debug_linger=120 seconds before termination.
+GP_IGNORE:Note that its locks and other resources will not be released until then.
+GP_IGNORE:') exiting...
+GP_IGNORE:
GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 2)
-- loop while segments come in sync
select wait_until_all_segments_synchronized();
- wait_until_all_segments_synchronized
---------------------------------------
- OK
-(1 row)
+FAILED: [Errno 32] Broken pipe
-- recheck gp_segment_configuration after rebalance
SELECT dbid, role, preferred_role, content, mode, status FROM gp_segment_configuration order by dbid;
- dbid | role | preferred_role | content | mode | status
-------+------+----------------+---------+------+--------
- 1 | p | p | -1 | n | u
- 2 | p | p | 0 | s | u
- 3 | p | p | 1 | s | u
- 4 | p | p | 2 | s | u
- 5 | m | m | 0 | s | u
- 6 | m | m | 1 | s | u
- 7 | m | m | 2 | s | u
- 8 | m | m | -1 | s | u
-(8 rows)
+FAILED: [Errno 32] Broken pipe
-- remove the config file
!\retcode rm /tmp/recover_config_file
+Traceback (most recent call last):
+ File "./sql_isolation_testcase.py", line 1087, in <module>
+ executor.process_isolation_file(sys.stdin, sys.stdout, options.initfile_prefix)
+ File "./sql_isolation_testcase.py", line 838, in process_isolation_file
+ process.stop()
+ File "./sql_isolation_testcase.py", line 335, in stop
+ self.pipe.send(("", False))
+IOError: [Errno 32] Broken pipe
Optimize the Explicit Redistribute application logic
Previously, if a plan contained any Motions and a ModifyTable node, an Explicit
Redistribute Motion would be added under the ModifyTable to send tuples back to
a specific segment based on gp_segment_id.
We only really need an Explicit Redistribute Motion if the tuples we are going
to modify were sent to a different segment by a Motion, because DELETE and
UPDATE use the ctid to find the tuples that need to be modified, and a ctid is
only meaningful on the tuple's original segment.
This patch updates the logic: an Explicit Redistribute Motion is added only if
there is a Motion that redistributes the relation we are going to modify, and
that Motion sits between the scan and the ModifyTable.
UPDATE/DELETE operations are represented by a ModifyTable node. We expand
ApplyMotionState with a list of result relations retrieved from the
ModifyTable node and a motion counter; the counter accounts for the
possibility of multiple Motions above the scan.
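As a rough illustration (hypothetical names and layout, not the actual gpdb definitions), the expanded state boils down to:

```c
/*
 * Sketch of the bookkeeping described above. resultRelations holds the
 * range table indexes that the ModifyTable node modifies; numMotions counts
 * the Motions crossed so far during the walk; motionsAboveResultScan is the
 * value of numMotions remembered when the scan of a result relation is
 * found (0 means no Motion sits between that scan and the ModifyTable).
 */
typedef struct ApplyMotionStateSketch
{
	int		resultRelations[8];		/* range table indexes from ModifyTable */
	int		numResultRelations;
	int		numMotions;				/* Motions seen so far during the walk */
	int		motionsAboveResultScan;	/* numMotions recorded at the target scan */
} ApplyMotionStateSketch;
```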
When we encounter a scan node, we look up its scanrelid (the range table index
of the scanned relation) in the list of result relations. If the list contains
this scanrelid, we have found the scan on the table that is going to be
modified by the ModifyTable, and we remember the current motion count. We
ignore scans under InitPlans, since an InitPlan cannot contain a scan of the
same range table entry that the ModifyTable modifies.
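In walker terms, the scan handling could look roughly like this (continuing the hypothetical sketch above; the real patch works on gpdb's plan nodes and List API):

```c
/*
 * Called for every scan node encountered while walking the plan under the
 * ModifyTable; InitPlan subtrees are skipped, as described above. scanrelid
 * is the range table index the scan reads.
 */
static void
sketch_visit_scan(ApplyMotionStateSketch *state, int scanrelid)
{
	for (int i = 0; i < state->numResultRelations; i++)
	{
		if (state->resultRelations[i] == scanrelid)
		{
			/*
			 * Found the scan feeding the ModifyTable: remember how many
			 * Motions we crossed on the way down to it.
			 */
			state->motionsAboveResultScan = state->numMotions;
			break;
		}
	}
}
```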
When we encounter the Explicit Redistribute Motion itself, we decide based on
the counter. If the motion counter is not 0, we cannot elide the Explicit
Redistribute Motion, since the tuples that are going to be modified were
previously sent to a different segment by a Motion. Otherwise, if the motion
counter is 0, we can elide it.
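The resulting decision, in the same hypothetical sketch:

```c
#include <stdbool.h>

/*
 * Decide at the Explicit Redistribute Motion node. A recorded count of 0
 * means no Motion moved the to-be-modified tuples off their original
 * segment, so their ctids are still valid there and the Explicit
 * Redistribute Motion can be elided.
 */
static bool
sketch_keep_explicit_redistribute(const ApplyMotionStateSketch *state)
{
	return state->motionsAboveResultScan != 0;
}
```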
The solution is based on a46400e.