-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADBDEV-3999: ORCA generates incorrect plan with exists clause for partitioned table #576
Conversation
For subqueries with EXISTS, the first column is always selected by ORCA. But if this column does not participate anywhere else in the query and is neither a system nor a distribution key, then it is not marked as used. Therefore, when translating to DXL, such a column is skipped, and in a debug build, such a plan is generally not allowed! On a release build, when a column is skipped, that column is not added to table_descr. And therefore, its number is not inserted into the m_colid_to_attno_map hashmap of the correspondence of the column number to the attribute number. And now the attribute number is not found by the column number. And so a variable with an unknown attribute number is created. Such a variable with an unknown attribute number is treated as a whole-row. In this case, the oid of this variable does not correspond to the composite type and an error occurs. The solution to the problem is very simple - always mark the selected column as used.
@@ -1934,6 +1934,7 @@ CSubqueryHandler::FRemoveExistentialSubquery( | |||
// for existential subqueries, any column produced by inner expression | |||
// can be used to check for empty answers; we use first column for that | |||
CColRef *colref = pexprInner->DeriveOutputColumns()->PcrFirst(); | |||
colref->MarkAsUsed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the same changes are needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the same changes are needed here?
It seems, yes, I added it, but I could not find a request for verification, because in the test request, it gets here already after the forced usage marking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch is overall OK.
Despite it looks very logical, we can't change the code without the understanding of what it changes.
Try to find the query, which use the same block of code. Then change it to reproduce an error. If you can't find appropriate query, try to add WARNING message to this block of code and run regression tests. The example queries will reveal in regression diffs.
If we can't find a query, or if target column is always already marked as used (or etc) it's better to not touch the code, but leave here a comment about why marking the column is not necessary.
In the latest version of patch, I removed CSubqueryHandler.cpp:1938 line and patch still works. So, it's probably questionable where we should place a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about another approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about another approach?
Any comment about why you decided to change your mind and why this solution is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What query can hit only CSubqueryHandler.cpp:1304? Did you test such query trying to modify and break it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What query can hit only CSubqueryHandler.cpp:1304?
diff --git a/src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp b/src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp
index 2ce541a472..bd454265c0 100644
--- a/src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp
+++ b/src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp
@@ -58,6 +58,8 @@
using namespace gpopt;
+static int hit = 0;
+
#ifdef GPOS_DEBUG
//---------------------------------------------------------------------------
// @function:
@@ -1302,6 +1304,8 @@ CSubqueryHandler::FCreateCorrelatedApplyForExistentialSubquery(
// for existential subqueries, any column produced by inner expression
// can be used to check for empty answers; we use first column for that
CColRef *colref = pexprInner->DeriveOutputColumns()->PcrFirst();
+ hit--;
+ GPOS_RTL_ASSERT(0 <= hit);
pexprInner->AddRef();
if (EsqctxtFilter == esqctxt)
@@ -1934,6 +1938,7 @@ CSubqueryHandler::FRemoveExistentialSubquery(
// for existential subqueries, any column produced by inner expression
// can be used to check for empty answers; we use first column for that
CColRef *colref = pexprInner->DeriveOutputColumns()->PcrFirst();
+ hit++;
if (COperator::EopScalarSubqueryExists == op_id)
{
shows that we can hit only CSubqueryHandler.cpp:1304 on query from unit test DynamicIndexGet-OuterRefs.mdp or PartTbl-AggWithExistentialSubquery.mdp or ExistentialSubquriesInsideScalarExpression.mdp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test such query trying to modify and break it?
Yes, I changed the queries from all three tests, got an unused column on CSubqueryHandler.cpp:1304, but the CTranslatorExprToDXL.cpp:7580 assert did not work in all three cases, because reqd_prop_plan did not contain this column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus, another solution suggests itself: remove unused columns from reqd_prop_plan or do not place them into there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest another approach, which gets first used column for exist subquery
Allure report |
Allure report https://allure-ee.adsw.io/launch/53148 |
Allure report https://allure-ee.adsw.io/launch/53651 |
} | ||
} | ||
|
||
return PcrFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand the current solution instead of selecting the first column of table (and marking it as used) returns the first used column of it.
If I'm right, the next question is here: for ex. next case (seems it's impossible) - there are no selected columns in table (no one of columns marked as used), this function would return unmarked column and the problem still here. Is it so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand the current solution instead of selecting the first column of table (and marking it as used) returns the first used column of it.
Exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are no selected columns in table (no one of columns marked as used), this function would return unmarked column and the problem still here. Is it so?
May be. In this case, I can mark the first column as used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Modifying the Modifications to index classes contain buggy alternatives, e.g. diff --git a/src/backend/gporca/libgpopt/include/gpopt/operators/CLogicalDynamicIndexGet.h b/src/backend/gporca/libgpopt/include/gpopt/operators/CLogicalDynamicIndexGet.h
index e9be53c814..d569754c23 100644
--- a/src/backend/gporca/libgpopt/include/gpopt/operators/CLogicalDynamicIndexGet.h
+++ b/src/backend/gporca/libgpopt/include/gpopt/operators/CLogicalDynamicIndexGet.h
@@ -134,6 +134,10 @@ public:
virtual COperator *PopCopyWithRemappedColumns(
CMemoryPool *mp, UlongToColRefMap *colref_mapping, BOOL must_exist);
+ // derive output columns
+ virtual CColRefSet *DeriveOutputColumns(CMemoryPool *mp,
+ CExpressionHandle &exprhdl);
+
//-------------------------------------------------------------------------------------
// Required Relational Properties
//-------------------------------------------------------------------------------------
diff --git a/src/backend/gporca/libgpopt/src/operators/CLogicalDynamicIndexGet.cpp b/src/backend/gporca/libgpopt/src/operators/CLogicalDynamicIndexGet.cpp
index 2c6b2919f9..2bafee3be6 100644
--- a/src/backend/gporca/libgpopt/src/operators/CLogicalDynamicIndexGet.cpp
+++ b/src/backend/gporca/libgpopt/src/operators/CLogicalDynamicIndexGet.cpp
@@ -180,6 +180,38 @@ CLogicalDynamicIndexGet::PopCopyWithRemappedColumns(
ppartcnstrRel);
}
+
+//---------------------------------------------------------------------------
+// @function:
+// CLogicalDynamicGet::DeriveOutputColumns
+//
+// @doc:
+// Derive output columns
+//
+//---------------------------------------------------------------------------
+CColRefSet *
+CLogicalDynamicIndexGet::DeriveOutputColumns(CMemoryPool *mp,
+ CExpressionHandle & // exprhdl
+)
+{
+ CColRefSet *pcrs = GPOS_NEW(mp) CColRefSet(mp);
+ for (ULONG i = 0; i < m_pdrgpcrOutput->Size(); i++)
+ {
+ // We want to limit the output columns to only those which are referenced in the query
+ // We will know the entire list of columns which are referenced in the query only after
+ // translating the entire DXL to an expression. Hence we should not limit the output columns
+ // before we have processed the entire DXL.
+ if ((*m_pdrgpcrOutput)[i]->GetUsage() == CColRef::EUsed ||
+ (*m_pdrgpcrOutput)[i]->GetUsage() == CColRef::EUnknown)
+ {
+ pcrs->Include((*m_pdrgpcrOutput)[i]);
+ }
+ }
+
+ return pcrs;
+}
+
+
// Checking if index is partial given the table descriptor and mdid of the index
BOOL
CLogicalDynamicIndexGet::IsPartialIndex(CTableDescriptor *ptabdesc, CREATE TABLE offers (
id int,
product int,
date date
) DISTRIBUTED BY (id);
INSERT INTO offers SELECT i/100, i, '2023-01-01'::date + (i||' min')::interval FROM generate_series(1, 1000) i;
CREATE TABLE contacts (
contact int,
id int,
date date
) DISTRIBUTED BY (id) PARTITION BY RANGE(date) (START (date '2023-01-01') INCLUSIVE END (date '2023-04-01') EXCLUSIVE EVERY (INTERVAL '1 month'));
INSERT INTO contacts SELECT i, i/100, '2023-01-01'::date + (i||' min')::interval FROM generate_series(1, 1000) i;
set optimizer_enforce_subplans = on;
set optimizer_enumerate_plans = on;
CREATE INDEX ON contacts USING btree(id);
set optimizer_plan_id = 13;
EXPLAIN (COSTS off, VERBOSE on)
SELECT id FROM offers WHERE EXISTS (
SELECT id FROM contacts WHERE id = 1
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Result
Output: offers.id
Filter: (SubPlan 1)
-> Gather Motion 3:1 (slice1; segments: 3)
Output: offers.id
-> Seq Scan on postgres.offers
Output: offers.id
SubPlan 1 (slice0)
-> Limit
Output: contacts.*
-> Materialize
Output: contacts.*
-> Gather Motion 3:1 (slice2; segments: 3)
Output: contacts.*
-> Limit
Output: contacts.*
-> Result
Output: contacts.*
-> Sequence
Output: contacts.*, contacts.id
-> Partition Selector for contacts (dynamic scan id: 1)
Partitions selected: 3 (out of 3)
-> Dynamic Seq Scan on postgres.contacts (dynamic scan id: 1)
Output: contacts.*, contacts.id
Filter: (contacts.id = 1)
Optimizer: Pivotal Optimizer (GPORCA)
Settings: optimizer=on
(27 rows)
set optimizer_plan_id = 14;
EXPLAIN (COSTS off, VERBOSE on)
SELECT id FROM offers WHERE EXISTS (
SELECT id FROM contacts WHERE id = 1
);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Result
Output: offers.id
Filter: (SubPlan 1)
-> Gather Motion 3:1 (slice1; segments: 3)
Output: offers.id
-> Seq Scan on postgres.offers
Output: offers.id
SubPlan 1 (slice0)
-> Limit
Output: contacts.*
-> Materialize
Output: contacts.*
-> Gather Motion 3:1 (slice2; segments: 3)
Output: contacts.*
-> Result
Output: contacts.*
-> Sequence
Output: contacts.*, contacts.id
-> Partition Selector for contacts (dynamic scan id: 1)
Partitions selected: 3 (out of 3)
-> Dynamic Seq Scan on postgres.contacts (dynamic scan id: 1)
Output: contacts.*, contacts.id
Filter: (contacts.id = 1)
Optimizer: Pivotal Optimizer (GPORCA)
Settings: optimizer=on
(25 rows)
set optimizer_plan_id = 27;
EXPLAIN (COSTS off, VERBOSE on)
SELECT id FROM offers WHERE EXISTS (
SELECT id FROM contacts WHERE id = 1
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice3; segments: 3)
Output: offers.id
-> Seq Scan on postgres.offers
Output: offers.id
Filter: (SubPlan 1)
SubPlan 1 (slice3; segments: 3)
-> Materialize
Output: contacts.*
-> Broadcast Motion 1:3 (slice2)
Output: contacts.*
-> Limit
Output: contacts.*
-> Gather Motion 3:1 (slice1; segments: 3)
Output: contacts.*
-> Limit
Output: contacts.*
-> Result
Output: contacts.*
-> Sequence
Output: contacts.*, contacts.id
-> Partition Selector for contacts (dynamic scan id: 1)
Partitions selected: 3 (out of 3)
-> Dynamic Seq Scan on postgres.contacts (dynamic scan id: 1)
Output: contacts.*, contacts.id
Filter: (contacts.id = 1)
Optimizer: Pivotal Optimizer (GPORCA)
Settings: optimizer=on
(27 rows)
set optimizer_plan_id = 28;
EXPLAIN (COSTS off, VERBOSE on)
SELECT id FROM offers WHERE EXISTS (
SELECT id FROM contacts WHERE id = 1
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice3; segments: 3)
Output: offers.id
-> Seq Scan on postgres.offers
Output: offers.id
Filter: (SubPlan 1)
SubPlan 1 (slice3; segments: 3)
-> Materialize
Output: contacts.*
-> Broadcast Motion 1:3 (slice2)
Output: contacts.*
-> Limit
Output: contacts.*
-> Gather Motion 3:1 (slice1; segments: 3)
Output: contacts.*
-> Result
Output: contacts.*
-> Sequence
Output: contacts.*, contacts.id
-> Partition Selector for contacts (dynamic scan id: 1)
Partitions selected: 3 (out of 3)
-> Dynamic Seq Scan on postgres.contacts (dynamic scan id: 1)
Output: contacts.*, contacts.id
Filter: (contacts.id = 1)
Optimizer: Pivotal Optimizer (GPORCA)
Settings: optimizer=on
(25 rows) |
Allure report https://allure-ee.adsw.io/launch/62824 |
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1011576 |
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1011577 |
Failed job Regression tests with ORCA on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1011571 |
I'm ready to approve, but we don't have a proper commit message for this. Old message is related to old solution and probably bad formatted. |
For subqueries with EXISTS, the first column is always selected by ORCA. But if this column does not participate anywhere else in the query and is neither a system nor a distribution key, then it is not marked as used. Therefore, when translating to DXL, such a column is skipped, and in a debug build, such a plan is generally not allowed! On a release build, when a column is skipped, that column is not added to table_descr. And therefore, its number is not inserted into the m_colid_to_attno_map hashmap of the correspondence of the column number to the attribute number. And now the attribute number is not found by the column number. And so a variable with an unknown attribute number is created. Such a variable with an unknown attribute number is treated as a whole-row. In this case, the oid of this variable does not correspond to the composite type and an error occurs. The solution to this problem is to limit the output columns to only those referenced in the query for partitioned tables, as is done for non-partitioned tables. The Union-On-HJNs test changed because in its plan the unused column B of the partitioned table disappeared.
added empty commit |
Allure report https://allure-ee.adsw.io/launch/63286 |
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1033248 |
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1033249 |
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1033242 |
Allure report https://allure-ee.adsw.io/launch/63349 |
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1036527 |
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1036528 |
Please, update the branch it's out of date |
Allure report https://allure-ee.adsw.io/launch/63746 |
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1051551 |
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1051552 |
ORCA generates incorrect plan with exists clause for partitioned table
For subqueries with EXISTS, the first column is always selected by ORCA.
But if this column does not participate anywhere else in the query and is neither a system nor a distribution key, then it is not marked as used.
Therefore, when translating to DXL, such a column is skipped, and in a debug build, such a plan is generally not allowed!
On a release build, when a column is skipped, that column is not added to table_descr.
And therefore, its number is not inserted into the m_colid_to_attno_map hashmap of the correspondence of the column number to the attribute number.
And now the attribute number is not found by the column number.
And so a variable with an unknown attribute number is created.
Such a variable with an unknown attribute number is treated as a whole-row.
In this case, the oid of this variable does not correspond to the composite type and an error occurs.
The solution to this problem is to limit the output columns to only those referenced in the query for partitioned tables, as is done for non-partitioned tables.
The Union-On-HJNs test changed because in its plan the unused column B of the partitioned table disappeared.
Reproduction on release build:
On a debug build, ORCA can't build a plan.
SIGSEGV, Segmentation fault on release build: