Fix use by ORCA of a newer index with HOT-chain in an older transacti…

…on. (#619) When heap tuple is updated by legacy planner and the updated tuple is placed at the same page (heap-only tuple, HOT), an update chain is created. It's a chain of updated tuples, in which each tuple's ctid points to the next tuple in the chain. HOT chains allow to store only one index entry, which points to the first tuple in the chain. And during Index Scan we pass through the chain, and the first tuple visible for the current transaction is taken (for more information, see src/backend/access/heap/README.HOT). If we create a second index on column that has been updated, it will store the ctid of the beginning of the existing HOT chain. If a repeatable read transaction started before the transaction in which the second index was created, then this index could be used in the query plan. As a result of the search for this index, a tuple could be found that does not meet the search condition (by a new value that is not visible to the transaction) In the case of the legacy planner, this problem is solved the following way: "To address this issue, regular (non-concurrent) CREATE INDEX makes the new index usable only by new transactions and transactions that don't have snapshots older than the CREATE INDEX command. This prevents queries that can see the inconsistent HOT chains from trying to use the new index and getting incorrect results. Queries that can see the index can only see the rows that were visible after the index was created, hence the HOT chains are consistent for them." But ORCA does not handle this case and can use an index with a broken HOT-chain. This patch resolves the issue for ORCA in the same way as legacy planner. During planning we ignore newly created indexes based on their xmin. Additionally, ORCA faced another related problem. Since ORCA has its own cache (MD Cache) and can cache a relation object without an index that cannot be used in the current snapshot (because MDCacheSetTransientState function returns true), we won't be able to use the index after the problematic snapshot changes. Therefore, we need to reset the cache after the snapshot changes in order to use index. This patch solves the problem in the following way: during index filtering, if we encounter an index that we cannot use, we save TransactionXmin in the mdcache_transaction_xmin variable. In the next queries, we check the saved xmin, and if it is valid and differs from the current one, we reset the cache. The create_index_hot test has also been changed. Now optimizer is turned off before the update. Since ORCA always uses Split Update, in which case HOT chains are not created and the problem is not reproduced. And that's why ORCA wasn't actually tested before.
arenadata · Apr 2, 2024 · 8f7cb88 · 8f7cb88
1 parent db94b62
commit 8f7cb88
Show file tree

Hide file tree

Showing 8 changed files with 82 additions and 7 deletions.
diff --git a/src/backend/gpopt/gpdbwrappers.cpp b/src/backend/gpopt/gpdbwrappers.cpp
@@ -36,6 +36,7 @@ extern "C" {
 #include "parser/parse_clause.h"
 #include "parser/parse_oper.h"
 #include "utils/memutils.h"
+#include "utils/snapmgr.h"
 }
 #define GP_WRAP_START                                            \
 	sigjmp_buf local_sigjmp_buf;                                 \
@@ -2514,6 +2515,13 @@ static bool mdcache_invalidation_counter_registered = false;
 static int64 mdcache_invalidation_counter = 0;
 static int64 last_mdcache_invalidation_counter = 0;
 
+// If we have cached a relation without an index, because that index cannot
+// be used in the current snapshot (for more info see
+// src/backend/access/heap/README.HOT), we save TransactionXmin. If
+// TransactionXmin changes later, the cache will be reset and the relation will
+// be reloaded with that index.
+static TransactionId mdcache_transaction_xmin = InvalidTransactionId;
+
 static void
 mdsyscache_invalidation_counter_callback(Datum arg, int cacheid,
 										 uint32 hashvalue)
@@ -2595,7 +2603,8 @@ register_mdcache_invalidation_callbacks(void)
 								  (Datum) 0);
 }
 
-// Has there been any catalog changes since last call?
+// We reset the cache in case of a catalog change or if TransactionXmin changed
+// from that we save in mdcache_transaction_xmin.
 bool
 gpdb::MDCacheNeedsReset(void)
 {
@@ -2607,7 +2616,11 @@ gpdb::MDCacheNeedsReset(void)
 			mdcache_invalidation_counter_registered = true;
 		}
 		if (last_mdcache_invalidation_counter == mdcache_invalidation_counter)
-			return false;
+		{
+			return TransactionIdIsValid(mdcache_transaction_xmin) &&
+				   !TransactionIdEquals(TransactionXmin,
+										mdcache_transaction_xmin);
+		}
 		else
 		{
 			last_mdcache_invalidation_counter = mdcache_invalidation_counter;
@@ -2619,6 +2632,42 @@ gpdb::MDCacheNeedsReset(void)
 	return true;
 }
 
+bool
+gpdb::MDCacheSetTransientState(Relation index_rel)
+{
+	GP_WRAP_START;
+	{
+		bool result =
+			index_rel->rd_index->indcheckxmin &&
+			!TransactionIdPrecedes(
+				HeapTupleHeaderGetXmin(index_rel->rd_indextuple->t_data),
+				TransactionXmin);
+		if (result)
+			mdcache_transaction_xmin = TransactionXmin;
+		return result;
+	}
+	GP_WRAP_END;
+	// ignore index if we can't check it visibility for some reason
+	return true;
+}
+
+void
+gpdb::MDCacheResetTransientState(void)
+{
+	mdcache_transaction_xmin = InvalidTransactionId;
+}
+
+bool
+gpdb::MDCacheInTransientState(void)
+{
+	GP_WRAP_START;
+	{
+		return TransactionIdIsValid(mdcache_transaction_xmin);
+	}
+	GP_WRAP_END;
+	return false;
+}
+
 // returns true if a query cancel is requested in GPDB
 bool
 gpdb::IsAbortRequested(void)

diff --git a/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp b/src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp
@@ -310,6 +310,8 @@ CTranslatorDXLToPlStmt::GetPlannedStmtFromDXL(const CDXLNode *dxlnode,
 		}
 	}
 
+	planned_stmt->transientPlan = gpdb::MDCacheInTransientState();
+
 	return planned_stmt;
 }
 

diff --git a/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp b/src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp
@@ -308,7 +308,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForPartTable(CMemoryPool *mp,
 
 		GPOS_TRY
 		{
-			if (IsIndexSupported(index_rel))
+			// If the index is supported, but cannot yet be used, ignore it; but
+			// mark the plan we are generating and cache as transient.
+			// See src/backend/access/heap/README.HOT for discussion.
+			if (IsIndexSupported(index_rel) &&
+				!gpdb::MDCacheSetTransientState(index_rel))
 			{
 				CMDIdGPDB *mdid_index =
 					GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid);
@@ -364,7 +368,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForNonPartTable(CMemoryPool *mp,
 
 		GPOS_TRY
 		{
-			if (IsIndexSupported(index_rel))
+			// If the index is supported, but cannot yet be used, ignore it; but
+			// mark the plan we are generating and cache as transient.
+			// See src/backend/access/heap/README.HOT for discussion.
+			if (IsIndexSupported(index_rel) &&
+				!gpdb::MDCacheSetTransientState(index_rel))
 			{
 				CMDIdGPDB *mdid_index =
 					GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid);

diff --git a/src/backend/gpopt/utils/COptTasks.cpp b/src/backend/gpopt/utils/COptTasks.cpp
@@ -498,11 +498,13 @@ COptTasks::OptimizeTask(void *ptr)
 	{
 		CMDCache::Init();
 		CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L);
+		gpdb::MDCacheResetTransientState();
 	}
 	else if (reset_mdcache)
 	{
 		CMDCache::Reset();
 		CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L);
+		gpdb::MDCacheResetTransientState();
 	}
 	else if (CMDCache::ULLGetCacheQuota() !=
 			 (ULLONG) optimizer_mdcache_size * 1024L)

diff --git a/src/backend/optimizer/plan/orca.c b/src/backend/optimizer/plan/orca.c
@@ -241,7 +241,6 @@ optimize_query(Query *parse, ParamListInfo boundParams)
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->oneoffPlan = glob->oneoffPlan;
-	result->transientPlan = glob->transientPlan;
 
 	return result;
 }
diff --git a/src/include/gpopt/gpdbwrappers.h b/src/include/gpopt/gpdbwrappers.h
@@ -684,9 +684,20 @@ FaultInjectorType_e InjectFaultInOptTasks(const char *fault_name);
 gpos::ULONG CountLeafPartTables(Oid oidRelation);
 
 // Does the metadata cache need to be reset (because of a catalog
-// table has been changed?)
+// table has been changed or TransactionXmin changed from that we saved)?
 bool MDCacheNeedsReset(void);
 
+// Check that the index is usable in the current snapshot and if not, save the
+// xmin of the current snapshot. Returns true if the index is not usable and
+// should be skipped.
+bool MDCacheSetTransientState(Relation index_rel);
+
+// reset TransactionXmin value that we saved
+void MDCacheResetTransientState(void);
+
+// returns true if cache is in transient state
+bool MDCacheInTransientState(void);
+
 // returns true if a query cancel is requested in GPDB
 bool IsAbortRequested(void);
 

diff --git a/src/test/isolation/expected/create_index_hot.out b/src/test/isolation/expected/create_index_hot.out
@@ -1,13 +1,14 @@
 Parsed test spec with 2 sessions
 
-starting permutation: s2begin s2select s1update s1createindexonc s2select s2forceindexscan s2select
+starting permutation: s2begin s2select s1optimizeroff s1update s1createindexonc s2select s2forceindexscan s2select
 step s2begin: BEGIN ISOLATION LEVEL SERIALIZABLE;
 step s2select: select '#' as expected, c from hot where c = '#'
                   union all
                   select '$', c from hot where c = '$';
 expected       c              
 
 #              #              
+step s1optimizeroff: set optimizer = off;
 step s1update: update hot set c = '$' where c = '#';
 step s1createindexonc: create index idx_c on hot (c);
 step s2select: select '#' as expected, c from hot where c = '#'

diff --git a/src/test/isolation/specs/create_index_hot.spec b/src/test/isolation/specs/create_index_hot.spec
@@ -23,7 +23,9 @@ teardown
 
 # Update a row, and create an index on the updated column. This produces
 # a broken HOT chain.
+#FIXME do not turn off the optimizer when ORCA stops always using Split Update.
 session "s1"
+step "s1optimizeroff" { set optimizer = off; }
 step "s1update" { update hot set c = '$' where c = '#'; }
 step "s1createindexonc" { create index idx_c on hot (c); }
 
@@ -39,6 +41,7 @@ permutation
   "s2begin"
   "s2select"
 
+  "s1optimizeroff"
   "s1update"
   "s1createindexonc"
-Original file line number
+Diff line change
@@ Expand Up @@
     		}
     	}
+    	planned_stmt->transientPlan = gpdb::MDCacheInTransientState();
     	return planned_stmt;
     }
@@ Expand Down @@