Skip to content

Commit

Permalink
Fix use by ORCA of a newer index with HOT-chain in an older transacti…
Browse files Browse the repository at this point in the history
…on. (#619)

When heap tuple is updated by legacy planner and the updated tuple is
placed at the same page (heap-only tuple, HOT), an update chain is created.
It's a chain of updated tuples, in which each tuple's ctid points to the next
tuple in the chain.

HOT chains allow to store only one index entry, which points to the first tuple
in the chain. And during Index Scan we pass through the chain, and the first
tuple visible for the current transaction is taken (for more information, see
src/backend/access/heap/README.HOT).

If we create a second index on column that has been updated, it will store the
ctid of the beginning of the existing HOT chain. If a repeatable read
transaction started before the transaction in which the second index was
created, then this index could be used in the query plan. As a result of the
search for this index, a tuple could be found that does not meet the search
condition (by a new value that is not visible to the transaction)

In the case of the legacy planner, this problem is solved the following way:

"To address this issue, regular (non-concurrent) CREATE INDEX makes the
new index usable only by new transactions and transactions that don't
have snapshots older than the CREATE INDEX command. This prevents
queries that can see the inconsistent HOT chains from trying to use the
new index and getting incorrect results. Queries that can see the index
can only see the rows that were visible after the index was created,
hence the HOT chains are consistent for them."

But ORCA does not handle this case and can use an index with a broken HOT-chain.

This patch resolves the issue for ORCA in the same way as legacy planner. During
planning we ignore newly created indexes based on their xmin.

Additionally, ORCA faced another related problem. Since ORCA has its own cache
(MD Cache) and can cache a relation object without an index that cannot be used
in the current snapshot (because MDCacheSetTransientState function returns true),
we won't be able to use the index after the problematic snapshot changes.
Therefore, we need to reset the cache after the snapshot changes in order to use
index.

This patch solves the problem in the following way: during index filtering, if
we encounter an index that we cannot use, we save TransactionXmin in the
mdcache_transaction_xmin variable. In the next queries, we check the saved xmin,
and if it is valid and differs from the current one, we reset the cache.

The create_index_hot test has also been changed. Now optimizer is turned off
before the update. Since ORCA always uses Split Update, in which case HOT chains
are not created and the problem is not reproduced. And that's why ORCA wasn't
actually tested before.
  • Loading branch information
KnightMurloc authored and bandetto committed Apr 1, 2024
1 parent ec9afd9 commit d883ea8
Show file tree
Hide file tree
Showing 8 changed files with 82 additions and 7 deletions.
53 changes: 51 additions & 2 deletions src/backend/gpopt/gpdbwrappers.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ extern "C" {
#include "parser/parse_clause.h"
#include "parser/parse_oper.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
}
#define GP_WRAP_START \
sigjmp_buf local_sigjmp_buf; \
Expand Down Expand Up @@ -2514,6 +2515,13 @@ static bool mdcache_invalidation_counter_registered = false;
static int64 mdcache_invalidation_counter = 0;
static int64 last_mdcache_invalidation_counter = 0;

// If we have cached a relation without an index, because that index cannot
// be used in the current snapshot (for more info see
// src/backend/access/heap/README.HOT), we save TransactionXmin. If
// TransactionXmin changes later, the cache will be reset and the relation will
// be reloaded with that index.
static TransactionId mdcache_transaction_xmin = InvalidTransactionId;

static void
mdsyscache_invalidation_counter_callback(Datum arg, int cacheid,
uint32 hashvalue)
Expand Down Expand Up @@ -2595,7 +2603,8 @@ register_mdcache_invalidation_callbacks(void)
(Datum) 0);
}

// Has there been any catalog changes since last call?
// We reset the cache in case of a catalog change or if TransactionXmin changed
// from that we save in mdcache_transaction_xmin.
bool
gpdb::MDCacheNeedsReset(void)
{
Expand All @@ -2607,7 +2616,11 @@ gpdb::MDCacheNeedsReset(void)
mdcache_invalidation_counter_registered = true;
}
if (last_mdcache_invalidation_counter == mdcache_invalidation_counter)
return false;
{
return TransactionIdIsValid(mdcache_transaction_xmin) &&
!TransactionIdEquals(TransactionXmin,
mdcache_transaction_xmin);
}
else
{
last_mdcache_invalidation_counter = mdcache_invalidation_counter;
Expand All @@ -2619,6 +2632,42 @@ gpdb::MDCacheNeedsReset(void)
return true;
}

bool
gpdb::MDCacheSetTransientState(Relation index_rel)
{
GP_WRAP_START;
{
bool result =
index_rel->rd_index->indcheckxmin &&
!TransactionIdPrecedes(
HeapTupleHeaderGetXmin(index_rel->rd_indextuple->t_data),
TransactionXmin);
if (result)
mdcache_transaction_xmin = TransactionXmin;
return result;
}
GP_WRAP_END;
// ignore index if we can't check it visibility for some reason
return true;
}

void
gpdb::MDCacheResetTransientState(void)
{
mdcache_transaction_xmin = InvalidTransactionId;
}

bool
gpdb::MDCacheInTransientState(void)
{
GP_WRAP_START;
{
return TransactionIdIsValid(mdcache_transaction_xmin);
}
GP_WRAP_END;
return false;
}

// returns true if a query cancel is requested in GPDB
bool
gpdb::IsAbortRequested(void)
Expand Down
2 changes: 2 additions & 0 deletions src/backend/gpopt/translate/CTranslatorDXLToPlStmt.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,8 @@ CTranslatorDXLToPlStmt::GetPlannedStmtFromDXL(const CDXLNode *dxlnode,
}
}

planned_stmt->transientPlan = gpdb::MDCacheInTransientState();

return planned_stmt;
}

Expand Down
12 changes: 10 additions & 2 deletions src/backend/gpopt/translate/CTranslatorRelcacheToDXL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForPartTable(CMemoryPool *mp,

GPOS_TRY
{
if (IsIndexSupported(index_rel))
// If the index is supported, but cannot yet be used, ignore it; but
// mark the plan we are generating and cache as transient.
// See src/backend/access/heap/README.HOT for discussion.
if (IsIndexSupported(index_rel) &&
!gpdb::MDCacheSetTransientState(index_rel))
{
CMDIdGPDB *mdid_index =
GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid);
Expand Down Expand Up @@ -364,7 +368,11 @@ CTranslatorRelcacheToDXL::RetrieveRelIndexInfoForNonPartTable(CMemoryPool *mp,

GPOS_TRY
{
if (IsIndexSupported(index_rel))
// If the index is supported, but cannot yet be used, ignore it; but
// mark the plan we are generating and cache as transient.
// See src/backend/access/heap/README.HOT for discussion.
if (IsIndexSupported(index_rel) &&
!gpdb::MDCacheSetTransientState(index_rel))
{
CMDIdGPDB *mdid_index =
GPOS_NEW(mp) CMDIdGPDB(IMDId::EmdidInd, index_oid);
Expand Down
2 changes: 2 additions & 0 deletions src/backend/gpopt/utils/COptTasks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -498,11 +498,13 @@ COptTasks::OptimizeTask(void *ptr)
{
CMDCache::Init();
CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L);
gpdb::MDCacheResetTransientState();
}
else if (reset_mdcache)
{
CMDCache::Reset();
CMDCache::SetCacheQuota(optimizer_mdcache_size * 1024L);
gpdb::MDCacheResetTransientState();
}
else if (CMDCache::ULLGetCacheQuota() !=
(ULLONG) optimizer_mdcache_size * 1024L)
Expand Down
1 change: 0 additions & 1 deletion src/backend/optimizer/plan/orca.c
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,6 @@ optimize_query(Query *parse, ParamListInfo boundParams)
result->relationOids = glob->relationOids;
result->invalItems = glob->invalItems;
result->oneoffPlan = glob->oneoffPlan;
result->transientPlan = glob->transientPlan;

return result;
}
13 changes: 12 additions & 1 deletion src/include/gpopt/gpdbwrappers.h
Original file line number Diff line number Diff line change
Expand Up @@ -684,9 +684,20 @@ FaultInjectorType_e InjectFaultInOptTasks(const char *fault_name);
gpos::ULONG CountLeafPartTables(Oid oidRelation);

// Does the metadata cache need to be reset (because of a catalog
// table has been changed?)
// table has been changed or TransactionXmin changed from that we saved)?
bool MDCacheNeedsReset(void);

// Check that the index is usable in the current snapshot and if not, save the
// xmin of the current snapshot. Returns true if the index is not usable and
// should be skipped.
bool MDCacheSetTransientState(Relation index_rel);

// reset TransactionXmin value that we saved
void MDCacheResetTransientState(void);

// returns true if cache is in transient state
bool MDCacheInTransientState(void);

// returns true if a query cancel is requested in GPDB
bool IsAbortRequested(void);

Expand Down
3 changes: 2 additions & 1 deletion src/test/isolation/expected/create_index_hot.out
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
Parsed test spec with 2 sessions

starting permutation: s2begin s2select s1update s1createindexonc s2select s2forceindexscan s2select
starting permutation: s2begin s2select s1optimizeroff s1update s1createindexonc s2select s2forceindexscan s2select
step s2begin: BEGIN ISOLATION LEVEL SERIALIZABLE;
step s2select: select '#' as expected, c from hot where c = '#'
union all
select '$', c from hot where c = '$';
expected c

# #
step s1optimizeroff: set optimizer = off;
step s1update: update hot set c = '$' where c = '#';
step s1createindexonc: create index idx_c on hot (c);
step s2select: select '#' as expected, c from hot where c = '#'
Expand Down
3 changes: 3 additions & 0 deletions src/test/isolation/specs/create_index_hot.spec
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,9 @@ teardown

# Update a row, and create an index on the updated column. This produces
# a broken HOT chain.
#FIXME do not turn off the optimizer when ORCA stops always using Split Update.
session "s1"
step "s1optimizeroff" { set optimizer = off; }
step "s1update" { update hot set c = '$' where c = '#'; }
step "s1createindexonc" { create index idx_c on hot (c); }

Expand All @@ -39,6 +41,7 @@ permutation
"s2begin"
"s2select"

"s1optimizeroff"
"s1update"
"s1createindexonc"

Expand Down

0 comments on commit d883ea8

Please sign in to comment.