
Execute coordinator piece of DDLJobs immediately #7325

Closed
wants to merge 1 commit

Conversation

onurctirtir (Member)

No description provided.

onurctirtir marked this pull request as draft November 6, 2023 11:56

codecov bot commented Nov 6, 2023

Codecov Report

Merging #7325 (214ea8b) into main (240313e) will decrease coverage by 0.57%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #7325      +/-   ##
==========================================
- Coverage   89.58%   89.01%   -0.57%     
==========================================
  Files         275      275              
  Lines       59613    59664      +51     
  Branches     7428     7438      +10     
==========================================
- Hits        53403    53109     -294     
- Misses       4079     4423     +344     
- Partials     2131     2132       +1     

@@ -632,6 +637,32 @@ citus_ProcessUtilityInternal(PlannedStmt *pstmt,
	{
		ddlJobs = ops->preprocess(parsetree, queryString, context);
	}

	if (ddlJobs != NIL)
Contributor

I guess this isn't a completely generic solution at this point, because it does not work when postprocess returns the ddljobs. So for DDL where postprocess is the one that's necessary, we'd need some other solution (e.g. only taking the locks on the coordinator and not executing the SQL yet). I think this PR seems like a good first step though, and the postprocess case could be solved in a follow-up PR.

Member Author

I guess this isn't a completely generic solution at this point, because it does not work when postprocess returns the ddljobs.

Correct.

So for DDL where postprocess is the one that's necessary we'd need some other solution (e.g. only taking the locks on the coordinator and not executing the SQL yet).

Right, we either need this^ or need to move the Postprocess pieces to the Preprocess stage for the commands we want to support from any node.
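
To make the limitation discussed in this thread concrete, here is a rough sketch of the ordering inside citus_ProcessUtilityInternal(); the helper names and the exact shape of the ops->postprocess call are illustrative assumptions, not code from this PR.

/*
 * Rough sketch of the ordering under discussion; helpers marked
 * "hypothetical" are illustrative, not the PR's actual functions.
 */
List *ddlJobs = ops->preprocess(parsetree, queryString, context);

/* the coordinator piece of preprocess-returned jobs can run right away */
ExecuteCoordinatorPieceOfDDLJobs(ddlJobs);           /* hypothetical */

/*
 * ... standard_ProcessUtility() runs here and takes the locks on the
 * local (shell) tables ...
 */

/* postprocess-returned jobs only come into existence after that point */
List *postprocessDDLJobs = ops->postprocess(parsetree, queryString);
ExecuteAllDDLJobs(ddlJobs, postprocessDDLJobs);      /* hypothetical */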


	if (*coordinatorDDLJob && *nonCoordinatorNodesDDLJob)
	{
		/* defer the metadata sync to "other nodes" phase */
Member

The PR doesn't have any description, so I cannot follow the goal.

But, isn't this complicated to reason about? E.g., now, some of the DDLJobs are different.

If the goal is to support DDL from any node, wouldn't it be simpler to have something like #7016?

Contributor

Afaict #7016 would not be enough to solve the problem. My thinking is that if we execute standardProcessUtility first on the current node, that will actually take locks on the distributed shell table on the current node first (which could be a worker), while instead we want those locks to be taken on the coordinator first to avoid deadlocks. Sorting the worker nodes we send DDL to would prevent deadlocks when actually changing the shards, but it would not prevent deadlocks that occur when changing the distributed shell table.

Member

This PR, #7016 and #6021 are all prone to distributed deadlocks. I think we should seek simplicity at this point unless we want to implement DDL from any node properly -- which we didn't know how to do at the time.

I think it could even be more acceptable to generalize #6021; it is at least easier to reason about.

To avoid the deadlocks, we could be more aggressive such that we acquire a global advisory lock on the first worker node per OID of the object changed. Such an approach differs from PG in terms of concurrency, but it is at least simple and easy to explain to the user (e.g., we allow one DDL from workers at a time).

Well, this is a tough problem, and I hope it doesn't create too much complexity in the code & architecture.

Member Author

To avoid the deadlocks, we could be more aggressive such that we acquire a global advisory lock on the first worker node per OID of the object changed.

This would help avoid the deadlocks for the cases where we acquire this adv lock. However, say distributed_ddl (that's allowed from any node) can get into a distributed deadlock with citus_operation() (that's only allowed from the coordinator); then we must not forget to acquire this adv lock in those operations too.
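
As an illustration of the advisory-lock idea above, a minimal sketch follows; GetConnectionToFirstWorkerNode() is a hypothetical helper, ExecuteCriticalRemoteCommand() is assumed to be Citus's existing remote-command helper, and pg_advisory_xact_lock() is the stock PostgreSQL function, so the lock is released automatically at transaction end.

/*
 * Sketch: serialize DDL issued from worker nodes by taking a
 * transaction-scoped advisory lock on the first worker node, keyed by the
 * OID of the object being changed.
 */
static void
AcquireGlobalDDLAdvisoryLock(Oid objectId)
{
	/* hypothetical helper returning a connection to the first worker node */
	MultiConnection *connection = GetConnectionToFirstWorkerNode();

	StringInfo lockCommand = makeStringInfo();
	appendStringInfo(lockCommand, "SELECT pg_advisory_xact_lock(%u)", objectId);

	/* blocks until the remote advisory lock is granted */
	ExecuteCriticalRemoteCommand(connection, lockCommand->data);
}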


	if (coordinatorJob)
	{
		ExecuteDistributedDDLJob(coordinatorJob);
Member

Just fyi:

There are some good reasons for running DDL commands via standardProcessUtility locally first. For example, if a command fails, you get the error on the table instead of on the shards -- which has been very confusing.

Similarly, if there are permission problems or any other problems, Citus does not use any resources (e.g., connections) before doing things locally.

Contributor @JelteF Nov 6, 2023

if a command fails, you get the error on the table instead of on the shards -- which has been very confusing.

That's indeed a good goal, but I don't think that's a big issue as long as we send the actual metadata updating DDL command first to the coordinator (instead of a shard command). You'd still get the "while running on coordinator.some.host.com:5432" DETAIL, but that seems rather harmless.

Similarly, if there are permission problems or any other problems, Citus does not use any resources (e.g., connections) before doing things locally.

I don't think this is an issue we really need to worry about. If an attacker is trying to exhaust resources, then this does not seem like a very smart way of doing so (e.g. run_command_on_workers + generate_series + cross join seems a much easier way to bring stuff down). And if it's by accident, I don't think opening a connection is a huge resource hog; a connection is probably already cached anyway.

Member

That's indeed a good goal, but I don't think that's a big issue as long as we send the actual metadata updating DDL command first to the coordinator (instead of a shard command). You'd still get the "while running on coordinator.some.host.com:5432", but that seems rather harmless.

That caused quite a bit of confusion for the users in the past, so I would not underestimate the impact of that UX change.

	(*nonCoordinatorNodesDDLJob)->taskList = NIL;

	Task *task = NULL;
	foreach_ptr(task, ddlJob->taskList)
Member

I have not thought enough about this, but I sense some possible deadlock scenarios if we run DDL from workers concurrently, so let's be careful about that (e.g., run DDL commands from workers via pgbench and make sure there are no deadlocks).
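
For reference, a hedged sketch of what the task-list split in the hunk quoted above could look like; TaskTargetsCoordinator() is a hypothetical predicate, and only the taskList field and foreach_ptr() visible in the quoted code are assumed.

/*
 * Sketch: split one DDLJob's task list into a coordinator piece and a
 * non-coordinator piece. TaskTargetsCoordinator() is hypothetical.
 */
static void
SplitDDLJobTaskList(DDLJob *ddlJob, DDLJob *coordinatorDDLJob,
					DDLJob *nonCoordinatorNodesDDLJob)
{
	Task *task = NULL;
	foreach_ptr(task, ddlJob->taskList)
	{
		if (TaskTargetsCoordinator(task))	/* hypothetical predicate */
		{
			coordinatorDDLJob->taskList =
				lappend(coordinatorDDLJob->taskList, task);
		}
		else
		{
			nonCoordinatorNodesDDLJob->taskList =
				lappend(nonCoordinatorNodesDDLJob->taskList, task);
		}
	}
}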

onurctirtir force-pushed the create_db_from_any_node branch from 2f1d068 to 78be00b on November 10, 2023 09:41
Base automatically changed from create_db_from_any_node to main November 10, 2023 09:58
onurctirtir added a commit that referenced this pull request Nov 10, 2023
DESCRIPTION: Adds support for issuing role management commands from worker nodes

It's unlikely to get into a distributed deadlock with role commands, so we
don't care much about them at the moment.
There were several attempts to reduce the chances of a deadlock, but we
didn't get any of them merged into the main branch yet; see:
#7325
#7016
#7009
francisjodi pushed a commit that referenced this pull request Nov 13, 2023