
Execute coordinator piece of DDLJobs immediately #7325

Closed
wants to merge 1 commit

Conversation

onurctirtir (Member)

No description provided.

onurctirtir marked this pull request as draft November 6, 2023 11:56

codecov bot commented Nov 6, 2023

Codecov Report

Merging #7325 (214ea8b) into main (240313e) will decrease coverage by 0.57%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #7325      +/-   ##
==========================================
- Coverage   89.58%   89.01%   -0.57%     
==========================================
  Files         275      275              
  Lines       59613    59664      +51     
  Branches     7428     7438      +10     
==========================================
- Hits        53403    53109     -294     
- Misses       4079     4423     +344     
- Partials     2131     2132       +1     

@@ -632,6 +637,32 @@ citus_ProcessUtilityInternal(PlannedStmt *pstmt,
	{
		ddlJobs = ops->preprocess(parsetree, queryString, context);
	}

	if (ddlJobs != NIL)
Contributor

I guess this isn't a completely generic solution at this point, because it does not work when postprocess returns the ddljobs. So for DDL where postprocess is the one that's necessary, we'd need some other solution (e.g. only taking the locks on the coordinator and not executing the SQL yet). I think this PR seems like a good first step though, and the postprocess case could be solved in a follow-up PR.

Member Author

I guess this isn't a completely generic solution at this point, because it does not work when postprocess returns the ddljobs.

Correct.

So for DDL where postprocess is the one that's necessary we'd need some other solution (e.g. only taking the locks on the coordinator and not executing the SQL yet).

Right, we either need this^ or need to move the Postprocess pieces to the Preprocess stage for the commands we want to support from any node.
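
To make the limitation discussed in this thread concrete, here is a rough sketch of the ordering inside citus_ProcessUtilityInternal(); the helper names and the exact shape of the ops->postprocess call are illustrative assumptions, not code from this PR.

/*
 * Rough sketch of the ordering under discussion; helpers marked
 * "hypothetical" are illustrative, not the PR's actual functions.
 */
List *ddlJobs = ops->preprocess(parsetree, queryString, context);

/* the coordinator piece of preprocess-returned jobs can run right away */
ExecuteCoordinatorPieceOfDDLJobs(ddlJobs);           /* hypothetical */

/*
 * ... standard_ProcessUtility() runs here and takes the locks on the
 * local (shell) tables ...
 */

/* postprocess-returned jobs only come into existence after that point */
List *postprocessDDLJobs = ops->postprocess(parsetree, queryString);
ExecuteAllDDLJobs(ddlJobs, postprocessDDLJobs);      /* hypothetical */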


	if (*coordinatorDDLJob && *nonCoordinatorNodesDDLJob)
	{
		/* defer the metadata sync to "other nodes" phase */
Member

The PR doesn't have any description, so I cannot follow the goal.

But, isn't this complicated to reason about? E.g., now, some of the DDLJobs are different.

If the goal is to support DDL from any node, wouldn't it be simpler to have something like #7016?

Contributor

Afaict #7016 would not be enough to solve the problem. My thinking is that if we execute standardProcessUtility first on the current node, that will actually take locks on the distributed shell table on the current node first (which could be a worker), while instead we want those locks to be taken on the coordinator first to avoid deadlocks. Sorting the worker nodes we send DDL to would prevent deadlocks when actually changing the shards, but it would not prevent deadlocks that occur when changing the distributed shell table.

Member

This PR, #7016 and #6021 are all prone to distributed deadlocks. I think we should seek simplicity at this point unless we want to implement DDL from any node properly -- which we didn't know how to do at the time.

I think it could even be more acceptable to generalize #6021; it is at least easier to reason about.

To avoid the deadlocks, we could be more aggressive such that we acquire a global advisory lock on the first worker node per OID of the object changed. Such an approach differs from PG in terms of concurrency, but it is at least simple and easy to explain to the user (e.g., we allow one DDL from workers at a time).

Well, this is a tough problem, and I hope it doesn't create too much complexity in the code & architecture.

Member Author

To avoid the deadlocks, we could be more aggressive such that we acquire a global advisory lock on the first worker node per OID of the object changed.

This would help avoid the deadlocks for the cases where we acquire this adv lock. However, say distributed_ddl (that's allowed from any node) can get into a distributed deadlock with citus_operation() (that's only allowed from the coordinator); then we must not forget to acquire this adv lock in those operations too.
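
As an illustration of the advisory-lock idea above, a minimal sketch follows; GetConnectionToFirstWorkerNode() is a hypothetical helper, ExecuteCriticalRemoteCommand() is assumed to be Citus's existing remote-command helper, and pg_advisory_xact_lock() is the stock PostgreSQL function, so the lock is released automatically at transaction end.

/*
 * Sketch: serialize DDL issued from worker nodes by taking a
 * transaction-scoped advisory lock on the first worker node, keyed by the
 * OID of the object being changed.
 */
static void
AcquireGlobalDDLAdvisoryLock(Oid objectId)
{
	/* hypothetical helper returning a connection to the first worker node */
	MultiConnection *connection = GetConnectionToFirstWorkerNode();

	StringInfo lockCommand = makeStringInfo();
	appendStringInfo(lockCommand, "SELECT pg_advisory_xact_lock(%u)", objectId);

	/* blocks until the remote advisory lock is granted */
	ExecuteCriticalRemoteCommand(connection, lockCommand->data);
}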


	if (coordinatorJob)
	{
		ExecuteDistributedDDLJob(coordinatorJob);
Member

Just fyi:

There are some good reasons for running DDL commands via standardProcessUtility locally first. For example, if a command fails, you get the error on the table instead of on the shards -- which has been very confusing.

Similarly, if there are permission problems or any other problems, Citus does not use any resources (e.g., connections) before doing things locally.

Contributor @JelteF Nov 6, 2023

if a command fails, you get the error on the table instead of on the shards -- which has been very confusing.

That's indeed a good goal, but I don't think that's a big issue as long as we send the actual metadata updating DDL command first to the coordinator (instead of a shard command). You'd still get the "while running on coordinator.some.host.com:5432" DETAIL, but that seems rather harmless.

Similarly, if there are permission problems or any other problems, Citus does not use any resources (e.g., connections) before doing things locally.

I don't think this is an issue we really need to worry about. If an attacker is trying to exhaust resources, then this does not seem like a very smart way of doing so (e.g. run_command_on_workers + generate_series + cross join seems a much easier way to bring stuff down). And if it's by accident, I don't think opening a connection is a huge resource hog; a connection is probably already cached anyway.

Member

That's indeed a good goal, but I don't think that's a big issue as long as we send the actual metadata updating DDL command first to the coordinator (instead of a shard command). You'd still get the "while running on coordinator.some.host.com:5432", but that seems rather harmless.

That caused quite a bit of confusion for the users in the past, so I would not underestimate the impact of that UX change.

	(*nonCoordinatorNodesDDLJob)->taskList = NIL;

	Task *task = NULL;
	foreach_ptr(task, ddlJob->taskList)
Member

I have not thought enough about this, but I sense some possible deadlock scenarios if we run DDL from workers concurrently, so let's be careful about that (e.g., run DDL commands from workers via pgbench and make sure there are no deadlocks).
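
For reference, a hedged sketch of what the task-list split in the hunk quoted above could look like; TaskTargetsCoordinator() is a hypothetical predicate, and only the taskList field and foreach_ptr() visible in the quoted code are assumed.

/*
 * Sketch: split one DDLJob's task list into a coordinator piece and a
 * non-coordinator piece. TaskTargetsCoordinator() is hypothetical.
 */
static void
SplitDDLJobTaskList(DDLJob *ddlJob, DDLJob *coordinatorDDLJob,
					DDLJob *nonCoordinatorNodesDDLJob)
{
	Task *task = NULL;
	foreach_ptr(task, ddlJob->taskList)
	{
		if (TaskTargetsCoordinator(task))	/* hypothetical predicate */
		{
			coordinatorDDLJob->taskList =
				lappend(coordinatorDDLJob->taskList, task);
		}
		else
		{
			nonCoordinatorNodesDDLJob->taskList =
				lappend(nonCoordinatorNodesDDLJob->taskList, task);
		}
	}
}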

onurctirtir force-pushed the create_db_from_any_node branch from 2f1d068 to 78be00b on November 10, 2023 09:41
Base automatically changed from create_db_from_any_node to main November 10, 2023 09:58
onurctirtir added a commit that referenced this pull request Nov 10, 2023
DESCRIPTION: Adds support for issuing role management commands from worker nodes

It's unlikely to get into a distributed deadlock with role commands, so we
don't care much about them at the moment.
There were several attempts to reduce the chances of a deadlock, but we
didn't get any of them merged into the main branch yet; see:
#7325
#7016
#7009
francisjodi pushed a commit that referenced this pull request Nov 13, 2023