replicator: fixed infinity loop on sync replication #223

catap · 2024-09-25T09:06:59Z

Before this fix, replicator adds the same lookup into callbacks over and over, until it reaches out of memory, and crashes.

This regression was introduced 447e086 with initial implementation of replicator.

cmouse · 2024-09-25T09:15:26Z

Thank you for the merge request, we'll take it into consideration.

catap · 2024-09-25T09:28:37Z

@cmouse thanks for the very quick response!

I understand that replicator was quite unstable.

Frankly speaking, my thread where I debug this issue for about 10 months is good explanation why you removed it from 2.4.

But, I have cluster and will to make it stable... may I ask you to consider to put replicator back in 2.4? Or at least restore some API that allows me to build it as a dedicated plugin.

As you can see, I'm quite motivated to make it stable enough to work :)

cmouse · 2024-09-25T09:37:49Z

Yeah. The only problem is that there is no replicator in 2.4. But lets see if we can have this in 2.3

catap · 2024-09-25T09:43:23Z

@cmouse because you had removed it at 4c04e4c :)

Maybe recover it when it finally works? For some setups, this feature can be a deal breaker.

catap · 2024-09-25T10:37:07Z

@cmouse this code contains a condition:

		if (lookups[i].wait_for_next_push) {
			/* another sync request came while user was being
			   replicated */
			i_assert(user->priority == REPLICATION_PRIORITY_SYNC);
			lookups[i].wait_for_next_push = FALSE;
		}

if I understand the replicator code base right, when timeout is happened it may switch the priority into something else.

So, this abort will be triggered. Probably the right code is something like:

		if (lookups[i].wait_for_next_push) {
			/* another sync request came while user was being
			   replicated */
			if (user->priority != REPLICATION_PRIORITY_SYNC)
				continue;
			lookups[i].wait_for_next_push = FALSE;
		}

catap · 2024-09-25T10:54:52Z

BTW I deployed a version with changes from #223 (comment) and I let you posted how it behaves

catap · 2024-09-25T11:54:51Z

Well, it survives network issue without any crash:

Sep 25 13:42:20 mx1 dovecot: lmtp([email protected])<74439><nFEoIhL382bHIgEAeeMcAQ>: Warning: replication([email protected]): Sync failure: Timeout in 10 secs

Before this fix, replicator adds the same lookup into callbacks over and over, until it reaches out of memory, and crashes. This regression was introduced 447e086 with initial implementation of replicator.

catap · 2024-09-25T16:47:03Z

@cmouse after some aditional thought I had simplified this changes to fixing a typo: missed count--. If we delete element, we need to iterate for one element less over array.

catap · 2024-09-27T13:03:40Z

@cmouse I continue track an issue where order UID inside vurtual folders are messed up.

Probably the cause of it is... What virtual forlder should be excluded from synchronisation, because otherwise synchronisation fails with hardcoded error: Can't create virtual mailboxes (see: virtual_mailbox_create).

So, here it seems that I don't understand how should it works. With -x virtual. synchronisation works, but it will mess UID inside virtual folders, and without it... it simple fails.

Have I missed something?

catap · 2024-09-29T22:01:01Z

After reading https://dovecot.org/mailman3/archives/list/[email protected]/message/5LYQGL2ZOMDDUFUOJU2TUBLLQUWEPZSR/ and near here.... I understand that replicator won't be recoverd.

@cmouse anyway, may I kindly ask you to merge this small typo? With hope that it will be included into the next release of 2.3 branch.

cmouse · 2024-11-01T08:58:58Z

We will consider it, there is no plan for next 2.3 release at the moment.

catap · 2024-11-27T08:37:30Z

@cmouse thus, if you plan to release 2.3, may I ask you to consider to include this one as well?

replicator: fixed infinity loop on sync replication

f2bd4ec

Before this fix, replicator adds the same lookup into callbacks over and over, until it reaches out of memory, and crashes. This regression was introduced 447e086 with initial implementation of replicator.

catap force-pushed the release-2.3 branch from 44803b8 to f2bd4ec Compare September 25, 2024 16:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replicator: fixed infinity loop on sync replication #223

replicator: fixed infinity loop on sync replication #223

catap commented Sep 25, 2024

cmouse commented Sep 25, 2024 •

edited

Loading

catap commented Sep 25, 2024

cmouse commented Sep 25, 2024 •

edited

Loading

catap commented Sep 25, 2024

catap commented Sep 25, 2024

catap commented Sep 25, 2024

catap commented Sep 25, 2024 •

edited

Loading

catap commented Sep 25, 2024

catap commented Sep 27, 2024

catap commented Sep 29, 2024

cmouse commented Nov 1, 2024

catap commented Nov 27, 2024

replicator: fixed infinity loop on sync replication #223

Are you sure you want to change the base?

replicator: fixed infinity loop on sync replication #223

Conversation

catap commented Sep 25, 2024

cmouse commented Sep 25, 2024 • edited Loading

catap commented Sep 25, 2024

cmouse commented Sep 25, 2024 • edited Loading

catap commented Sep 25, 2024

catap commented Sep 25, 2024

catap commented Sep 25, 2024

catap commented Sep 25, 2024 • edited Loading

catap commented Sep 25, 2024

catap commented Sep 27, 2024

catap commented Sep 29, 2024

cmouse commented Nov 1, 2024

catap commented Nov 27, 2024

cmouse commented Sep 25, 2024 •

edited

Loading

cmouse commented Sep 25, 2024 •

edited

Loading

catap commented Sep 25, 2024 •

edited

Loading