replicator: fixed infinite loop on sync replication #223
base: release-2.3
Conversation
Thank you for the merge request, we'll take it into consideration.
@cmouse thanks for the very quick response! I understand that the replicator was quite unstable. Frankly speaking, the thread where I have been debugging this issue for about 10 months is a good explanation of why you removed it from 2.4. But I have a cluster and the will to make it stable... may I ask you to consider putting the replicator back into 2.4? Or at least restoring some API that would allow me to build it as a dedicated plugin. As you can see, I'm quite motivated to make it stable enough to work :)
Yeah. The only problem is that there is no replicator in 2.4. But let's see if we can have this in 2.3.
@cmouse this code contains a condition:

```c
if (lookups[i].wait_for_next_push) {
	/* another sync request came while user was being
	   replicated */
	i_assert(user->priority == REPLICATION_PRIORITY_SYNC);
	lookups[i].wait_for_next_push = FALSE;
}
```

If I understand the replicator code base right, when a timeout happens it may switch the user's priority away from REPLICATION_PRIORITY_SYNC, so this i_assert() will be triggered and abort. Probably the right code is something like:

```c
if (lookups[i].wait_for_next_push) {
	/* another sync request came while user was being
	   replicated */
	if (user->priority != REPLICATION_PRIORITY_SYNC)
		continue;
	lookups[i].wait_for_next_push = FALSE;
}
```
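To make the race concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual replicator sources) of why replacing the assert with a priority check avoids the abort when something downgrades the priority between the push and the callback loop:

```c
#include <stdbool.h>
#include <stdio.h>

enum priority { PRIORITY_NONE, PRIORITY_HIGH, PRIORITY_SYNC };

struct user {
	enum priority priority;
};

struct lookup {
	struct user *user;
	bool wait_for_next_push;
};

/* Guarded variant: returns true when the lookup may be completed now,
 * false when it has to wait for the next push instead of aborting. */
static bool lookup_may_complete(struct lookup *lookup)
{
	if (lookup->wait_for_next_push) {
		/* another sync request came while the user was being
		   replicated */
		if (lookup->user->priority != PRIORITY_SYNC)
			return false; /* priority was downgraded, e.g. after a timeout */
		lookup->wait_for_next_push = false;
	}
	return true;
}

int main(void)
{
	struct user user = { .priority = PRIORITY_SYNC };
	struct lookup lookup = { .user = &user, .wait_for_next_push = true };

	/* A timeout fires and downgrades the priority before the
	   callback loop gets to run. */
	user.priority = PRIORITY_HIGH;

	/* The assert-based variant would abort at this point:
	   assert(user.priority == PRIORITY_SYNC); */
	if (!lookup_may_complete(&lookup))
		printf("lookup kept for the next push instead of aborting\n");
	return 0;
}
```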
BTW I deployed a version with the changes from #223 (comment) and I'll let you know how it behaves.
Well, it survives network issues without any crash.
Before this fix, the replicator adds the same lookup into the callbacks over and over until it runs out of memory and crashes. This regression was introduced in 447e086 with the initial implementation of the replicator.
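For context on the failure mode, here is a minimal, self-contained sketch (hypothetical names and data structures, not the replicator sources) of how re-adding an already-queued lookup on every sync notification makes the pending list grow without bound, which is what eventually exhausts memory:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct lookup {
	int user_id;
};

struct lookup_array {
	struct lookup *items;
	size_t count, alloc;
};

static void array_append(struct lookup_array *arr, struct lookup item)
{
	if (arr->count == arr->alloc) {
		arr->alloc = arr->alloc == 0 ? 8 : arr->alloc * 2;
		arr->items = realloc(arr->items, arr->alloc * sizeof(*arr->items));
		if (arr->items == NULL)
			abort();
	}
	arr->items[arr->count++] = item;
}

/* Handle one incoming sync notification for user_id.  The buggy variant
 * appends a new lookup unconditionally, so repeated notifications for the
 * same user accumulate without bound; the fixed variant first checks
 * whether a lookup for that user is already pending. */
static void handle_sync_request(struct lookup_array *pending, int user_id,
				bool check_existing)
{
	if (check_existing) {
		for (size_t i = 0; i < pending->count; i++) {
			if (pending->items[i].user_id == user_id)
				return; /* already queued, nothing to add */
		}
	}
	array_append(pending, (struct lookup){ .user_id = user_id });
}

int main(void)
{
	struct lookup_array buggy = { 0 }, fixed = { 0 };

	/* The same user keeps requesting a sync while replication is slow. */
	for (int i = 0; i < 1000; i++) {
		handle_sync_request(&buggy, 42, false);
		handle_sync_request(&fixed, 42, true);
	}
	printf("buggy: %zu pending lookups, fixed: %zu\n",
	       buggy.count, fixed.count);

	free(buggy.items);
	free(fixed.items);
	return 0;
}
```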
@cmouse after some additional thought I have simplified these changes down to fixing a typo: a missed …
@cmouse I continue to track an issue where the UID order inside virtual folders gets messed up. Probably the cause of it is … The virtual folder should be excluded from synchronisation, because otherwise synchronisation fails with a hardcoded error: … So it seems that I don't understand how this should work. With … Have I missed something?
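As a point of reference, here is a hedged configuration sketch (assembled from the general Dovecot documentation, not from this cluster's actual config): a virtual namespace is typically declared as below, and on 2.3 the replication_dsync_parameters setting controls the doveadm sync options the replicator uses, so adding an -x exclusion there is one candidate way to keep the virtual namespace out of replication. Whether that is the intended approach is exactly the open question above.

```
# Hypothetical 2.3 excerpt; prefixes and paths are placeholders.
mail_plugins = $mail_plugins virtual notify replication

namespace Virtual {
  prefix = Virtual/
  separator = /
  location = virtual:~/Maildir/virtual
}

# The documented default is "-d -N -l 30 -U"; -x excludes a mailbox mask
# from the dsync runs triggered by the replicator (assumption: this is the
# relevant knob for skipping virtual folders).
replication_dsync_parameters = -d -N -l 30 -U -x Virtual/*
```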
After reading https://dovecot.org/mailman3/archives/list/[email protected]/message/5LYQGL2ZOMDDUFUOJU2TUBLLQUWEPZSR/ and the messages near it... I understand that the replicator won't be recovered. @cmouse anyway, may I kindly ask you to merge this small typo fix? In the hope that it will be included in the next release of the 2.3 branch.
We will consider it; there is no plan for a next 2.3 release at the moment.
@cmouse thus, if you do plan a 2.3 release, may I ask you to consider including this one as well?