Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out of order call to state_machine::create_snapshot() when manually triggering a snapshot #478

Open
adotsch opened this issue Nov 13, 2023 · 6 comments

Comments

@adotsch
Copy link

adotsch commented Nov 13, 2023

Hi,

I only create manual snapshots in my application calling raft_server::create_snapshot().
I have seen state_machine::create_snapshot() called with a snapshot index that is smaller(!) than the last committed index, i.e. when the state machine was already ahead of the point it should have created a snapshot for.
I guess this is a bug.
I have my workarounds but it would be nice to have this fixed so that I can clean up my code.

Regards,
Andras

@greensky00
Copy link
Contributor

Hi @adotsch,
Thanks for bringing this issue.

Creating snapshot is done by a separate background commit thread according to the current "state machine's commit index", so the snapshot index can be lagging behind the last commit index of log store. State machine is always catching up the Raft's commit index.

Nonetheless, the current behavior is obviously a bug. If there is more recent snapshot that is manually created, auto creation should skip any log index smaller than that. I will provide the fix for it soon.

@adotsch
Copy link
Author

adotsch commented Dec 8, 2023

Correct me if I am wrong, but I think you just added an extra check to ignore manual create_snapshot() calls when inconvenient, i.e. create_snapshot() will work when we are lucky, but won't be reliable.
Could you rather trigger an event in the commit thread to create a snapshot correctly and reliably or just use a new/existing mutex to avoid the race condition?

@greensky00
Copy link
Contributor

@adotsch
create_snapshot() is a best effort API and does not guarantee the success of creation, as multi-thread racing is not the only cause of failure. Also we can't use a mutex as snapshot creation is asynchronous task. Basically applications need to do retry upon the result of the API call.

If what you want is to schedule the snapshot creation on next earliest available log index, then I'd rather add a new API, for instance schedule_snapshot_creation(). But still, being a universal library, below case may happen:

  • At log index 100, the commit thread initiates snapshot creation. It is running.
  • At log index 101, schedule_snapshot_creation() is invoked by the user. But it is postponed as the previous creation is in progress.
  • ... The previous creation takes a few hours to finish ...
  • At log index 99999, the previous creation is done. Now the commit thread initiates the postponed snapshot creation request, on log index 99999.

Please let me know if it is ok for your case.

@adotsch
Copy link
Author

adotsch commented Dec 13, 2023

Thanks for your response!
This is exactly what I do, I retry creating the snapshot if it failed. It just doesn't feel right.
Yes, I think there should be a schedule_snapshot_creation() API that creates a snapshot at the earliest possible index. I don't think create_snapshot() in it's current form is a very good API, it requires a lot of logic at the user's end to get it working due to it's (best effort) nature.

@antonio2368
Copy link
Contributor

antonio2368 commented Dec 5, 2024

@adotsch if I understood your problem correctly, it should be similar to what I noticed:

  • create_snapshot is called and current commit_idx is taken, e.g. 100
  • index 101 is committed in parallel
  • snapshot_and_compact is called because of create_snapshot so now we create a snapshot with index 100 but state machine has 101 committed

I think a valid fix is to take commit_lock_ when running create_snapshot so it behaves same as when it's called from commit thread
As a quick fix I just inherited raft_server and exposed commit_lock_ which I take before calling create_snapshot https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/KeeperServer.cpp#L329

@greensky00 WDYT?

@greensky00
Copy link
Contributor

greensky00 commented Jan 6, 2025

@antonio2368
If the situation you described above is (manually) creating snapshot at 100, but in the meantime the state machine commits 101, that is fine. What we've discussed in this thread was about inversion (commit thread creates snapshot at 101, but user invokes manual snapshot creation at 100, and overwrites the already created 101).

But I think it will be better to add an option to create_snapshot API, like create_snapshot(const options& opt = options()), so as to help you not to use such a hack.

The option can define the behavior, like whether acquiring lock or not, and what to do if the lock is already acquired, etc. The default is the same as the current one (best effort).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants