[core] Support commit isolation level #3805

Open
wants to merge 5 commits into
base: master

Conversation

BsoBird
Contributor

@BsoBird BsoBird commented Jul 24, 2024

Purpose

For the file system catalog, there are many situations where we need to support stricter commit policies. This is especially true for third-party databases that use Paimon as their underlying storage.

This PR adds a commit isolation level option to control this behavior.

Linked issue: none

  1. In strict mode, disable retries after a failed commit.
  2. In strict mode, reject and clean up potentially dirty commits.
  3. Categorize the possible exceptions and adjust the exception handling logic.
  4. Support concurrency control.
  5. In strict mode, do not use any hint file.

Tests

See FileStoreCommitTest.java.

API and Format

None

Documentation

Glossary:

  • Dirty commit: a commit for which all of the following hold: commitSnapshotId < CurrentLatestSnapshotId - snapshot.num-retained.max, commit-success = true, snapshotManager.exists(commitSnapshotId) = true, and commitSnapshotId does not belong to any tag or branch.
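
The glossary condition can be written as a single predicate. The following is only a minimal sketch under stated assumptions: the class `DirtyCommitCheck`, its method names, and the `referencedByTagOrBranch` set are hypothetical and are not taken from the PR. The watermark arithmetic follows the `waterMark` expression that appears in the diff further down.

```java
import java.util.Set;

/** Hypothetical helper for illustration; not the PR's actual code. */
final class DirtyCommitCheck {

    /**
     * Ids strictly below this value are older than the retention window.
     * Returns -1 while the table has too few snapshots for any id to be that old.
     */
    static long waterMark(long latestSnapshotId, long numRetainedMax) {
        return latestSnapshotId > numRetainedMax ? latestSnapshotId - numRetainedMax : -1;
    }

    /** True only when every clause of the glossary definition holds. */
    static boolean isDirtyCommit(
            long commitSnapshotId,
            long latestSnapshotId,
            long numRetainedMax,
            boolean commitSucceeded,
            boolean snapshotFileExists,
            Set<Long> referencedByTagOrBranch) { // assumed: ids pinned by a tag or branch
        return commitSucceeded
                && snapshotFileExists
                && commitSnapshotId < waterMark(latestSnapshotId, numRetainedMax)
                && !referencedByTagOrBranch.contains(commitSnapshotId);
    }

    public static void main(String[] args) {
        // Latest id 10 with 3 retained snapshots gives a watermark of 7.
        System.out.println(isDirtyCommit(5, 10, 3, true, true, Set.of()));
        System.out.println(isDirtyCommit(5, 10, 3, true, true, Set.of(5L)));
    }
}
```

In this sketch a snapshot that is pinned by a tag or branch is never reported as dirty, which matches the last clause of the definition.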

Flow chart (strict mode):

    graph TD
    START[startCommit] -->USE-LOCK{USE-LOCK?}
    USE-LOCK -->|YES| COMMIT_WITH_LOCK{COMMIT_WITH_LOCK_SUCCESS}
    USE-LOCK -->|NO| CHECK-BEFORE{CHECK-BEFORE-SUCCESS?}
    CHECK-BEFORE -->|NO| COMMIT_FAILED
    CHECK-BEFORE -->|YES| COMMIT{COMMIT_SUCCESS?}
    COMMIT --> |commit state unknown| COMMIT_FAILED
    COMMIT --> |not success| COMMIT_FAILED
    COMMIT --> |YES| IS_DIRTY_COMMIT{IS_DIRTY_COMMIT?}
    IS_DIRTY_COMMIT --> |YES| COMMIT_FAILED
    IS_DIRTY_COMMIT --> |NO| COMMIT_SUCCESS
    COMMIT_WITH_LOCK --> |failed-normal| RETRY
    COMMIT_WITH_LOCK --> |failed - strict mode| COMMIT_FAILED
    COMMIT_WITH_LOCK --> |success| COMMIT_SUCCESS
    RETRY --> END
    COMMIT_SUCCESS --> END
    COMMIT_FAILED --> END
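
Read as code, the flow chart is a small decision function. The sketch below only mirrors the edges of the chart; `StrictCommitFlow`, its enums, and the boolean parameters are hypothetical names chosen for illustration, and the real commit path in the PR may be structured differently.

```java
/** Hypothetical restatement of the flow chart; not the PR's actual code. */
final class StrictCommitFlow {

    enum Outcome { COMMIT_SUCCESS, COMMIT_FAILED, RETRY }

    /** Result of the snapshot write on the lock-free path. */
    enum WriteResult { SUCCESS, NOT_SUCCESS, UNKNOWN }

    static Outcome decide(
            boolean useLock,
            boolean strictMode,
            boolean commitWithLockSucceeded,
            boolean checkBeforeSucceeded,
            WriteResult write,
            boolean isDirtyCommit) {
        if (useLock) {
            if (commitWithLockSucceeded) {
                return Outcome.COMMIT_SUCCESS;
            }
            // Only the non-strict path is allowed to retry.
            return strictMode ? Outcome.COMMIT_FAILED : Outcome.RETRY;
        }
        if (!checkBeforeSucceeded) {
            return Outcome.COMMIT_FAILED;
        }
        if (write != WriteResult.SUCCESS) {
            // Covers both "not success" and "commit state unknown".
            return Outcome.COMMIT_FAILED;
        }
        return isDirtyCommit ? Outcome.COMMIT_FAILED : Outcome.COMMIT_SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true, false, true, WriteResult.UNKNOWN, false));
    }
}
```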

@BsoBird
Contributor Author

BsoBird commented Jul 24, 2024

@JingsongLi Hi. Can you check this? Tks.

@BsoBird BsoBird changed the title Support commit isolation level [core] Support commit isolation level Jul 24, 2024
@@ -238,6 +238,9 @@ default boolean tryToWriteAtomic(Path path, String content) throws IOException {
    try {
        writeFile(tmp, content, false);
        success = rename(tmp, path);
    } catch (IOException e) {
        // try check once
        success = exists(path) && !exists(tmp);
Contributor Author

@BsoBird BsoBird Jul 24, 2024

After an IOException is thrown, it is a good idea to check the result once more. Currently, this check is not fully applicable to object storage systems and may produce false positives. However, as the industry evolves, object storage systems are starting to support mutually exclusive operations, so in the long run I think we need to add this check.

However, please note that we should run the check only after catching an IOException, because those are thrown by the server; we should not do any checking for client-side exceptions.
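
To make the idea concrete, here is a minimal local-filesystem sketch of "write to a temp file, rename, and re-check once if the rename throws". It uses `java.nio.file` instead of Paimon's `FileIO`, and the class name `AtomicWriteSketch` and the `.tmp` suffix are assumptions made for illustration. As noted above, the re-check is a heuristic and can misreport on stores without exclusive-rename semantics.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem analogue; not Paimon's FileIO implementation. */
final class AtomicWriteSketch {

    static boolean tryToWriteAtomic(Path path, String content) throws IOException {
        // Assumed naming scheme for the temp file.
        Path tmp = path.resolveSibling(path.getFileName() + ".tmp");
        boolean success = false;
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // Without REPLACE_EXISTING, move fails if the target already exists.
            Files.move(tmp, path);
            success = true;
        } catch (IOException e) {
            // Re-check once: the rename may have happened even though the call threw.
            // Target present and temp file gone is taken as evidence of success.
            success = Files.exists(path) && !Files.exists(tmp);
        } finally {
            if (!success) {
                Files.deleteIfExists(tmp);
            }
        }
        return success;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-write");
        Path target = dir.resolve("snapshot-1");
        System.out.println(tryToWriteAtomic(target, "first"));
        System.out.println(tryToWriteAtomic(target, "second"));
    }
}
```

In this sketch the second call finds the target already present and the temp file still in place, so the re-check reports a failed write and the temp file is removed.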

Contributor

This can be a separate PR?

Contributor Author

This can be a separate PR?

sure

        // otherwise, we throw an exception.
        if (!success && useSerializableIsolation) {
            throw new CommitFailedException(e.getMessage(), e);
        }
Contributor Author

If the rename operation succeeds, we should ignore any exceptions thrown after that.

if (useSerializableIsolation && !useLock) {
    // If useSerializableIsolation is turned on and we do not use a lock service,
    // we remove all hints, as they may provide incorrect information.
    snapshotManager.removeSnapshotEarliestHint();
Contributor Author

If Serializable is turned on, and the locking service is not used, we should delete all HINT files, as they are always unreliable.

@@ -284,7 +284,7 @@ public void fastForward(String branchName) {
.collect(Collectors.toList());

// Delete latest snapshot hint
- snapshotManager.deleteLatestHint();
+ snapshotManager.removeSnapshotLatestHint();
Contributor Author

@BsoBird BsoBird Jul 24, 2024

The previous method name wasn't clear enough, so I've replaced it.

- snapshots.add(snapshotManager.snapshot(id));
+ if (snapshotManager.snapshotExists(id)) {
+     snapshots.add(snapshotManager.snapshot(id));
+ }
Contributor Author

If a dirty commit occurs, the previous logic throws an exception here, which breaks the test case.

}
return committed;
} catch (Exception e) {
throw new CommitStateUnknownException(e.getMessage(), e);
Contributor Author

If any exception occurs during the commit process, we should throw a CommitStateUnknownException and not clean up any data.
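
The policy described here, together with the `DirtyCommitException` handling in the next fragment, might look roughly like the wrapper below. This is a sketch under assumptions: `CommitErrorPolicy`, `runCommit`, and the nested exception classes are stand-ins defined only so the example compiles, and the PR spreads this logic over several methods rather than one.

```java
import java.util.concurrent.Callable;

/** Hypothetical wrapper for illustration; not the PR's actual code. */
final class CommitErrorPolicy {

    /** Stand-in for the PR's exception of the same name. */
    static class CommitStateUnknownException extends RuntimeException {
        CommitStateUnknownException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for the PR's exception of the same name. */
    static class DirtyCommitException extends RuntimeException {
        DirtyCommitException(String message) {
            super(message);
        }
    }

    static <T> T runCommit(Callable<T> commit, Runnable cleanUpMetadata) {
        try {
            return commit.call();
        } catch (DirtyCommitException e) {
            // A dirty commit is known to be invalid, so its metadata can be removed.
            cleanUpMetadata.run();
            throw e;
        } catch (Exception e) {
            // Outcome is unknown: the commit may have landed, so nothing is cleaned up.
            throw new CommitStateUnknownException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            runCommit(() -> { throw new java.io.IOException("timeout"); },
                    () -> System.out.println("cleaning up"));
        } catch (CommitStateUnknownException e) {
            System.out.println("state unknown: " + e.getMessage());
        }
    }
}
```

The order of the catch clauses matters in this sketch: the dirty-commit case has to be matched before the generic handler, otherwise it would be wrapped as an unknown state and left uncleaned.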

throw e;
} catch (DirtyCommitException e) {
// We need to clean up all the metadata information generated by this commit.
cleanUpTmpManifests(
Contributor Author

Cleaning up dirty commits may interfere with the creation of branches or tags. But since we always clean up snapshots that exceed snapshot.num-retained.max, this shouldn't have much of an impact. After all, if we keep a snapshot that matches the characteristics of a dirty commit, we may never find the right time to delete it.

        latestSnapshotIdAfterCommit > maxSaveSnapshotNum
                ? latestSnapshotIdAfterCommit - maxSaveSnapshotNum
                : -1;
fastFailIfDirtyCommit(newSnapshotId, waterMark, latestSnapshotIdAfterCommit);
Contributor Author

@BsoBird BsoBird Jul 24, 2024

Here we need to add logic that checks whether the snapshot belongs to a branch or tag. PR #3787 provides this functionality; I'm not sure whether I should merge this PR with that one.

Contributor Author

For now, I'm starting by porting some of the code from #3787 to the current PR.

@JingsongLi
Contributor

Thanks @BsoBird! I need some time to look at this; I will give you feedback next week.

Contributor

@JingsongLi JingsongLi left a comment

Hi @BsoBird, I have no idea why the changes are so big...

As I picture it, we only need one branch to implement the new commit mode.
Please note that this is core code, and any modification may have serious consequences.

2 participants