Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement atomic multi-table transactions #238

Merged

Conversation

dennishuo
Copy link
Contributor

@dennishuo dennishuo commented Sep 2, 2024

Description

To reduce code-duplication, introduce a new TransactionWorkspaceMetaStoreManager to be injected into BasePolarisCatalog within the context of a multi-table transaction which throws errors on unexpected operations and collects single-entity updates to be committed atomically together.

Add some atomicity test scenarios that expose non-atomic implementations, some borrowed from Iceberg's TestRESTCatalog until that class can be refactored to inherit its tests in-place.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

To reduce code-duplication, introduce a new TransactionWorkspaceMetaStoreManager to
be injected into BasePolarisCatalog within the context of a multi-table transaction
which throws errors on unexpected operations and collects single-entity updates
to be committed atomically together.

Add some atomicity test scenarios that expose non-atomic implementations, some
borrowed from Iceberg's TestRESTCatalog until that class can be refactored to
inherit its tests in-place.
* <p>Not thread-safe; instances should only be used within a single request context and should not
* be reused between requests.
*/
public class TransactionWorkspaceMetaStoreManager implements PolarisMetaStoreManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an awfully restrictive way to support transactional writes. I get the reasoning and the chances someone might expect reads to reflect the uncommitted writes is high. On the other hand, this seems extremely inflexible and I have a hard time seeing how this would allow future work that might need to do lookups after the initial entity resolution.

On the other hand, we don't really use the Transaction interface, which I think could be used to hold the entity updates prior to commit. It might be possible to move the entity persistence into the Transaction interface and support a multi-layer transaction so that a global transaction applies all of the underlying transaction entity updates atomically, whereas a single-layer transcation can just apply immediately on commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was actually adding this specifically to be the more flexible option vs injecting only an updates-collector. Right now this may seem rigid by not supporting reads, but the idea would be that the transaction workspace represents the mutated state within the transaction, so if reads need to happen in the workspace in the future, we can easily intercept those reads and return the state after the prior uncommitted writes in the same transaction or else read-through to the real persistence layer or require reading a statically resolved entity state from the beginning of the transaction (if we're sticking to SNAPSHOT isolation semantics); subsequent writes would be able to still condition on the entityVersion from those reads.

It seems the current Transaction interface in Iceberg is somewhat specific to single-table transactions and is geared towards packing different table-update types into the common interface, whereas for our server-side commitTransaction we already receive nicely packed TableUpdates that we don't need to manually multiplex back out into the different update types.

By putting our transaction container here in the PolarisMetaStoreManager layer, we don't need to care about whether it's an Iceberg table transaction or something non-Iceberg entirely (e.g. a multi-PrincipalRole transaction that would atomically update multiple PrincipalRoles). Anything which knows how to write to a PolarisMetaStoreManager doesn't need to be aware of transactions happening at all -- a BEGIN TRANSACTION just means we inject one of these TransactionWorkspaceMetaStoreManager instances as the impl, the core logic happily performs its mutations into it as needed, and then the outer layer gets to commit all the queued updates atomically.

Granted, to get to that point we need some additional features in here regarding the tracking and overriding of reads of entities that have been modified in the same transaction, and some refactoring of the PolarisEntityManager/PolarisResolutionManifest, but then we can add options in here to configure the desired isolation semantics of the reads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. This makes a ton of sense - I hadn't really been thinking about non-Iceberg entities, so supporting updates for Principals, etc. makes sense. I do like the premise that we can update this tx workspace to support uncommitted reads later on (maybe worth including as a comment in the code?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment with a rough sketch of how this class would evolve to support more advanced transaction scenarios.

…or more advanced transactions,

and fix formatting.
@RussellSpitzer
Copy link
Member

Great to see this start for Multi-Table Tranasactions, we should definitely spawn some other issues for all the TODO's in this PR. I think a lot of these would be good first issues for folks getting into the project.

@RussellSpitzer RussellSpitzer enabled auto-merge (squash) September 19, 2024 16:15
@RussellSpitzer RussellSpitzer merged commit 9e26fb6 into apache:main Sep 19, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants