rkistner
Contributor

When doing checksum pre-calculations after the initial snapshot, the process can take a very long time if there are millions of tiny buckets. The pre-calculation is of little use for small buckets; it is primarily an optimization for large buckets.

This replaces the index from #375 with one that allows sorting by number of changes in the bucket. We then skip buckets with < 10 operations in the checksum pre-calculation.
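A minimal sketch of the selection step described above, assuming each bucket carries a count of its changes. The `BucketState` shape, the `opCount` field, and the function name are hypothetical and not taken from the actual storage schema; sorting largest-first is also an assumption, since the description only says the index allows sorting by number of changes.

```typescript
// Hypothetical shape for illustration; not the real storage document.
interface BucketState {
  bucket: string;
  opCount: number; // assumed: number of operations recorded for the bucket
}

// Threshold from the description: buckets with < 10 operations are skipped.
const MIN_OPS_FOR_PRECALC = 10;

// Returns the buckets worth pre-calculating, largest first (ordering is an assumption).
function selectBucketsForChecksumPrecalc(
  states: BucketState[],
  minOps: number = MIN_OPS_FOR_PRECALC
): BucketState[] {
  return states
    .filter((s) => s.opCount >= minOps) // filter() returns a new array, so the input is not mutated
    .sort((a, b) => b.opCount - a.opCount);
}
```

With an index ordered by the change count, the same filter and sort could presumably be pushed into the database query rather than done in memory.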

The same logic could be used for compacting buckets in the future.

This also refactors the bucket compact logic to explicitly only compact a single bucket at a time. This simplifies the implementation, making it easier to confirm correctness.

Note that the index in #375 is not in any release yet, making it safe to just update the existing migration.


changeset-bot bot commented Oct 16, 2025

⚠️ No Changeset found

Latest commit: 658bff4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes changesets to release 11 packages
Name Type
@powersync/service-module-mongodb-storage Patch
@powersync/service-core Patch
@powersync/service-image Patch
@powersync/service-schema Patch
@powersync/service-module-mongodb Patch
@powersync/service-module-mysql Patch
@powersync/service-module-postgres Patch
@powersync/service-core-tests Patch
@powersync/service-module-core Patch
@powersync/service-module-postgres-storage Patch
test-client Patch


@rkistner rkistner requested a review from simolus3 October 16, 2025 10:55
simolus3
simolus3 previously approved these changes Oct 16, 2025
Contributor

@simolus3 simolus3 left a comment

LGTM.

Since I'm not too familiar with the test setup here: Is it straightforward to test this? E.g. making fewer than 10 changes, triggering this and then asserting that no checksum has been persisted? If that takes a lot of effort I'm not sure it's worth setting that up, but at least from a quick look it seems like storage_compacting.test.ts has some setup code for this already.

@rkistner
Contributor Author

> Since I'm not too familiar with the test setup here: Is it straightforward to test this? E.g. making fewer than 10 changes, triggering this and then asserting that no checksum has been persisted? If that takes a lot of effort I'm not sure it's worth setting that up, but at least from a quick look it seems like storage_compacting.test.ts has some setup code for this already.

Good point, it should be tested. I have now added some tests that check the number of buckets/checksums that were persisted. The tests are still limited in that they don't test whether the persisted checksums are actually used, but it's an improvement at least.
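The kind of check discussed here can be sketched as follows. This is only an illustration under stated assumptions: `FakeBucketStorage` and its methods are hypothetical in-memory stand-ins, not the setup code in storage_compacting.test.ts or the tests that were actually added.

```typescript
// Hypothetical in-memory stand-in for bucket storage; not the real test harness.
class FakeBucketStorage {
  private opCounts: Record<string, number> = {};
  private checksums: Record<string, boolean> = {};

  // Record one change (operation) against a bucket.
  saveOp(bucket: string): void {
    this.opCounts[bucket] = (this.opCounts[bucket] || 0) + 1;
  }

  // Persist a checksum entry only for buckets with at least minOps operations.
  precalculateChecksums(minOps: number = 10): void {
    for (const bucket of Object.keys(this.opCounts)) {
      if (this.opCounts[bucket] >= minOps) {
        this.checksums[bucket] = true;
      }
    }
  }

  // Number of buckets that have a persisted checksum.
  persistedChecksumCount(): number {
    return Object.keys(this.checksums).length;
  }
}
```

A test would then make fewer than 10 changes to one bucket, trigger the pre-calculation, and assert that the persisted count stays at zero; repeating it with 10 or more changes should yield a count of one.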

@rkistner rkistner merged commit da532c8 into main Oct 17, 2025
21 checks passed
@rkistner rkistner deleted the redo-compact-index branch October 17, 2025 06:38