Prune messages after merge #1989
Conversation
🦋 Changeset detected. Latest commit: f5d1705. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
@@ -803,7 +803,7 @@ impl Store {
        ts_hash: &[u8; TS_HASH_LENGTH],
        message: &Message,
    ) -> Result<Vec<u8>, HubError> {
        // If the store supports compact state messages, we don't merge messages that don't exist in the compact state
I think this comment was wrong, or am I missing something?
I think it should say: "we don't merge messages that are older than the compact state message and that don't exist within it". A compact state message is like a checkpoint: you cannot go back and add messages that weren't present at the time of compaction.
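The checkpoint rule described above can be sketched as follows. This is a hypothetical model, not the hub's actual Rust implementation: assume a compact state records the timestamp of its oldest retained message plus the set of message hashes it kept.

```typescript
// Hypothetical model of the compact-state checkpoint semantics.
type CompactState = {
  oldestRetainedTimestamp: number;
  retainedHashes: Set<string>;
};

function isMergeableUnderCompactState(
  state: CompactState,
  messageTimestamp: number,
  messageHash: string,
): boolean {
  // Newer than the checkpoint: always mergeable.
  if (messageTimestamp >= state.oldestRetainedTimestamp) return true;
  // Older than the checkpoint: only mergeable if it survived compaction.
  return state.retainedHashes.has(messageHash);
}
```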
Codecov Report — Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##             main    #1989       +/-   ##
===========================================
- Coverage   74.05%   48.98%   -25.08%
===========================================
  Files          99      118       +19
  Lines        9425    15653     +6228
  Branches     2097     4885     +2788
===========================================
+ Hits         6980     7667      +687
- Misses       2327     7985     +5658
+ Partials      118        1      -117
@@ -117,19 +123,26 @@ export abstract class RustStoreBase<TAdd extends Message, TRemove extends Messag
        }
    }

    for (const [fid, _] of fidToPrune) {
      const pruneResult = await this.pruneMessages(fid);
This approach works, but I'm curious if it slows down the merge code path.
I suspect this is somewhat inefficient, but I don't know for sure.
e.g. current flow is:
- check if is prunable in js (gets the storage count and earliest ts hash)
- merge in rust
- call the prune logic from js, which again gets the storage count etc and passes it to rust which iterates the db to find the earliest hash and prunes it.
But isPrunable already knows the storage units, message count, and the earliestTsHash. One approach could be to pass in the earliestTsHash if it would cause a prune, and have the store prune the earliest message within the same transaction as the merge. Then we don't have to go back and forth between js and rust and we save some db calls.
But if you think perf impact is not that bad, then this is definitely simpler code wise.
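The three-step flow described in the comment above can be sketched roughly as follows. All names here are illustrative, and the js/rust boundary is collapsed into a single class; the point is that step 3 re-derives state that step 1 already computed.

```typescript
// Hypothetical sketch of the current merge-then-prune flow.
type StoredMessage = { tsHash: string; timestamp: number };

class SketchStore {
  private messages: StoredMessage[] = [];
  constructor(private maxMessages: number) {}

  // Step 1 (js): check whether the store is at capacity.
  isPrunable(): boolean {
    return this.messages.length >= this.maxMessages;
  }

  // Step 2 (rust): the merge itself.
  merge(msg: StoredMessage): void {
    this.messages.push(msg);
  }

  // Step 3 (js -> rust): a separate prune call that re-scans the store
  // for the earliest messages and removes them.
  prune(): StoredMessage[] {
    this.messages.sort((a, b) => a.timestamp - b.timestamp);
    const excess = this.messages.length - this.maxMessages;
    return excess > 0 ? this.messages.splice(0, excess) : [];
  }

  get size(): number {
    return this.messages.length;
  }
}
```

The proposal above would instead hand the already-known earliestTsHash to the merge so the eviction happens inside the same transaction, skipping the re-scan in step 3.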
I'll update my benchmarking branch to see how much overhead the pruneMessage call adds to try and get some real numbers.
Reading getMessageCount, it seems like the value is cached, so the second load shouldn't be too bad.

> One approach could be to pass in the earliestTsHash if it would cause a prune, and have the store prune the earliest message within the same transaction as the merge.

I like the idea of that. When you add a single message it can optionally remove the earliest stored item in the same operation. Is the earliestTsHash for a particular FID & store cached somewhere on the javascript side? If not, we'd probably still need to hit the db again. It would also change mergeMessages a bit, as you'd need to do a remove for each merge rather than for each unique FID at the end (probably doesn't matter that much).
Here's an alternative design that would probably help with perf. We keep the prune job, but mergeMessages simply helps the job track which FIDs on which stores need pruning. The job runs more frequently and only targets what needs to be updated. That would avoid the extra logic on the hot path and make the prune job run much faster. Thoughts? Might just be worth making that change as the design is still fairly simple.
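The alternative design above could be sketched as a dirty-FID set: merges do O(1) bookkeeping, and the frequently-run job drains only what changed. This is a hypothetical illustration, not the hub's code.

```typescript
// Hypothetical sketch: merges mark FIDs dirty; a frequent job drains them.
class PruneTracker {
  private dirtyFids = new Set<number>();

  // Called from the merge path: O(1), no DB work on the hot path.
  markDirty(fid: number): void {
    this.dirtyFids.add(fid);
  }

  // Called by the frequently-running job: prune only the FIDs touched
  // since the last run, then reset the set.
  runJob(pruneFid: (fid: number) => void): number {
    const fids = [...this.dirtyFids];
    this.dirtyFids.clear();
    for (const fid of fids) pruneFid(fid);
    return fids.length;
  }
}
```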
Yes, isPrunable fetches the earliest tsHash from cache. I think a design where we can perform the prune at the same time as the merge so we don't even need the job would be better.
If that is too complicated, I prefer the current approach of doing the prune within the merge call but outside the tx. I don't think we should take the third option of just having the job run more frequently; I think we should eventually get rid of the job entirely.
The first approach above is more effort, so maybe let's decide based on the benchmark results?
@@ -615,14 +636,19 @@ impl Store {
        Ok(())
    }

-   pub fn merge(&self, message: &Message) -> Result<Vec<u8>, HubError> {
+   pub fn merge(&self, message: &Message, max_mesages_for_fid: u32) -> Result<Vec<u8>, HubError> {
        // Grab a merge lock. The typescript code does this by individual fid, but we don't have a
New prune logic is going here. TODO: move argument max_mesages_for_fid to FidInfoCache.
Updates isPrunable to be getPruneAction, which lets the caller know if the message should be pruned (original function) or would cause a prune action (added functionality). When applicable we prune for the given FID after the message is merged.

The prune job is still running in the background but can be removed in the future once it's confirmed there are no stragglers.

Benchmarking with production data, this change seems to add a 5% overhead on average. The FID lock added to Store::prune_messages is required for performance reasons. Without the lock, the overhead is run dependent and takes 100%-500% longer.
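The per-FID lock mentioned above lives in Rust inside Store::prune_messages; this TypeScript sketch only models the granularity it provides: work on the same FID contends, work on different FIDs does not. The class and method names are illustrative.

```typescript
// Hypothetical model of per-FID locking granularity.
class FidLocks {
  private held = new Set<number>();

  // Returns true if the lock for this fid was acquired.
  tryLock(fid: number): boolean {
    if (this.held.has(fid)) return false;
    this.held.add(fid);
    return true;
  }

  unlock(fid: number): void {
    this.held.delete(fid);
  }
}
```

With this granularity a long-running prune on one FID can delay merges for that FID, but merges for every other FID proceed, which matches the concern raised in the Additional Context section about prune jobs blocking merges.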
Looks good, I would just update the tests to cover both the mergeMessages and mergeMessage codepaths. Doesn't have to be extensive; just one of the tests using mergeMessages should be good.
for (const result of mergeResults.values()) {
  expect(result.isOk()).toBeTruthy();
}

test("earlier remove messages gets pruned", async () => {
This test seems duplicated? One of these should test the mergeMessages (bundles) code path
Oops. A lot of the tests are similar across stores, I must have copied / migrated the test over incorrectly. Will fix 👍
Closing due to age, and not actively being pursued. Thanks!
Motivation

Currently hubs are seeing regular latency spikes due to the periodic pruning job. This update causes the hub to immediately prune messages for a given FID after the message is merged.

Change Summary

Updates isPrunable to be getPruneAction, which lets the caller know if the message should be pruned (original function) or would cause a prune action (added functionality). When applicable we prune for the given FID after the message is merged.

Merge Checklist

Choose all relevant options below by adding an x now or at any time before submitting for review.

Additional Context

Adds a FID-based lock to Store::prune_messages. This might be an issue for the prune jobs which are left running. If a prune job is taking a long time it may block merges. I suspect this won't be too different from what's happening today during prunes, as the threadpool is limited to 4 threads, so a slow prune would already be blocking things.

PR-Codex overview
This PR focuses on enhancing message merge operations in the store.rs file and adding a PruneAction enum in storeEventHandler.ts.

Detailed summary

- Adds a PruneAction enum to determine pruning actions
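The PruneAction and getPruneAction names come from the PR itself, but the enum variants and function signature below are illustrative assumptions about how such a check could distinguish "this message is itself prunable" from "merging this message would trigger a prune".

```typescript
// Hypothetical sketch of a PruneAction-style check.
enum PruneAction {
  None = "none",                   // under the limit, nothing to do
  PruneMessage = "pruneMessage",   // the incoming message is older than everything stored
  PruneEarliest = "pruneEarliest", // merging would exceed the limit; evict the earliest
}

function getPruneAction(
  messageTimestamp: number,
  earliestStoredTimestamp: number,
  messageCount: number,
  maxMessages: number,
): PruneAction {
  if (messageCount < maxMessages) return PruneAction.None;
  // At capacity: a message older than everything stored would itself be pruned.
  if (messageTimestamp <= earliestStoredTimestamp) return PruneAction.PruneMessage;
  // Otherwise merge it and prune the earliest message afterwards.
  return PruneAction.PruneEarliest;
}
```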