fix: Duplicated events are always removed #4706
Codecov Report

Coverage diff, main vs. #4706:

| | main | #4706 | +/- |
|---|---|---|---|
| Coverage | 85.90% | 85.91% | |
| Files | 292 | 292 | |
| Lines | 33906 | 33950 | +44 |
| Hits | 29128 | 29168 | +40 |
| Misses | 4778 | 4782 | +4 |
32c7aa3 to a7ca6e5
Left some comments inline but otherwise makes sense and lgtm 👍
/// Events **must** be sorted by their position (descending, i.e. from
/// newest to oldest).
This feels a bit footgun-y. Can't we sort them here too, similar to what the deduplicator is doing in its `sort_events_by_position_descending`?
You are right. I've rewritten this part. `Deduplicator` is no longer responsible for sorting events by their position. This is done in a new `RoomEventCacheState::remove_events`, which is responsible for removing events in the store and in `RoomEvents`.
Nice work! just skimmed the high-level changes so I could understand it, and it makes sense to me 👍
…tion`. This patch changes `EventCacheStore::filter_duplicated_events` to return the `Position` of the duplicated event.
This patch redesigns `Deduplicator::filter_duplicate_events`.

First off, `filter_duplicate_events` removes events with no valid ID. At the same time, it removes duplicate events within the new events (`events`). This check was done in the `BloomFilterDeduplicator` but not in the `StoreDeduplicator`. Now it's done up front for both implementations, directly inside `Deduplicator`.

Second, this patch introduces `DeduplicationOutcome` to replace the return type `(Vec<Event>, Vec<OwnedEventId>)`, especially because it would otherwise have become `(Vec<Event>, Vec<(OwnedEventId, Position)>, Vec<(OwnedEventId, Position)>)`. Why? 1. Because the positions of the duplicated events are returned, 2. because we differentiate between in-memory and in-store duplicated events.

Third, now that positions are associated with duplicated events, events must be sorted. That's the role of `sort_events_by_position_descending`. This way, `DeduplicationOutcome` brings guarantees and fewer checks are required.
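To make the shape of the redesign concrete, here is a minimal sketch of what such an outcome type and filtering pass could look like. It is not the SDK code: `Event`, `Position`, the field names and the `lookup` callback are simplified, hypothetical stand-ins, and the sketch assumes that new events which duplicate a known event are still returned so the caller can decide which copy to keep.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for an SDK event.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Event {
    pub id: Option<String>,
}

/// Hypothetical, simplified stand-in for a position in the linked chunk.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Position {
    pub chunk: u64,
    pub index: usize,
}

/// Sketch of an outcome type; the field names are assumptions.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DeduplicationOutcome {
    /// New events that have an ID and are unique within the input.
    pub all_events: Vec<Event>,
    /// Already-known events whose chunk is loaded in memory.
    pub in_memory_duplicates: Vec<(String, Position)>,
    /// Already-known events that only live in the store.
    pub in_store_duplicates: Vec<(String, Position)>,
}

/// `lookup` is a hypothetical callback: for a known event ID it returns the
/// event's position and whether that position is loaded in memory.
pub fn filter_duplicate_events(
    events: Vec<Event>,
    lookup: impl Fn(&str) -> Option<(Position, bool)>,
) -> DeduplicationOutcome {
    let mut seen = HashSet::new();
    let mut outcome = DeduplicationOutcome::default();

    for event in events {
        // Events with no ID are dropped up front.
        let Some(id) = event.id.clone() else { continue };

        // Duplicates within the input itself: keep the first occurrence only.
        if !seen.insert(id.clone()) {
            continue;
        }

        // Duplicates of already-known events are dispatched by location.
        if let Some((position, in_memory)) = lookup(&id) {
            if in_memory {
                outcome.in_memory_duplicates.push((id, position));
            } else {
                outcome.in_store_duplicates.push((id, position));
            }
        }

        outcome.all_events.push(event);
    }

    outcome
}
```

A named struct keeps the two `Vec<(OwnedEventId, Position)>`-like lists from being swapped by accident, which is easy to do with a positional tuple.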
…ct place. This patch uses `DeduplicationOutcome` to remove events either in memory, or in the store, when required. The `remove_events_by_id` method has been renamed to `remove_events_by_position`.
This patch adds a test for `sort_events_by_position_descending`. It also updates this function so that events are sorted by their chunk identifier from newest to oldest. It makes no difference in practice, but it matches the order of the position indices too: every “dimension” is descending.
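As an illustration of "every dimension is descending", the ordering can be sketched as a sort on a reversed `(chunk, index)` key. This is a sketch under assumptions, not the SDK function: `Position` is a simplified stand-in, `sort_positions_descending` is a hypothetical name, and it assumes that a larger chunk identifier means a newer chunk.

```rust
use std::cmp::Reverse;

/// Hypothetical, simplified stand-in for a position in the linked chunk.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Position {
    pub chunk: u64,
    pub index: usize,
}

/// Sorts `(event ID, position)` pairs from newest to oldest: first by chunk
/// identifier (descending), then by index inside the chunk (descending).
pub fn sort_positions_descending(events: &mut [(String, Position)]) {
    events.sort_by_key(|(_, position)| Reverse((position.chunk, position.index)));
}
```

With positions `(1, 0)`, `(2, 1)`, `(1, 4)` and `(2, 3)`, the resulting order is `(2, 3)`, `(2, 1)`, `(1, 4)`, `(1, 0)`.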
This patch adds a test ensuring that `Deduplicator` is able to find duplicates in its own inputs.
This patch adds a test ensuring that `Deduplicator` excludes invalid events, i.e. events with no ID.
Nothing fancy here. Just regular chore tasks.
… store. This patch tests that `Deduplicator` dispatches duplicated events in the correct field of `DeduplicationOutcome`.
This patch renames `EmptyChunk` into `EmptyChunkRule`. Name suggested by @stefanceriu, it makes a lot more sense, thanks!
This patch makes the code more robust around event removals. Sorting events by their position is no longer done in the `Deduplicator` but in a new `RoomEventCacheState::remove_events` method, which removes events in the store and in the `RoomEvents`. This method is now responsible for sorting the events, which makes the whole thing less fragile.
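Why the removal method should sort for itself can be shown on a single chunk. The sketch below is an assumption-laden simplification, not the SDK code: the chunk is a plain `Vec<String>` and `remove_at_indices` is a hypothetical helper. Removing from the highest index down means earlier removals never shift the items still waiting to be removed, and because the helper sorts internally, the caller cannot get the order wrong.

```rust
/// Removes the items at `indices` from `items`.
///
/// The caller may pass the indices in any order: they are sorted in
/// descending order here, so each removal leaves the lower indices valid.
pub fn remove_at_indices(items: &mut Vec<String>, mut indices: Vec<usize>) {
    // Highest index first.
    indices.sort_unstable_by(|a, b| b.cmp(a));
    // The same index listed twice must only be removed once.
    indices.dedup();

    for index in indices {
        // Ignore out-of-bounds indices instead of panicking.
        if index < items.len() {
            items.remove(index);
        }
    }
}
```

For instance, removing indices `[1, 3]` from `[a, b, c, d, e]` in ascending order without re-sorting would first remove `b`, then remove `e` instead of `d`. Sorting inside the helper yields `[a, c, e]` regardless of the order the caller used.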
a7ca6e5 to c1d8e6a
This patch puts the `Ok` outside the `match` for better ergonomics.
When a duplicated event is found, sometimes we keep the older one and the new one is dropped, but sometimes the older one is removed and the new one is kept (backwards pagination vs. sync). So we need to be able to remove an event.
However, since #4662 and #4632, this can fail. Why? Because we can find a duplicated event in the store that is not loaded in the in-memory `LinkedChunk`. What happens in this case? We simply give up! See the `error!` case?
matrix-rust-sdk/crates/matrix-sdk/src/event_cache/room/events.rs
Lines 246 to 254 in bdf5fad
Well, this is pretty bad, because it means the duplicated event will not be removed. And it means a new event with the same ID will be inserted. Kaboom 💥.
These patches fix that in simple steps:

1. `EventCacheStore::filter_duplicated_events` returns the event positions,
2. `Deduplicator` is redesigned to produce a `DeduplicationOutcome` which differentiates between in-memory and in-store duplicated events.

These patches also fix inconsistencies between the `BloomFilterDeduplicator` (used when there is no store) and the `StoreDeduplicator` (used when there is a store). For example, the new events can contain duplicates (`new_events = [$ev0, $ev1, $ev0]`) in theory (not in practice so far). This check was only done in the bloom filter deduplicator. Now it's done for all of them, directly in `Deduplicator`. The same goes for detecting events with no ID: it's done beforehand.

See the patches one by one.

Tasks
- `EventCache` storage #3280
- `EventCache` lazy-loading #4632