
fix(sqlite): SqliteEventCacheStore is 35 times faster #4739

Merged 3 commits into matrix-org:main from bench-linked-chunk on Mar 5, 2025

Conversation

Hywan
Member

@Hywan Hywan commented Feb 28, 2025

This patch adds a benchmark for the LinkedChunk with the following matrix:

  • Handling 10, 100, 1_000, 10_000 and 100_000 events,
  • With no store, with the MemoryStore, and with the SqliteEventCacheStore (a rough sketch of the benchmark shape follows this list).
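For context, a minimal Criterion sketch of that matrix could look like the following; the benchmark names and the elided body are illustrative, not the actual code added by this PR:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Illustrative shape only: the real benchmark drives a `LinkedChunk` backed by
// each store; the body is elided here.
fn bench_push_events(c: &mut Criterion) {
    let mut group = c.benchmark_group("linked_chunk_push");

    for num_events in [10_u64, 100, 1_000, 10_000, 100_000] {
        // Report throughput as events per second.
        group.throughput(Throughput::Elements(num_events));

        for store in ["no_store", "memory_store", "sqlite_store"] {
            group.bench_with_input(
                BenchmarkId::new(store, num_events),
                &num_events,
                |bencher, &n| {
                    bencher.iter(|| {
                        // Build the store, push `n` events into the linked
                        // chunk, and persist the resulting updates.
                        std::hint::black_box(n);
                    });
                },
            );
        }
    }

    group.finish();
}

criterion_group!(benches, bench_push_events);
criterion_main!(benches);
```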

Before this patch, the cases with no store and with the MemoryStore were not a blocker. I was surprisingly delighted to see that the LinkedChunk is able to handle a throughput of 9.3 million events per second. With the MemoryStore, it drops to 4.5 million events per second, which is not bad for a test-tailored store. However, for SqliteEventCacheStore, the story was clearly different… 7_300 events per second, only. And it got worse with more events.

[Screenshot: Criterion benchmark results, 2025-02-28]

I dug into this for 2 days, and found several things.

In the future we can improve the benchmark to test more operations, but I wanted to solve the performance issue first.

Prepared statements for Update::PushNewItems

This is addressed as a standalone patch. I've replaced a query() call inside a loop with a single prepare() outside the loop, followed by an execute() inside the loop.
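In rusqlite terms, the change has roughly this shape (a minimal sketch with an illustrative table and columns, not the actual Update::PushNewItems code):

```rust
use rusqlite::{params, Connection, Result};

// Minimal sketch: the table and column names are illustrative.
fn insert_events(conn: &Connection, events: &[(String, Vec<u8>)]) -> Result<()> {
    // Prepare (i.e. compile) the statement once, outside the loop…
    let mut statement =
        conn.prepare("INSERT INTO events (event_id, content) VALUES (?1, ?2)")?;

    // …then only bind the parameters and execute inside the loop.
    for (event_id, content) in events {
        statement.execute(params![event_id, content])?;
    }

    Ok(())
}
```

Preparing once avoids re-parsing and re-planning the same SQL text on every iteration.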

Guess what? It barely improved the situation. I've reached 8_400 events per second. Not really the improvement I was expecting. However, the performance was now linear with the number of events. Before that, it was totally unpredictable. That's a first step.

I tried a bulk insertion to avoid calling execute() in a loop: a single prepare() plus a single execute(), like so:

INSERT INTO events VALUES (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), …

Absolutely zero difference. Time not well spent on this one, but it was worth a try.

Remove all tables: time for a new schema

The tables were fine, but none of them had primary keys, and I hoped adding some could solve things. Sadly, ALTER TABLE doesn't allow changing the schema of a table: only some renamings and limited column changes are possible.

The solution is to remove all indexes and all tables, and to recreate them. Same table names, same column names, but different semantics:

  • linked_chunks has a primary key over room_id and id (the chunk identifier); this pair is necessarily unique (this uniqueness constraint was missing before),
  • gaps has a primary key over room_id and chunk_id, for the same reason,
  • events has a primary key composed of room_id, chunk_id and position. Again, this tuple is necessarily unique, and this constraint was missing. The uniqueness on event_id was added in feat(sqlite) Add an index on events.event_id and .room_id #4685.

Why can't events use event_id as its primary key? Because an event_id can be null if the event is invalid. After all, a nullable primary key is allowed in SQLite (it was a bug originally, but buggy behaviours are kept for compatibility).

Note

Why is an invalid event stored in the database at all? It's still a mystery to me.
cc @bnjbvr

With this new schema, things have improved a bit. 14_000 events per second, better!

WITHOUT ROWID

Please read Clustered Indexes and the WITHOUT ROWID Optimization.

In addition to the new schema, the 3 tables are marked as WITHOUT ROWID. It has dramatically improved the performance. We are reaching 178_000 events per second!

I lied a bit in the previous section. At first, the primary key for events was event_id, and it did improve the performance, but WITHOUT ROWID expects the primary key to NOT be null. So I had to find another unique key, hence the tuple room_id, chunk_id and position. This whole redesign was done with WITHOUT ROWID in mind, but it's the result of several tries and failures.
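Put together, a simplified sketch of the idea for the events table could look like this (column names and types are illustrative; the real schema lives in the crate's migrations and has more columns):

```rust
use rusqlite::{Connection, Result};

// Simplified sketch: a composite, non-null primary key so the table can be
// declared WITHOUT ROWID. Column names and types are illustrative.
fn create_events_table(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE TABLE events (
            room_id   BLOB NOT NULL,
            chunk_id  INTEGER NOT NULL,
            position  INTEGER NOT NULL,
            event_id  TEXT,     -- made NOT NULL later in this PR, after review
            content   BLOB,
            -- The (room_id, chunk_id, position) tuple is necessarily unique and
            -- never null, which is what WITHOUT ROWID requires of a primary key.
            PRIMARY KEY (room_id, chunk_id, position)
        )
        WITHOUT ROWID;",
    )
}
```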

Restore the index over events

Lastly, in the new schema, I re-create the index over events.room_id and events.event_id. And we are finally reaching 260_000 events per second!
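As an illustrative sketch of that last step (the real index name and definition live in the crate's migrations):

```rust
use rusqlite::{Connection, Result};

// Illustrative only: restore the lookup index on the new schema. The name is
// an assumption; see the review discussion below regarding the uniqueness
// constraint on event_id.
fn create_events_index(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE INDEX events_room_id_and_event_id
             ON events (room_id, event_id);",
    )
}
```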

Results for 100_000 events

(These results are a bit different because, at the time of publishing these patches, I re-ran them on my laptop while it was running on a low battery. The orders of magnitude are the same, though.)

No store

[Screenshot: Criterion plot, no store]
            Lower bound       Estimate          Upper bound
Slope       13.641 ms         15.264 ms         16.948 ms
Throughput  5.9003 Melem/s    6.5513 Melem/s    7.3308 Melem/s
R²          0.6328193         0.7358459         0.6261084
Mean        13.285 ms         14.181 ms         15.422 ms
Std. Dev.   548.23 µs         1.8684 ms         2.7520 ms
Median      13.077 ms         13.206 ms         14.839 ms
MAD         42.535 µs         299.56 µs         2.2939 ms

MemoryStore

[Screenshot: Criterion plot, MemoryStore]
            Lower bound       Estimate          Upper bound
Slope       32.587 ms         33.022 ms         33.414 ms
Throughput  2.9928 Melem/s    3.0283 Melem/s    3.0687 Melem/s
R²          0.9932222         0.9961436         0.9937653
Mean        32.545 ms         32.748 ms         33.022 ms
Std. Dev.   102.11 µs         417.19 µs         600.38 µs
Median      32.490 ms         32.601 ms         32.891 ms
MAD         50.689 µs         166.82 µs         458.06 µs

SqliteEventCacheStore

[Screenshot: Criterion plot, SqliteEventCacheStore]
            Lower bound       Estimate          Upper bound
Slope       403.39 ms         403.88 ms         404.57 ms
Throughput  247.17 Kelem/s    247.60 Kelem/s    247.90 Kelem/s
R²          0.9998947         0.9999231         0.9998668
Mean        403.77 ms         404.39 ms         405.02 ms
Std. Dev.   643.36 µs         1.0718 ms         1.2887 ms
Median      403.47 ms         404.15 ms         405.54 ms
MAD         205.61 µs         1.3011 ms         1.8636 ms

Cons

Migrating from version 5 to 6 will erase all existing caches. I consider this okay since (i) it's a cache, and (ii) it's not released yet.



codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 6 lines in your changes missing coverage. Please review.

Project coverage is 86.11%. Comparing base (7694b01) to head (16a5401).
Report is 16 commits behind head on main.

Files with missing lines                             Patch %   Missing lines
crates/matrix-sdk-sqlite/src/event_cache_store.rs    78.94%    4 ⚠️
crates/matrix-sdk/src/event_cache/deduplicator.rs    71.42%    2 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4739      +/-   ##
==========================================
- Coverage   86.12%   86.11%   -0.02%     
==========================================
  Files         292      292              
  Lines       34346    34355       +9     
==========================================
+ Hits        29581    29585       +4     
- Misses       4765     4770       +5     


@Hywan Hywan force-pushed the bench-linked-chunk branch from 6e215ce to ed1a906 Compare February 28, 2025 17:40
@Hywan Hywan marked this pull request as ready for review February 28, 2025 17:50
@Hywan Hywan requested a review from a team as a code owner February 28, 2025 17:50
@Hywan Hywan requested review from poljar and removed request for a team February 28, 2025 17:50
@Hywan Hywan changed the title from "bench: Add a benchmark for the LinkedChunk" to "fix(sqlite): SqliteEventCacheStore is 35 times faster" Feb 28, 2025
@bnjbvr bnjbvr requested review from bnjbvr and removed request for poljar March 3, 2025 09:47
Member

@bnjbvr bnjbvr left a comment


Nice performance improvements, good job! Unfortunately the new schema is incorrect because of the uniqueness constraint, see latest comment.

@Hywan Hywan force-pushed the bench-linked-chunk branch from ed1a906 to 282c120 Compare March 5, 2025 08:04
@Hywan Hywan requested a review from bnjbvr March 5, 2025 08:16
bnjbvr
bnjbvr previously approved these changes Mar 5, 2025
Member

@bnjbvr bnjbvr left a comment


Thanks, looks good, only one minor comment about the benchmark below.

On the other hand, to avoid another migration later, maybe this is a good time to (1) not store events without an event id in the linked chunk (in memory or in store), (2) add a NOT NULL constraint on the event_id field, and (3) add the uniqueness constraint back on the tuple (room, event_id). What do you think? (Please ask for another round of review if you do it in this PR.)
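For illustration, points (2) and (3) could translate to something like this at the schema level (simplified column set, not the actual migration):

```rust
use rusqlite::{Connection, Result};

// Illustrative sketch of suggestions (2) and (3); the column set is simplified.
fn create_events_table_strict(conn: &Connection) -> Result<()> {
    conn.execute_batch(
        "CREATE TABLE events (
            room_id   BLOB NOT NULL,
            chunk_id  INTEGER NOT NULL,
            position  INTEGER NOT NULL,
            event_id  TEXT NOT NULL,               -- (2) reject invalid events
            content   BLOB,
            PRIMARY KEY (room_id, chunk_id, position),
            UNIQUE (room_id, event_id)             -- (3) unique per room
        )
        WITHOUT ROWID;",
    )
}
```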

Hywan added 2 commits March 5, 2025 12:27
This patch uses a prepared statement to insert events in the linked
chunks. It offers more predictable performance, and SQLite prefers that.
@Hywan Hywan force-pushed the bench-linked-chunk branch 2 times, most recently from aed17bc to 1d1b4fc Compare March 5, 2025 11:29
@Hywan Hywan requested a review from poljar March 5, 2025 12:14
@Hywan Hywan dismissed bnjbvr’s stale review March 5, 2025 12:14

@bnjbvr is absent today. As he said, only the last patch needs to be reviewed. @poljar takes it over.

Contributor

@poljar poljar left a comment


This looks good to me. I left a small suggestion.

Please beware that commit 4 modifies the table again without a migration. Also, some of the code changes in commit 4 should probably have been part of commit 3, i.e. where we change the insert statement so the event ID becomes a String instead of an Option<String>.

What I'm saying is that 3 and 4 should probably be squashed.

@Hywan Hywan force-pushed the bench-linked-chunk branch from 1d1b4fc to 71c8516 Compare March 5, 2025 12:39
@Hywan
Member Author

Hywan commented Mar 5, 2025

Patches 3 and 4 are now squashed together. Thanks for the suggestion.

This patch is twofold. First off, it provides a new schema that improves
the performance of `SqliteEventCacheStore` for 100_000 events
from 6.7k events/sec to 284k events/sec on my machine.

Second, it now assumes that `EventCacheStore` does NOT store invalid
events. It was already the case, but the SQLite schema was not rejecting
invalid events in case some were handled. It's now explicitly forbidden.
@Hywan Hywan force-pushed the bench-linked-chunk branch from 71c8516 to 16a5401 Compare March 5, 2025 12:42
@Hywan Hywan enabled auto-merge (rebase) March 5, 2025 12:43
@Hywan Hywan merged commit 3d653d3 into matrix-org:main Mar 5, 2025
44 checks passed