Fix bug that could cause a `/sync` to tightloop with sqlite after restart #16540

erikjohnston · 2023-10-23T12:47:14Z

This could happen if the last rows in the account data stream were inserted into `account_data`. After a restart, the max account data stream ID would be calculated without looking at the `account_data` table, and so would be an old ID.
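As a rough illustration of the failure mode described above, here is a minimal sketch using the standard-library `sqlite3` module. It is not Synapse's ID generator: `load_max_stream_id` is a hypothetical helper, the one-column schemas are simplified assumptions, and `room_account_data` is assumed to be the primary table based on the review thread.

```python
import sqlite3


def load_max_stream_id(conn, table, column, extra_tables=()):
    """Hypothetical helper: largest stream ID across `table` and `extra_tables`."""
    current = 0
    for tbl, col in [(table, column), *extra_tables]:
        # Table and column names are fixed strings in this sketch, never user input.
        row = conn.execute(f"SELECT MAX({col}) FROM {tbl}").fetchone()
        if row[0] is not None:
            current = max(current, row[0])
    return current


conn = sqlite3.connect(":memory:")
for tbl in ("room_account_data", "account_data", "room_tags_revisions"):
    # Simplified schema (assumption): only the stream_id column matters here.
    conn.execute(f"CREATE TABLE {tbl} (stream_id INTEGER NOT NULL)")

conn.execute("INSERT INTO room_account_data VALUES (10)")
conn.execute("INSERT INTO account_data VALUES (12)")  # the newest row lives here

# Only the primary table is consulted, so the newest row is missed.
stale = load_max_stream_id(conn, "room_account_data", "stream_id")

# Also consulting the extra tables picks up the newest row.
fixed = load_max_stream_id(
    conn,
    "room_account_data",
    "stream_id",
    extra_tables=[
        ("account_data", "stream_id"),
        ("room_tags_revisions", "stream_id"),
    ],
)
```

In this sketch `stale` lags behind the newest row in `account_data`, while `fixed` does not.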

DMRobertson · 2023-10-23T12:49:27Z

synapse/storage/databases/main/account_data.py

+                extra_tables=[
+                    ("account_data", "stream_id"),
+                    ("room_tags_revisions", "stream_id"),
+                ],


Should this include room_account_data too, like the postgres ID generator?

That's included above. (These are extra_tables= rather than tables= that MultiWriterIdGenerator uses)

(This feels like something a lint could check 😢 )

DMRobertson · 2023-10-23T12:53:44Z

Fixes #15824?

erikjohnston · 2023-10-23T13:05:36Z

> Fixes #15824?

Quite possibly

thebalaa · 2023-10-23T20:14:41Z

Didn't fix #15824

Still seeing it with the following development-based container:

 ...
                "gitsha1": "3df70aa80001e05b0bbe69fd3328f11aceaab4aa",
                "org.homeserver": "true",
                "org.opencontainers.image.documentation": "https://github.com/matrix-org/synapse/blob/master/docker/README.md",
                "org.opencontainers.image.licenses": "Apache-2.0",
                "org.opencontainers.image.source": "https://github.com/matrix-org/synapse.git",
                "org.opencontainers.image.url": "https://matrix.org/docs/projects/server/synapse",
                "org.opencontainers.image.version": "1.95.0rc1"

Note the below sync query returns a response that does not advance the next_batch:

http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s93_7_0_1_5_1_1_11_0_1

{
    "next_batch": "s93_7_0_1_5_1_1_11_0_1",
    "device_lists": {
        "changed": [
            "@admin:localhost"
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
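To make the symptom above easy to check, here is a small sketch of how a client or test harness might flag a sync response that did not advance the token. `sync_made_progress` is a hypothetical helper, not part of matrix-nio or Synapse.

```python
from urllib.parse import parse_qs, urlparse


def sync_made_progress(request_url: str, response: dict) -> bool:
    """Return True if the response's next_batch differs from the request's since token."""
    since = parse_qs(urlparse(request_url).query).get("since", [None])[0]
    return response.get("next_batch") != since


# Request URL and next_batch value taken from the report above.
url = (
    "http://localhost:8008/_matrix/client/r0/sync"
    "?filter=0&timeout=30000&since=s93_7_0_1_5_1_1_11_0_1"
)
response = {"next_batch": "s93_7_0_1_5_1_1_11_0_1"}

stuck = not sync_made_progress(url, response)
```

A client that sees `stuck` set on several consecutive responses could back off instead of re-requesting immediately.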

Restarting synapse fixes the tightloop temporarily but it returns within a few minutes.

Our reproduction steps:
We have a custom matrix-nio based client that is syncing. We then log in via Element with the same Matrix ID, and within a few minutes it starts tightlooping.

Deployment: SQLite database, Docker / Docker Compose.

erikjohnston · 2023-10-23T21:13:40Z

Hmm, I wonder if we have a similar problem with device lists then

erikjohnston added 2 commits October 23, 2023 13:45

Fix bug that could cause a /sync to tightloop with sqlite after restart

a95735e

Newsfile

5753c9d

DMRobertson reviewed Oct 23, 2023

erikjohnston marked this pull request as ready for review October 23, 2023 13:05

erikjohnston requested a review from a team as a code owner October 23, 2023 13:05

DMRobertson approved these changes Oct 23, 2023

erikjohnston enabled auto-merge (squash) October 23, 2023 13:15

erikjohnston merged commit 3bc23cc into develop Oct 23, 2023
41 checks passed

erikjohnston deleted the erikj/sync_sqlite_loop branch October 23, 2023 13:39

erikjohnston mentioned this pull request Oct 23, 2023

/sync randomly tightloops #15824

Open


DMRobertson Oct 23, 2023

erikjohnston Oct 23, 2023

clokep Oct 23, 2023
