Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix bug that could cause a /sync to tightloop with sqlite after restart #16540

Merged
merged 2 commits into from
Oct 23, 2023

Conversation

erikjohnston
Copy link
Member

This could happen if the last rows in the account data stream were inserted into account_data. After a restart the max account ID would be calculated without looking at the account_data table, and so have an old ID.

Comment on lines +97 to +100
extra_tables=[
("account_data", "stream_id"),
("room_tags_revisions", "stream_id"),
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include room_account_data too, like the postgres ID generator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's included above. (These are extra_tables= rather than tables= that MultiWriterIdGenerator uses)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This feels like something a lint could check 😢 )

@DMRobertson
Copy link
Contributor

Fixes #15824?

@erikjohnston erikjohnston marked this pull request as ready for review October 23, 2023 13:05
@erikjohnston erikjohnston requested a review from a team as a code owner October 23, 2023 13:05
@erikjohnston
Copy link
Member Author

Fixes #15824?

Quite possibly

@erikjohnston erikjohnston enabled auto-merge (squash) October 23, 2023 13:15
@erikjohnston erikjohnston merged commit 3bc23cc into develop Oct 23, 2023
41 checks passed
@erikjohnston erikjohnston deleted the erikj/sync_sqlite_loop branch October 23, 2023 13:39
@thebalaa
Copy link
Contributor

thebalaa commented Oct 23, 2023

Didn't fix #15824

Still seeing with the following development based container:

 ...
                "gitsha1": "3df70aa80001e05b0bbe69fd3328f11aceaab4aa",
                "org.homeserver": "true",
                "org.opencontainers.image.documentation": "https://github.com/matrix-org/synapse/blob/master/docker/README.md",
                "org.opencontainers.image.licenses": "Apache-2.0",
                "org.opencontainers.image.source": "https://github.com/matrix-org/synapse.git",
                "org.opencontainers.image.url": "https://matrix.org/docs/projects/server/synapse",
                "org.opencontainers.image.version": "1.95.0rc1"

Note the below sync query returns a response that does not advance the next_batch:

http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s93_7_0_1_5_1_1_11_0_1

{
    "next_batch": "s93_7_0_1_5_1_1_11_0_1",
    "device_lists": {
        "changed": [
            "@admin:localhost"
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}

Restarting synapse fixes the tightloop temporarily but it returns within a few minutes.

Our reproduction steps:
We have a custom matrix-nio based client that is syncing, we then login via lement with the same matrix ID and within a few minutes it will start tightlooping.

SQLite database docker / docker compose deployment

@erikjohnston
Copy link
Member Author

Hmm, I wonder if we have a similar problem with device lists then

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants