Activate share_pos on the room-list sliding sync instance #4035
Conversation
Force-pushed from 3e42b53 to d17e83f.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4035      +/-   ##
==========================================
- Coverage   84.88%   84.87%   -0.01%
==========================================
  Files         272      272
  Lines       29166    29166
==========================================
- Hits        24757    24756       -1
- Misses       4409     4410       +1
==========================================
```

☔ View full report in Codecov by Sentry.
Force-pushed from d17e83f to 07310ee.
Ahhh, the crypto dragons have been triggered :/. Time to set up complement-crypto :)
Force-pushed from 07310ee to 11d63f2.
Hum.
Force-pushed from 11d63f2 to 4140bbd.
The failed test was a timeout, so I think it's a false alarm.
It failed again; it may not be a flake, especially since it's restart-related?
The log files show:

…which imply that
A successful local run shows:
Right, found the cause. When it starts up, two sliding sync connections are made. This causes 2x HTTP requests to /sync. On passing tests, both finish quickly; on failing tests, one does not return at all for the duration of the test. Looking at the MITM dump for the failed run, it's clear that the one that doesn't return is:

```json
{
"conn_id": "room-list",
"extensions": {
"account_data": {
"enabled": true
},
"receipts": {
"enabled": true,
"rooms": [
"*"
]
},
"typing": {
"enabled": true
}
},
"lists": {
"all_rooms": {
"filters": {
"not_room_types": [
"m.space"
]
},
"include_heroes": true,
"ranges": [
[
0,
0
]
],
"required_state": [
[
"m.room.encryption",
""
],
[
"m.room.member",
"$LAZY"
],
[
"m.room.member",
"$ME"
],
[
"m.room.name",
""
],
[
"m.room.canonical_alias",
""
],
[
"m.room.power_levels",
""
]
],
"timeline_limit": 1
}
},
"txn_id": "86a22ed3a03948dc985c7c32be1b5327"
}
```

I suspect the sync code assumes that room-list requests return immediately on startup (which has been true up until now). By sharing the `pos`, that assumption no longer holds. So this is a real bug, unfortunately.

The reason why it is flaky is likely the slow speed of GHA boxes. I suspect the test itself runs much more slowly, giving enough time for the new data to come down before the restart, meaning that when it starts up again there is no new data to fetch, hence it blocks waiting for more data.

The end-user impact of this will presumably be a spinner showing for 30s (as it's also waiting on the same listener the tests are) upon restart. You likely didn't see this because your account is way more active, meaning there is a high chance something will come down before the timeout, but in a test scenario this is not the case.
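To make the failure mode concrete, here is a small, self-contained Rust simulation of the behaviour described above. None of it is SDK or Synapse code; the function, values, and `pos` string are invented purely for illustration of why an incremental request carrying a shared `pos` can park the caller for the whole long-poll timeout when nothing new arrives, while an initial request returns immediately.

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

// Invented stand-in for the sliding sync long-poll: with no `pos` the server
// answers straight away; with a `pos` and no new events it holds the request
// open until the long-poll timeout elapses.
async fn fake_sliding_sync(pos: Option<&str>, long_poll: Duration) -> &'static str {
    match pos {
        None => "initial response (immediate)",
        Some(_) => {
            sleep(long_poll).await;
            "empty incremental response (after the long-poll timeout)"
        }
    }
}

#[tokio::main]
async fn main() {
    // Fresh start: the room-list connection has no `pos`, so it returns at once.
    println!("{}", fake_sliding_sync(None, Duration::from_secs(30)).await);

    // Restart with a shared `pos` and a quiet account: the caller (and any UI
    // spinner waiting on it) is stuck until the long-poll gives up.
    let pending = timeout(
        Duration::from_millis(200),
        fake_sliding_sync(Some("s123_456"), Duration::from_secs(30)),
    )
    .await;
    assert!(pending.is_err(), "the incremental request should still be pending");
    println!("incremental request still pending, as in the failing test");
}
```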
@kegsay thanks a lot for the detailed investigation 🙏 The fix should be easy-ish with all that info; I'll have a look tomorrow.
The fix, FWIW, is to just use ?timeout=0 on the first sync (before the room list state is loaded). This matches sync v2 behaviour in other clients.
I don't think it will be useful: the sync v2 spec says that not specifying a timeout is the same as timeout=0, and the Synapse code seems to agree, for both sync v2 and SSS.
Never mind, the timeout of the first request is 60s in your logs.
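For illustration, here is a minimal sketch of the "?timeout=0 on the first sync" idea being discussed here. The helper function, flag, and URL are hypothetical, not the SDK's actual API; the point is only that the first request after a restart would not long-poll, while later requests would.

```rust
// Hypothetical helper: pick the long-poll timeout for a sliding sync request.
// None of these names come from the SDK; they only illustrate the idea.
fn sync_timeout_ms(room_list_loaded_once: bool) -> u64 {
    if room_list_loaded_once {
        30_000 // steady state: long-poll as usual
    } else {
        0 // first request after startup/restart: return immediately
    }
}

fn main() {
    // The host and endpoint path are placeholders; the conversation above
    // simply refers to it as /sync.
    let url = format!(
        "https://matrix.example.org/sync?timeout={}",
        sync_timeout_ms(false)
    );
    println!("{url}"); // https://matrix.example.org/sync?timeout=0
}
```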
It seems we don't really need to use this trick here, cf. 492166a.
Requesting changes for the comments in testing; I'll let @Hywan finish the review and have the final word. Thanks!
We are very close to a merge here. As soon as you address the last comments, we are good.
We are good! Can you rewrite your patch history a little bit before I merge, please? I think the fix with subscribe_next deserves at least its own commit.
Force-pushed from 9099e50 to 6f2f2e1.
This fixes a problem when doing an incremental sync at launch, where the `NotLoaded` event would not be dispatched until data became available or the timeout was reached, leaving the app waiting for it.
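As a rough illustration of the kind of fix described in that commit message, assuming the underlying issue is "wait for the next state update" versus "dispatch the current state first": the sketch below uses a tokio watch channel and a made-up `RoomListLoadingState` type, not the SDK's actual types or its subscribe_next API.

```rust
use tokio::sync::watch;

#[derive(Clone, Copy, Debug, PartialEq)]
enum RoomListLoadingState {
    NotLoaded,
    Loaded,
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = watch::channel(RoomListLoadingState::NotLoaded);

    // Buggy pattern: only await the *next* change. If the incremental sync
    // long-polls and no update arrives, nothing is ever dispatched and the
    // caller (e.g. a UI spinner) waits for the whole timeout:
    //
    //     rx.changed().await.unwrap();

    // Fixed pattern: dispatch the current value (NotLoaded) right away, then
    // wait for subsequent changes.
    println!("current state: {:?}", *rx.borrow());

    // Later, when sync data finally arrives, the state flips to Loaded and the
    // subscriber is woken up.
    tx.send(RoomListLoadingState::Loaded).unwrap();
    rx.changed().await.unwrap();
    println!("updated state: {:?}", *rx.borrow());
}
```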
Force-pushed from 6f2f2e1 to 67f18d5.
Fixes #3936.
Signed-off-by: Mathieu Velten [email protected]