Sender Data: Background task to retry fetching sender data for megolm sessions #3546

andybalaam · 2024-06-13T11:20:27Z

Latest update: we think we don't need a background task at all: there is only a point re-trying if we get back more information about devices or users from a /keys/query request, so we will simply process any retries due whenever a /keys/query request ends.

Launch a background task when we start that retries fetching sender data for megolm sessions.

Update: originally we expected this background task not to handle "slow lane" (new sessions for which we don't have data but we suspect it may appear from the next keys/query response) items, but we now expect that it will, so we need make sure that we kick off a retry cycle whenever receive_keys_query_response completes. (This should not clash with normal retries so we need a lock of some kind.)

Part of #3544 which is part of Invisible Crypto.

The algorithm below mentions jumping to certain steps. This is referring to the algorithm in #3543 .

Algorithm

This handles all of the following scenarios:

Legacy sessions - these effectively have missing device info.
- These may already exist before this code is deployed, or be created by importing keys that are missing device info, or by restoring backups that are missing device info.
Non-legacy sessions with missing device info - we are waiting for it to appear.
Sessions with device info that is signed by a cross-signing key we don't recognise - we are waiting to see whether it will appear.
Sessions that we received but were interrupted before we finished processing them.
- (Note: we don't store the session immediately we receive it, but if we have not finished with it by the time the sync is finished, we will have stored it. If we were interrupted before the sync is finished, we will see this to-device message again the next time we sync.)

In each case we will query the store for these sessions and retry them, updating their retry count if we need to continue waiting.

The background job begins on startup and repeatedly does the following:

Query the sessions in the store that are due for a retry because either:
- they are in state UnknownDevice and their next_retry_time_ms property is before now
- or they are in state DeviceInfo and their next_retry_time_ms property is before now
Order by next_retry_time_ms ascending and take the first $BATCH_SIZE (~200?).
For each
- if state==UnknownDevice, [take the lock, bail out if not], jump to step C.3 (Within the algorithm above, we don't technically need to spawn additional tasks in step F.2, but it won't do any harm to do so - they probably complete immediately because we're probably not waiting for a keys/query given we just waited for one in C.3.)
- else, state==DeviceInfo, [take the lock, bail out if not], jump to F.3
Await all of these tasks - they will complete quickly.
Repeat. If there were no pending jobs… go to sleep for a while

Each session gets its own independent background task, which is OK because tasks are light and they all just wait for the same keys/query to finish. After that, they may save to the store, which might cause some contention but is probably fine.

The text was updated successfully, but these errors were encountered:

poljar · 2024-06-13T12:29:39Z

If every instance of session here means megolm session, can we please use the "room key" term instead. I know that we call this the InboundGroupSession in the crypto crate, but perhaps we should reconsider this name as well.

dkasak · 2024-06-13T12:33:20Z

We don't differentiate enough between the Matrix layer things and the Olm/Megolm layer things. If anything should have the name "Megolm session", it should be the low level packets described in https://gitlab.matrix.org/matrix-org/olm/-/blob/master/docs/megolm.md#data-exchange-formats. The m.room_key thing on the Matrix layer should never be called a Megolm session but a room key or message key.

andybalaam · 2024-06-13T15:26:01Z

@BillCarsonFr should I search-and-replace "session" -> "room key" everywhere in this issue and its parents?

andybalaam · 2024-07-23T12:56:48Z

No longer needed - see the updated plan in #3544

This was referenced Jun 13, 2024

Sender Data: Async migration task to prepare old sessions for fetching sender data #3547

Closed

Store sender data for Megolm sessions as per Invisible crypto #3544

Closed

This was referenced Jul 16, 2024

Sender Data: Fetch and store sender data when we receive a new megolm session #3543

Closed

crypto: Simplify the interface of search_for_device #3748

Merged

andybalaam closed this as completed Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

andybalaam commented Jun 13, 2024 •

edited

Loading

poljar commented Jun 13, 2024

dkasak commented Jun 13, 2024

andybalaam commented Jun 13, 2024

andybalaam commented Jul 23, 2024

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Comments

andybalaam commented Jun 13, 2024 • edited Loading

Algorithm

poljar commented Jun 13, 2024

dkasak commented Jun 13, 2024

andybalaam commented Jun 13, 2024

andybalaam commented Jul 23, 2024

andybalaam commented Jun 13, 2024 •

edited

Loading