Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Closed
andybalaam opened this issue Jun 13, 2024 · 4 comments

Comments

@andybalaam
Copy link
Contributor

andybalaam commented Jun 13, 2024

Latest update: we think we don't need a background task at all: there is only a point re-trying if we get back more information about devices or users from a /keys/query request, so we will simply process any retries due whenever a /keys/query request ends.

Launch a background task when we start that retries fetching sender data for megolm sessions.

Update: originally we expected this background task not to handle "slow lane" (new sessions for which we don't have data but we suspect it may appear from the next keys/query response) items, but we now expect that it will, so we need make sure that we kick off a retry cycle whenever receive_keys_query_response completes. (This should not clash with normal retries so we need a lock of some kind.)

Part of #3544 which is part of Invisible Crypto.

The algorithm below mentions jumping to certain steps. This is referring to the algorithm in #3543 .

Algorithm

This handles all of the following scenarios:

  • Legacy sessions - these effectively have missing device info.
    • These may already exist before this code is deployed, or be created by importing keys that are missing device info, or by restoring backups that are missing device info.
  • Non-legacy sessions with missing device info - we are waiting for it to appear.
  • Sessions with device info that is signed by a cross-signing key we don't recognise - we are waiting to see whether it will appear.
  • Sessions that we received but were interrupted before we finished processing them.
    • (Note: we don't store the session immediately we receive it, but if we have not finished with it by the time the sync is finished, we will have stored it. If we were interrupted before the sync is finished, we will see this to-device message again the next time we sync.)

In each case we will query the store for these sessions and retry them, updating their retry count if we need to continue waiting.

The background job begins on startup and repeatedly does the following:

  • Query the sessions in the store that are due for a retry because either:
    • they are in state UnknownDevice and their next_retry_time_ms property is before now
    • or they are in state DeviceInfo and their next_retry_time_ms property is before now
  • Order by next_retry_time_ms ascending and take the first $BATCH_SIZE (~200?).
  • For each
    • if state==UnknownDevice, [take the lock, bail out if not], jump to step C.3 (Within the algorithm above, we don't technically need to spawn additional tasks in step F.2, but it won't do any harm to do so - they probably complete immediately because we're probably not waiting for a keys/query given we just waited for one in C.3.)
    • else, state==DeviceInfo, [take the lock, bail out if not], jump to F.3
  • Await all of these tasks - they will complete quickly.
  • Repeat. If there were no pending jobs… go to sleep for a while

Each session gets its own independent background task, which is OK because tasks are light and they all just wait for the same keys/query to finish. After that, they may save to the store, which might cause some contention but is probably fine.

@poljar
Copy link
Contributor

poljar commented Jun 13, 2024

If every instance of session here means megolm session, can we please use the "room key" term instead. I know that we call this the InboundGroupSession in the crypto crate, but perhaps we should reconsider this name as well.

@dkasak
Copy link
Member

dkasak commented Jun 13, 2024

We don't differentiate enough between the Matrix layer things and the Olm/Megolm layer things. If anything should have the name "Megolm session", it should be the low level packets described in https://gitlab.matrix.org/matrix-org/olm/-/blob/master/docs/megolm.md#data-exchange-formats. The m.room_key thing on the Matrix layer should never be called a Megolm session but a room key or message key.

@andybalaam
Copy link
Contributor Author

@BillCarsonFr should I search-and-replace "session" -> "room key" everywhere in this issue and its parents?

@andybalaam
Copy link
Contributor Author

No longer needed - see the updated plan in #3544

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants