Speed up MAS token introspection #18357
Conversation
```rust
// TODO: Is it safe to assert unwind safety here? I think so, as we
// don't use anything that could be tainted by the panic afterwards.
// Note that `.spawn(..)` asserts unwind safety on the future too.
let res = AssertUnwindSafe(fut).catch_unwind().await;
```
Alternatively, spawn the future on the runtime and await the handle; it will give you an `Err` containing the panic payload if the future panics.
We could, though then you end up spawning two tasks per function, rather than one. Probably not a huge deal, but feels a bit bleurgh
Spawning is cheap, let's do that instead please. Also we'll need to spawn a separate task anyway if we want to properly support cancel
Can you expand on why you want to use new tasks please? I don't see the benefit of spawning a new task to just wait on it, semantically you end up with a bunch of tasks with different IDs all for the same work. In future, if we wanted to start tracking tasks and e.g. their resource usage then using multiple tasks makes that more complicated.
I also don't think we need a separate task for cancellation necessarily. You can change this line to do a select on both `fut` and the cancellation future.
I'm not really comfortable with `AssertUnwindSafe` being used so broadly. Tasks are cheap to spawn, and I don't think we'd want to base our potential resource-consumption measurement in Rust-world on Tokio task IDs?
Anyway, even though `AssertUnwindSafe` smells like a bad thing waiting to happen, I won't block this PR further over it if you're not convinced that spawning is fine.
> I'm not really comfortable with `AssertUnwindSafe` being used so broadly. Tasks are cheap to spawn, and I don't think we'd want to base our potential resource-consumption measurement in Rust-world on Tokio task IDs?
I think we would do this at the task level: we'd have a task-local context that records resource usage, so you could e.g. wrap the top-level future to record the resource consumption of `poll`, or have DB functions record transaction times, etc. When spawning new tasks you'd want to decide whether the resources of the new task get allocated to the current task or to a new one.
> Anyway, even though `AssertUnwindSafe` smells like a bad thing waiting to happen, I won't block this PR further over it if you're not convinced that spawning is fine.
Bear in mind that this is exactly what spawning a task does in tokio, so it's hard to see how it would be fine there and not here.
rust/src/http_client.rs (outdated)
```rust
static ref DEFERRED_CLASS: PyObject = {
    Python::with_gil(|py| {
        py.import("twisted.internet.defer")
            .expect("module 'twisted.internet.defer' should be importable")
```
I'm not a fan of panicking like that; I'm not sure what would happen if the import did fail.
This will cause a panic on the first and any subsequent derefs. Given that this shouldn't ever fail, I prefer having the initialisation closer to the definition for clarity's sake.
I've added an explicit call to these functions in the init of `HttpClient` so that if it ever did fail, it'd fail at startup.
This is because openssl cannot be used in a manylinux context due to lack of stable ABI.
rust/src/http_client.rs (outdated)
```rust
// Make sure we fail early if we can't build the lazy statics.
LazyLock::force(&RUNTIME);
LazyLock::force(&DEFERRED_CLASS);
```
Could be done in the module initialisation?
Annoyingly, we can't import the Twisted reactor at this stage, as it happens too early. See the comment I left in `HttpClient::new`.
```rust
let mut stream = response.bytes_stream();
let mut buffer = Vec::new();
while let Some(chunk) = stream.try_next().await.context("reading body")? {
    if buffer.len() + chunk.len() > response_limit {
        Err(anyhow::anyhow!("Response size too large"))?;
    }

    buffer.extend_from_slice(&chunk);
}
```
I believe you can achieve the same with `http_body_util::Limited`; `reqwest::Response` implements `Into<Body>`.
Yeah, I originally tried that, but it messed up the errors (one of the exceptions stops implementing std `Error`). Given how straightforward this is, it felt easier than faffing with error types.
```rust
/// The tokio runtime that we're using to run async Rust libs.
static RUNTIME: LazyLock<Runtime> = LazyLock::new(|| {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(4)
```
We'll likely want to have that configurable at some point, but this is probably a sane default.
Friendly ping @sandhose
```python
with PreserveLoggingContext():
    resp_body = await self._rust_http_client.post(
        url=uri,
        response_limit=1 * 1024 * 1024,
        headers=raw_headers,
        request_body=body,
    )
```
What's the reasoning behind using `PreserveLoggingContext()` here? Why do we want to reset the `LoggingContext` to the `SENTINEL_CONTEXT` during the operation and then restore the old context?
As far as I can tell, we're not doing any sort of fire-and-forget here where this would matter.
Perhaps it's because of the way the Rust HTTP client handles the deferreds? In any case, it seems like we should have some wrapper around it that uses `make_deferred_yieldable(...)` to make things right, so we don't have to do this in the downstream code.
> Perhaps it's because of the way the Rust HTTP client handles the deferreds? In any case, it seems like we should have some wrapper around it that uses `make_deferred_yieldable(...)` to make things right, so we don't have to do this in the downstream code.
The returned deferred does not follow the logcontext rules, so we need to make it follow them. The `make_deferred_yieldable(...)` function is a way of doing so, but it is equivalent to using `with PreserveLoggingContext():`, i.e. it clears the logcontext before awaiting (and so before execution passes back to the reactor) and restores the old context once the awaitable completes (when execution passes from the reactor back to the code).
> it seems like we should have some wrapper around it that uses `make_deferred_yieldable(...)` to make things right so we don't have to do this in the downstream code.
Addressing this in #18903
Wrap the Rust HTTP client with `make_deferred_yieldable` so downstream usage doesn't need to use `PreserveLoggingContext()` or `make_deferred_yieldable`.

> it seems like we should have some wrapper around it that uses [`make_deferred_yieldable(...)`](https://github.com/element-hq/synapse/blob/40edb10a98ae24c637b7a9cf6a3003bf6fa48b5f/docs/log_contexts.md#where-you-create-a-new-awaitable-make-it-follow-the-rules) to make things right so we don't have to do this in the downstream code.
>
> -- @MadLittleMods, #18357 (comment)

Spawning from wanting to remove `PreserveLoggingContext()` from the codebase (#18870) and thinking that we shouldn't have to pollute all downstream usage with `PreserveLoggingContext()` or `make_deferred_yieldable` (#18357 (comment)).

Part of #18905 (Remove `sentinel` logcontext where we log in Synapse).
We do this by shoving it into Rust; we believe our Python HTTP client is a bit slow.
Also bumps minimum rust version to 1.81.0, released last September (over six months ago)
To allow for async Rust, this includes some adapters between the Tokio runtime in Rust and the Twisted reactor in Python.