Store the `LoggingContext` in a `ContextVar` #18871
# TODO: This function is a no-op now and should be removed in a follow-up PR.
def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]":
`make_deferred_yieldable` no longer does anything (it is a no-op), but there are a lot of references to clean up. I think it would be better to do this in a follow-up PR than to bulk up this diff with changes that will cloud the main change we're trying to introduce.
Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
context as we want to know where the logs came from. In practice, this is not always the
case yet especially outside of request handling.
Over time, we can remove `PreserveLoggingContext` from many scenarios that cause the `sentinel` context to be used.
I've already started this separately in #18870
@@ -1,250 +0,0 @@
#
As far as I can tell, these checks provide no value to us anymore. We don't have specific log rules to worry about anymore and the `ContextVar` properly follows the context regardless.
  self.assertEqual(
      current_context(),
-     SENTINEL_CONTEXT,
+     c1,
As far as I can tell, these new context values make sense. We're in context `c1`, so `current_context()` should be `c1`.
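As a rough illustration of that expectation, here is a minimal sketch of a context manager that stores itself in a `ContextVar` on entry and restores the previous value on exit. `ToyContext` is a hypothetical stand-in, not Synapse's `LoggingContext`, and restoring via the token returned by `ContextVar.set` is an assumption of this sketch rather than what the PR does.

```python
from contextvars import ContextVar, Token
from typing import Optional


class ToyContext:
    """Hypothetical stand-in for a logging context; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._token: Optional[Token] = None

    def __enter__(self) -> "ToyContext":
        # Make this context the current one and remember how to undo that.
        self._token = _current_context.set(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Restore whatever was current before we entered.
        assert self._token is not None
        _current_context.reset(self._token)
        self._token = None


SENTINEL_CONTEXT = ToyContext("sentinel")
_current_context: ContextVar[ToyContext] = ContextVar("current_context")


def current_context() -> ToyContext:
    # Fall back to the sentinel when nothing has been set.
    return _current_context.get(SENTINEL_CONTEXT)


if __name__ == "__main__":
    with ToyContext("c1") as c1:
        print(current_context().name)
    print(current_context().name)
```

Inside the `with` block the current context is `c1`; outside it, the lookup falls back to the sentinel.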
if bool(os.environ.get("SYNAPSE_TEST_PATCH_LOG_CONTEXTS", False)):
    # We import here so that we don't have to install a bunch of deps when
    # running the packaging tox test.
    from synapse.util.patch_inline_callbacks import do_patch

    do_patch()
See #18871 (comment) for why we've removed `patch_inline_callbacks`.
When we `daemonize`, we fork the process and cputime metrics get confused about the per-thread resource usage appearing to go backwards because we're comparing the resource usage (`rusage`) from the original process to the forked process.
We now kick off the background tasks (`run_as_background_process`) after we have forked the process so the `rusage` we record when we `start` is in the same thread when we `stop`.
Bad log examples from before:
```
synapse.logging.context - ERROR - _schedule_next_expiry-0 - utime went backwards! 0.050467 < 0.886526
synapse.logging.context - ERROR - _schedule_db_events-0 - stime went backwards! 0.009941 < 0.155106
synapse.logging.context - ERROR - wake_destinations_needing_catchup-0 - stime went backwards! 0.010175 < 0.130923
synapse.logging.context - ERROR - resume_sync_partial_state_room-0 - utime went backwards! 0.052898 < 0.886526
```
Testing strategy:
1. Run with `daemonize: true` in your `homeserver.yaml`
1. `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Shutdown the server
1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `utime went backwards!`/`stime went backwards!`
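The kind of check that produces these errors can be sketched as follows. This is a simplified illustration, not Synapse's code: `CpuSnapshot`, `take_snapshot` and `backwards_errors` are hypothetical names, and `RUSAGE_SELF` is used for portability (the `resource` module is Unix-only). A snapshot taken in the original process compared against a later one taken in a freshly forked process can look like time running backwards, because the child's counters start again near zero.

```python
import resource
from typing import List, NamedTuple


class CpuSnapshot(NamedTuple):
    """User and system CPU time, in seconds."""

    utime: float
    stime: float


def take_snapshot() -> CpuSnapshot:
    # Resource usage of the current process at this moment.
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return CpuSnapshot(utime=usage.ru_utime, stime=usage.ru_stime)


def backwards_errors(start: CpuSnapshot, stop: CpuSnapshot) -> List[str]:
    """Describe any counter that is lower at `stop` than it was at `start`."""
    errors = []
    if stop.utime < start.utime:
        errors.append(f"utime went backwards! {stop.utime:f} < {start.utime:f}")
    if stop.stime < start.stime:
        errors.append(f"stime went backwards! {stop.stime:f} < {start.stime:f}")
    return errors


if __name__ == "__main__":
    # Same process for both snapshots: nothing goes backwards.
    print(backwards_errors(take_snapshot(), take_snapshot()))

    # Illustrative values taken from the log lines above: the start snapshot
    # belongs to the original process, the stop snapshot to the forked one.
    before_fork = CpuSnapshot(utime=0.886526, stime=0.155106)
    after_fork = CpuSnapshot(utime=0.050467, stime=0.009941)
    print(backwards_errors(before_fork, after_fork))
```

Starting the background tasks only after the fork means both snapshots come from the same process, so the comparison is meaningful.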
    return _current_context.get(SENTINEL_CONTEXT)


def set_current_context(context: LoggingContextOrSentinel) -> LoggingContextOrSentinel:
TODO: The docstring needs to be updated
- _thread_local = threading.local()
- _thread_local.current_context = SENTINEL_CONTEXT
+ _current_context: ContextVar[LoggingContextOrSentinel] = ContextVar("current_context")
Error: `Called stop on logcontext POST-0 without recording a start rusage`
There is a problem where the `POST-0` `LoggingContext` is somehow becoming the current context without a corresponding `set_current_context(POST-0)` call.
See the lines marked red (prefixed with `-`) in the snippet below.
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_rooms.RoomStateTestCase.test_get_state_event_cancellation
_trial_temp/test.log
2025-09-02 19:12:06-0500 [-] synapse.http.site - 304 - INFO - sentinel - asdf SynapseRequest render
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - sentinel - asdf PreserveLoggingContext(POST-0).__enter__ nonce=meZAG
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(POST-0) (previous=sentinel)
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - POST-0 - asdf LoggingContext(POST-0).start usage_start=True
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 675 - INFO - POST-0 - asdf PreserveLoggingContext(POST-0).__exit__ nonce=meZAG restoring old_context=sentinel
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - POST-0 - asdf set_current_context(sentinel) (previous=POST-0)
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - POST-0 - asdf LoggingContext(POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - sentinel - asdf PreserveLoggingContext(sentinel).__enter__ nonce=xQsLm
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(sentinel) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - _handle_new_device_update_async-0 - asdf LoggingContext(_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 675 - INFO - sentinel - asdf PreserveLoggingContext(sentinel).__exit__ nonce=xQsLm restoring old_context=sentinel
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(sentinel) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-_handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=db-_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-_handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=db-_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - _handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - _handle_new_device_update_async-0 - asdf LoggingContext(_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - POST-0 - asdf PreserveLoggingContext(sentinel).__enter__ nonce=JvLjU
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - POST-0 - asdf set_current_context(sentinel) (previous=POST-0)
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - POST-0 - asdf LoggingContext(POST-0).stop usage_start=False rusage=True
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 488 - ERROR - POST-0 - asdf Called stop on logcontext POST-0 without recording a start rusage
Perhaps this is a Twisted bug?
I was under the impression that Twisted supported `ContextVar`s but now I'm not sure. All of the issues mentioned in matrix-org/synapse#10342 are resolved but there are other things in the Twisted tracker:
Unresolved issues:
- Support contextvars in Deferred twisted/twisted#9807
- Support contextvars in DelayedCall twisted/twisted#9824
Resolved issues:
- Support Contextvars in coroutines (inlineCallbacks/ensureDeferred) twisted/twisted#9719
- Contextvars support does not support `.reset` twisted/twisted#10301
And we could even be running into something unreported 🤷 Need to investigate more.
From some more debugging, I think the `ContextVar` is acting normally.
And this may just be the case that the `LoggingContext` `start`/`stop` pattern isn't compatible with the `ContextVar` we're using now. We'd have to maintain the log context rules 🤔.
In this case, it's the `SynapseRequest.logcontext` where `stop` is called because we have a `PreserveLoggingContext` around `SynapseRequest.render`, which only kicks off the render and doesn't wait for it to finish, so we `stop` way before the request is done. And it's never re-started. So when other `LoggingContext` utilities are used in the downstream code to `set_current_context`, it will `stop` the already stopped `SynapseRequest.logcontext`.
I go back and forth on whether we can update things to work correctly. If I naively try to manage the lifetime myself by calling `self.logcontext.__enter__` manually in `SynapseRequest.render`, it still doesn't work out.
# Register background tasks required by this server. This must be done
# somewhat manually due to the background tasks not being registered
# unless handlers are instantiated.
if hs.config.worker.run_background_tasks:
    hs.start_background_tasks()
Split out this change to #18886 since it seems good in any case
And this PR may get stale
Spawning from #18871
[This change](6ce2f3e) was originally used to fix CPU time going backwards when we `daemonize`. While we don't seem to run into this problem on `develop`, I still think this is a good change to make. We don't need background tasks running on a process that will soon be forcefully exited and where the reactor isn't even running yet. We now kick off the background tasks (`run_as_background_process`) after we have forked the process and started the reactor.
Also, as a simple note, we don't need background tasks running in both halves of a fork.
Closing as I've decided to continue trudging in the |
This is a first step towards `ContextVar` based `LoggingContext`. This PR only goes as far as to store the `LoggingContext` in a `ContextVar` instead of thread-local. But this still gives us the benefit of being able to remove the painful log context rule complexity around needing to make sure the thread-local is set correctly as awaitables are suspended and resumed in the Twisted reactor.
Part of #10342 (previously matrix-org/synapse#10342)
This is purely based on @sandhose's branch which I've just picked up, kicked the tires, and brought forward to propose and merge.
This is spawning from adding `server_name` to the `LoggingContext` and finding that we use the `sentinel` `LoggingContext` in many places (which means the `server_name` isn't tracked in those places). After removing the `sentinel` `LoggingContext` from a few places, it uncovered some places where we don't seem to be following the log context rules so things are getting messed up. Instead of trying to adapt a bunch of tricky areas to follow the rules, I decided to just try removing the need for the log context rules and just refactor to the `ContextVar` based `LoggingContext`.
Testing strategy
- Run with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml`
- Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
    - `Called stop on logcontext POST-0 without recording a start rusage`
Todo
- `docs/log_contexts.md`
- `tests/util/caches/test_descriptors.py`
- (`synapse/util/patch_inline_callbacks.py`)
- `synapse/rust/src/http_client.rs` Lines 247 to 251 in c68c5dd
Future:
- Remove `PreserveLoggingContext` from many scenarios
- Remove `sentinel` logcontext where we log in `setup`, `start` and exit #18870
- Remove `make_deferred_yieldable`