Skip to content

Conversation

ddworken
Copy link
Contributor

@ddworken ddworken commented Sep 4, 2025

Summary

  • Added automatic cleanup of idle sessions when session count exceeds threshold
  • Sessions inactive longer than idle timeout are terminated to free resources
  • Prevents memory exhaustion from clients that create sessions but never terminate them

Changes

  • Added configurable session idle timeout (default 30 minutes)
  • Added background cleanup task that activates when session count > 10k
  • Track last activity time for each session
  • Double-check idle status before cleanup to prevent race conditions

Testing

  • Added tests for idle session cleanup behavior
  • Added tests for threshold-based activation
  • Added tests for session activity tracking
  • All existing tests continue to pass

This ensures the server remains stable even when clients don't properly close their sessions.

Implement automatic cleanup of idle sessions when session count exceeds
configurable threshold (default 10k). Sessions inactive for longer than
the idle timeout (default 30 minutes) are terminated to free resources.

This prevents memory exhaustion from clients that create sessions but
never send termination requests.
- Replace untyped lambdas with properly typed async functions
- Fix pyright reportUnknownLambdaType errors
- Apply ruff formatting
@ddworken ddworken marked this pull request as ready for review September 4, 2025 23:35
@ddworken ddworken requested a review from a team as a code owner September 4, 2025 23:35
@ddworken ddworken requested a review from ochafik September 4, 2025 23:35
Copy link

@ochafik ochafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ddworken for preparing this!

I'd feel more comfortable if we limited the number of open sessions first (and decline further sessions w/ a 503). There's no good way to label long sessions as inactive atm, as there are growing use cases (and spec change proposals) around long running tool calls for instance, so 30 min w/o activity may be perfectly fine / ditching these sessions would be a breaking change.

Also not sure the current changes prevent a crash caused by a very high number of sessions started at once?

@ochafik
Copy link

ochafik commented Sep 15, 2025

long running tool calls for instance, so 30 min w/o activity may be perfectly fine / ditching these sessions would be a breaking change.

Other suggestions:

  • Define idle as "no activity since N seconds and no pending request": while this would allow long running tools, it would still not allow patterns of, say, a client that's just listening to resource updates all day long (and reads said updated resources once a day to index them), so I probably wouldn't default to this either
  • Define idle as "no open connection / stream since N seconds" (i.e. connection was closed and not resumed for a while). Would keep track of # open streams for a given connection (maybe in the try/finally of run_server). Probably the least breaking change

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants