Skip to content

feat: implement MCP-compliant keep-alive functionality for server transports #430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 30, 2025

Conversation

tzolov
Copy link
Contributor

@tzolov tzolov commented Jul 30, 2025

  • Add KeepAliveScheduler utility class for configurable periodic session pings
  • Integrate keep-alive support in WebFlux, WebMVC, and HttpServlet SSE transport providers
  • Add keepAliveInterval configuration option to all transport provider builders
  • Deprecate existing constructors in favor of builder pattern with enhanced configuration
  • Update graceful shutdown to properly clean up keep-alive schedulers
  • Add unit tests for KeepAliveScheduler functionality
  • Configurable ping frequency to suit different network environments
  • Optional keep-alive (disabled by default) to avoid excessive network overhead

Resolves: #414, #158
Replaces #353

Motivation and Context

This change implements the Model Context Protocol specification recommendations for connection health detection. The MCP spec states that implementations SHOULD periodically issue pings to detect connection health, with configurable frequency and appropriate timeouts for different network environments.

Without keep-alive functionality, long-running MCP sessions over SSE connections can experience idle timeouts, leading to unexpected connection drops and poor user experience. This implementation provides:

  • Configurable periodic ping messages to maintain connection health
  • Prevention of idle connection timeouts in proxy servers and load balancers
  • Improved reliability for long-running MCP sessions
  • Compliance with MCP specification guidelines

How Has This Been Tested?

  • Unit Tests: Comprehensive test suite added for KeepAliveScheduler covering:
  • Integration Testing: Tested with existing SSE transport provider test suites to ensure backward compatibility

Breaking Changes

No breaking changes. This is a backward-compatible enhancement:

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Design Decisions

  • Keep-alive is disabled by default to avoid network overhead for applications that don't need it
  • Uses MCP's standard ping method for compatibility
  • Start the KeepAliveScheduler in provider's constructor if the keep-alive interval is provided.
  • Shutdown the KeepAliveScheduler in the closeGracefully() methods
  • Error Resilience: Failed pings are logged but don't stop the scheduler

Migration Path

Users can gradually adopt the new builder pattern:

// Old (still works, but deprecated)
new WebFluxSseServerTransportProvider(objectMapper, messageEndpoint);

// New (recommended)
WebFluxSseServerTransportProvider.builder()
    .objectMapper(objectMapper)
    .messageEndpoint(messageEndpoint)
    .keepAliveInterval(Duration.ofSeconds(30))
    .build();

@He-Pin
Copy link
Contributor

He-Pin commented Jul 30, 2025

Nice to have KeepAliveScheduler, thanks, otherwise the connection will be broken by middlebox。

Should it be PingScheduler instead of KeepAliveScheduler

@tzolov tzolov added this to the 0.11.0 milestone Jul 30, 2025
public Disposable start() {
if (this.isRunning.compareAndSet(false, true)) {

this.currentSubscription = Flux.interval(this.initialDelay, this.interval, this.scheduler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use maxIdle instead of this. If there are interactions between the servers and clients, then we do not need to ping it, in pekko-stream, there is an keepAlive operator for this. not sure about flux.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@He-Pin can you please elaborate on this?

Copy link
Member

@chemicL chemicL Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on this, but potentially a more robust approach could be devised by hooking into the transport FluxSink and applying a switchOnNext with the result as Flux.just(result), followed a concat of the interval operator. Each next item would first cancel the ongoing pinging sequence and would trigger a new one.

That would address @He-Pin concerns I believe, but would be a more complicated solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzolov If there is already an interaction happend between client and server, we should cancel the scheduled ping task, and reschedule a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chemicL, would current approach do the job?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as far as I see it, it's just some additional payload that could be avoided. All in all, this change only applies to the generic SSE stream for Streamable HTTP transports and the one-and-only SSE stream for the legacy SSE. I do understand @He-Pin's concern and we should come up with a more robust solution, but this is already a step in the right direction to provide something resolving an issue that idle clients face where they disconnect and can't make further progress unless they restart.

@tzolov tzolov force-pushed the server-keep-alive branch from 2ef82cf to 971122b Compare July 30, 2025 16:16
…nsports

- Add KeepAliveScheduler utility class for configurable periodic session pings
- Integrate keep-alive support in WebFlux, WebMVC, and HttpServlet SSE transport providers
- Add keepAliveInterval configuration option to all transport provider builders
- Deprecate existing constructors in favor of builder pattern with enhanced configuration
- Update graceful shutdown to properly clean up keep-alive schedulers
- Add unit tests for KeepAliveScheduler functionality

Implements MCP specification recommendations for connection health detection:
- Configurable ping frequency to suit different network environments
- Optional keep-alive (disabled by default) to avoid excessive network overhead
- Proper resource cleanup to prevent connection leaks

https://modelcontextprotocol.io/specification/2025-06-18/basic/utilities/ping#implementation-considerations

Resolves: modelcontextprotocol#414, modelcontextprotocol#158
Replaces modelcontextprotocol#353

Signed-off-by: Christian Tzolov <[email protected]>
@tzolov tzolov force-pushed the server-keep-alive branch from 971122b to 1840f63 Compare July 30, 2025 18:39
@tzolov tzolov merged commit ee2c891 into modelcontextprotocol:main Jul 30, 2025
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SSE Server drops client connection at 5 min.
3 participants