[✨ Triage] dotnet/runtime#120179 by MihaZupan - SocketsHttpHandler HTTP/2 pings are broken in multiple scenarios #1547

@MihuBot

Triage for dotnet/runtime#120179.
Repo filter: All networking issues.
MihuBot version: 554101.
Ping MihaZupan for any issues.

This is a test triage report generated by AI, aimed at helping the triage team quickly identify past issues/PRs that may be related.
Take any conclusions with a large grain of salt.

Tool logs
dotnet/runtime#120179: SocketsHttpHandler HTTP/2 pings are broken in multiple scenarios by MihaZupan
[Tool] Searching for HTTP/2 ping, SocketsHttpHandler ping, HTTP/2 connection heartbeat, GOAWAY frame stuck, gRPC stream stuck, HTTP/2 connection dead, SocketsHttpHandler heartbeat timer (IncludeOpen=True, IncludeClosed=True, IncludeIssues=True, IncludePullRequests=True, Repository=dotnet/runtime)
[Tool] Found 69 issues, 337 comments, 40 returned results (17526 ms)

Here are the most relevant related issues and discussions for issue #120179, "SocketsHttpHandler HTTP/2 pings are broken in multiple scenarios":


1. Issue #31198 (October 2019) - "configurable HTTP/2 PING timeouts in HttpClient"

  • Summary: This issue tracks the original design and implementation of HTTP/2 ping keep-alive support in SocketsHttpHandler. It discusses the need for pings to detect dead connections, especially for long-lived gRPC streams, and the design of APIs like KeepAlivePingDelay, KeepAlivePingTimeout, and KeepAlivePingPolicy. There is extensive discussion about when pings should be sent (with or without active requests), and the importance of aligning with gRPC server/client expectations.
  • Key Points:
    • The API was designed to allow pings only with active requests by default, but also to allow "Always" mode for scenarios where connections must be kept alive even when idle.
    • There was significant discussion about server-side enforcement (e.g., Google and gRPC servers closing connections if pings are sent without active streams).
    • The implementation was merged in PR #40257 (August 2020).
    • The heartbeat logic is tied to the connection pool's available connections, which is the root of the problem described in #120179.
  • Relevance: This issue and its resolution are the foundation for the current ping logic, including the limitation that pings only run on connections in the _availableHttp2Connections list.
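
The two policy modes described above can be illustrated with a small model. This is a Python sketch, not the .NET implementation: `PingPolicy`, `should_send_keepalive_ping`, and all parameter names are hypothetical, and the idle check is a simplified stand-in for KeepAlivePingDelay.

```python
from enum import Enum


class PingPolicy(Enum):
    # Hypothetical stand-ins for the two modes discussed in #31198.
    WITH_ACTIVE_REQUESTS = "with_active_requests"
    ALWAYS = "always"


def should_send_keepalive_ping(policy, active_streams,
                               seconds_since_last_frame, ping_delay_seconds):
    """Return True if a keep-alive ping is due under the given policy.

    Assumption: a ping is only considered once the connection has been
    quiet for at least ping_delay_seconds.
    """
    # Recent traffic already proves the connection is alive.
    if seconds_since_last_frame < ping_delay_seconds:
        return False
    # "Always" mode pings idle connections too.
    if policy is PingPolicy.ALWAYS:
        return True
    # Default mode: ping only while at least one request is in flight.
    return active_streams > 0
```

In this model an idle connection is never pinged under the default mode, which is the behavior strict servers expect according to the discussion in #31198.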

2. PR #40257 (August 2020) - "Configure ping in HTTP2"

  • Summary: This PR implements the HTTP/2 ping keep-alive feature in SocketsHttpHandler, adding the relevant properties and logic.
  • Key Points:
    • The heartbeat timer is only started for connections in the available pool.
    • The PR and its review discuss the design trade-offs and the need for the timer to be active only when pings are enabled.
    • There is mention of the timer being stopped when the handler is disposed, which is one of the scenarios called out in #120179.
  • Relevance: The implementation details here directly lead to the scenarios where pings are not sent for connections that are not in the available pool (e.g., after GOAWAY or when the stream limit is reached).

3. Issue #29889 (June 2019) - "HTTP2: HttpClient does not respond to pings after GOAWAY when some streams are still active"

  • Summary: This issue discusses the behavior after a GOAWAY frame is received. The client should continue to respond to PING frames as long as there are active streams, but there was confusion and some disagreement about the correct behavior per the HTTP/2 spec.
  • Key Points:
    • The code at the time would stop responding to PINGs after GOAWAY, even if streams were still active.
    • There was discussion about distinguishing between user-initiated Dispose and GOAWAY from the server.
    • The issue was fixed as part of a broader cleanup, but the logic for "available" connections remained.
  • Relevance: This is a precursor to the scenario described in #120179, where a connection removed from the available pool due to GOAWAY no longer participates in heartbeat/ping logic.

4. Issue #113532 (March 2025) - "HTTP/2 pings don't work if the connection lifetime is zero"

  • Summary: This recent issue (by the same author as #120179) describes a bug where HTTP/2 pings are not sent if the connection lifetime is set to zero, due to an optimization that skips creating the heartbeat timer.
  • Key Points:
    • The heartbeat timer is only created if the connection is stored in the pool, which is not the case when lifetime is zero.
    • This is a niche but related scenario where the ping logic is inadvertently skipped.
  • Relevance: This is another example of the ping/heartbeat logic being too tightly coupled to connection pool state, leading to missed pings in certain configurations.

5. PR #115735 (May 2025) - "Fix HTTP/2 pings not working with zero connection lifetime"

  • Summary: This PR fixes the above issue by ensuring that the heartbeat timer is created if pings are enabled, even when the connection lifetime is zero.
  • Key Points:
    • The fix is to check the ping settings before skipping heartbeat timer creation.
    • The PR includes tests for this scenario.
  • Relevance: Shows recent work to address edge cases in the ping logic, but does not address the broader issue of pings being tied to the available pool.
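
The change described in #113532 and #115735 can be pictured as a change in the condition that decides whether a heartbeat timer is created. This is a deliberately simplified Python sketch based only on the summaries above. The function and parameter names are hypothetical, and the real condition in the .NET code may take other settings into account.

```python
def creates_timer_before_fix(connection_lifetime_is_zero):
    """Simplified model of the old behavior from #113532.

    Assumption: with a zero lifetime, connections are not stored in the
    pool, so timer creation was skipped as an optimization.
    """
    return not connection_lifetime_is_zero


def creates_timer_after_fix(connection_lifetime_is_zero, pings_enabled):
    """Simplified model of the behavior after #115735.

    The ping settings are checked before the optimization is applied,
    so enabling pings keeps the timer even with a zero lifetime.
    """
    return pings_enabled or not connection_lifetime_is_zero
```

Under this model the only case that changes is a zero lifetime with pings enabled: the timer was skipped before and is created afterwards.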

6. Issue #57617 (August 2021–September 2025) - "[HTTP/2] Race between reading response and reset?"

  • Summary: This long-running issue discusses a rare but serious bug where HTTP/2 connections can get stuck or throw protocol errors, often in the context of long-lived gRPC streams and connection shutdowns (e.g., after GOAWAY).
  • Key Points:
    • There are reports of connections getting stuck after GOAWAY, with requests appearing to hang.
    • The discussion includes traces and reproduction attempts, and it overlaps in part with the scenario in #120179.
    • The issue remains difficult to reproduce and is still open for investigation.
  • Relevance: This issue provides real-world evidence that the problems described in #120179 are affecting users, especially in gRPC streaming scenarios.

7. Issue #62216 (November 2021–June 2022) - "[HTTP/2] RTT pings shutting down connection in gRPC CI - regression in .NET 6"

  • Summary: This issue tracks a regression where RTT pings cause connections to be closed by strict servers (e.g., Google backends), due to pings being sent at the wrong times.
  • Key Points:
    • The discussion focuses on the importance of only sending pings when allowed by server policy (e.g., only with active streams).
    • There is mention of the need to align ping logic with server expectations and the problems that arise when the logic is too simplistic.
  • Relevance: Highlights the importance of correct ping logic and the consequences of getting it wrong, especially for interoperability.

8. Issue #92840 (September 2023) - "Ping is not off by default in SocketsHttpHandler"

  • Summary: This issue was a false alarm about pings being sent by default, but the discussion confirms that the heartbeat logic is only triggered for connections in the available pool.
  • Key Points:
    • The heartbeat is only run for connections in the available pool, confirming the design described in #120179.
  • Relevance: Confirms the current design and its limitations.

9. Issue #61261 (November 2021–June 2022) - "Expose TCP Keep-alive on SocketsHttpHandler"

  • Summary: This issue is about TCP-level keep-alives, not HTTP/2 pings, but there is discussion about the distinction and the importance of HTTP/2 pings for application-level liveness.
  • Relevance: Provides context for why HTTP/2 pings are needed in addition to TCP keep-alives.

10. Issue #30387 (July 2019) - "HTTP2 connections are closed after ~2 minutes of no activity"

  • Summary: This issue is about idle timeouts, but the discussion includes how to configure connection lifetimes and the importance of keeping connections alive for long-lived scenarios.
  • Relevance: Context for why users care about keep-alive and ping settings.

Summary of Conclusions:

  • The current design of HTTP/2 ping/heartbeat logic in SocketsHttpHandler is tightly coupled to the connection pool's available connections list.
  • This leads to pings not being sent in scenarios where a connection is temporarily removed from the available pool (e.g., due to stream limit or after GOAWAY), even if there are still active streams.
  • This has been a known limitation since the feature was implemented, and recent issues (#113532, #120179) and user reports confirm that it leads to real-world problems, especially for long-lived gRPC streams.
  • There have been some targeted fixes for edge cases (e.g., zero connection lifetime), but the broader issue remains.
  • There is strong evidence from user reports and test failures that this logic needs to be revisited to ensure pings are sent as long as there are active streams, regardless of the connection's pool state.
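
The coupling summarized above can be made concrete with a toy model. This is a Python sketch under stated assumptions, not the actual HttpConnectionPool or Http2Connection logic: the `Connection` class, its fields, and both functions are hypothetical, and the default policy of pinging only with active requests is assumed.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    active_streams: int
    # Assumed False after GOAWAY or once the stream limit is reached.
    in_available_pool: bool


def pinged_by_pool_heartbeat(connections):
    """Model of the current coupling: only pooled connections are visited."""
    return [c.name for c in connections
            if c.in_available_pool and c.active_streams > 0]


def pinged_by_stream_heartbeat(connections):
    """Alternative model: every connection with active streams is visited."""
    return [c.name for c in connections if c.active_streams > 0]


connections = [
    Connection("healthy", active_streams=3, in_available_pool=True),
    Connection("after_goaway", active_streams=1, in_available_pool=False),
    Connection("at_stream_limit", active_streams=100, in_available_pool=False),
    Connection("idle", active_streams=0, in_available_pool=True),
]

pool_based = pinged_by_pool_heartbeat(connections)
stream_based = pinged_by_stream_heartbeat(connections)
```

In this model the pool-based heartbeat reaches only the "healthy" connection, while the stream-based variant also reaches the two connections that left the available pool but still carry active streams.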

Actionable Next Steps:

  • Consider redesigning the heartbeat/ping logic so that it is tied to the presence of active streams, not just the available pool.
  • Review and possibly refactor the logic in HttpConnectionPool and Http2Connection to ensure pings are sent in all scenarios where they are needed.
  • Add tests for the scenarios described in #120179 to prevent regressions.

If you need more details from any of the above issues or PRs, let me know!
