
feat(gateway): add configurable response write timeout #812

Draft · wants to merge 2 commits into main

Conversation

@gitsrc (Contributor) commented Jan 23, 2025

This commit introduces a configurable response write timeout for the IPFS gateway.
The timeout can be set via the ResponseWriteTimeout field in the gateway configuration.
If not set, a default timeout of 30 seconds is applied.

The implementation includes:

  • A new timeoutResponseWriter struct that wraps the standard http.ResponseWriter
    and enforces the timeout.
  • A middleware function WithResponseWriteTimeout that applies the timeout logic
    to the HTTP handler chain.
  • Comprehensive unit tests to verify the timeout behavior under various scenarios.

The timeout ensures that slow or unresponsive clients do not indefinitely hold
server resources, improving the overall reliability and stability of the gateway.

This change also attempts to address the issue described in #679
by providing a mechanism to handle slow or stuck HTTP responses more gracefully.
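
For readers following along without the diff open, a rough sketch of the shape described above. The names timeoutResponseWriter and WithResponseWriteTimeout come from this PR, but the fields, locking, and cancellation details below are illustrative and may not match the actual change (the 504 handling mentioned in the description is omitted):

// Illustrative sketch only; the PR's actual implementation may differ.
package gateway

import (
	"context"
	"net/http"
	"sync"
	"time"
)

// timeoutResponseWriter wraps http.ResponseWriter and restarts a timer on
// every successful Write; if the timer fires first, further writes fail.
type timeoutResponseWriter struct {
	http.ResponseWriter
	timeout  time.Duration
	timer    *time.Timer
	mu       sync.Mutex
	timedOut bool
}

func (w *timeoutResponseWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.timedOut {
		return 0, http.ErrHandlerTimeout
	}
	n, err := w.ResponseWriter.Write(p)
	if err == nil {
		w.timer.Reset(w.timeout) // each successful write restarts the window
	}
	return n, err
}

// WithResponseWriteTimeout wraps next so that a response which stops
// producing bytes for longer than timeout is aborted.
func WithResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithCancel(r.Context())
		defer cancel()

		tw := &timeoutResponseWriter{
			ResponseWriter: rw,
			timeout:        timeout,
			timer:          time.NewTimer(timeout),
		}
		defer tw.timer.Stop()

		go func() {
			select {
			case <-tw.timer.C:
				tw.mu.Lock()
				tw.timedOut = true
				tw.mu.Unlock()
				cancel() // unblock a handler waiting on a slow backend
			case <-ctx.Done():
			}
		}()

		next.ServeHTTP(tw, r.WithContext(ctx))
	})
}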

@gitsrc gitsrc requested a review from lidel as a code owner January 23, 2025 14:35

welcome bot commented Jan 23, 2025

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines
    if they apply.

Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed, and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@gammazero (Contributor) left a comment

This addresses what was asked for in #679, but I think that if we can live without the ask, "Every time data is written successfully, the timer is reset", then things can be much simpler. For example: #818
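
For illustration, the simpler, non-resetting shape referenced here could look roughly like the sketch below. This is not taken from #818, and withWriteDeadline is a made-up name: a single write deadline is set once for the whole response and never reset.

import (
	"net/http"
	"time"
)

// Hypothetical illustration, not the code in #818: one fixed write
// deadline for the entire response, with no per-write timer bookkeeping.
func withWriteDeadline(next http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Go 1.20+: http.ResponseController can set a write deadline on the
		// underlying connection; the error is ignored because not every
		// ResponseWriter supports deadlines.
		_ = http.NewResponseController(w).SetWriteDeadline(time.Now().Add(timeout))
		next.ServeHTTP(w, r)
	})
}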

Comment on lines +121 to +122
// WithResponseWriteTimeout creates middleware for response write timeout handling
func WithResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {
Contributor

This does not appear to need to be exported.

Suggested change
// WithResponseWriteTimeout creates middleware for response write timeout handling
func WithResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {
// withResponseWriteTimeout creates middleware for response write timeout handling
func withResponseWriteTimeout(next http.Handler, timeout time.Duration) http.Handler {

ResponseWriter: origWriter,
timeout: timeout,
timer: time.NewTimer(timeout),
requestCtx: ctx,
Contributor

not used here

Suggested change
requestCtx: ctx,

Member

+1, why do we need requestCtx, @gitsrc? It does not seem to be used.

Contributor Author

You are right, this variable is not used and can be deleted.

@gitsrc (Contributor Author) commented Feb 1, 2025

OK, I understand, it would be cleaner to set a uniform timeout.

> This addresses what was asked for in #679, but I think that if we can live without the ask, "Every time data is written successfully, the timer is reset", then things can be much simpler. For example: #818


codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 60.44%. Comparing base (9ea9632) to head (23dddd8).
Report is 12 commits behind head on main.

Files with missing lines   Patch %   Lines
gateway/handler.go         93.75%    2 Missing and 1 partial ⚠️


@@            Coverage Diff             @@
##             main     #812      +/-   ##
==========================================
- Coverage   60.48%   60.44%   -0.04%     
==========================================
  Files         244      243       -1     
  Lines       31121    31147      +26     
==========================================
+ Hits        18822    18827       +5     
- Misses      10623    10639      +16     
- Partials     1676     1681       +5     
Files with missing lines             Coverage Δ
examples/gateway/common/handler.go   95.50% <100.00%> (+0.10%) ⬆️
gateway/gateway.go                   83.54% <ø> (ø)
gateway/handler.go                   77.48% <93.75%> (+1.20%) ⬆️

... and 11 files with indirect coverage changes

@lidel (Member) commented Feb 11, 2025

Triage note:

@lidel lidel added the P2 Medium: Good to have, but can wait until someone steps up label Feb 11, 2025
@lidel (Member) left a comment

I'd like to run this on kubo staging before merging. Pushing this to Kubo 0.35.

In the meantime, this needs additional tests and some cleanup – details inline.

Comment on lines +52 to +54
// ResponseWriteTimeout is the maximum duration the gateway will wait for a response
// to be written before timing out and returning a 504 Gateway Timeout error.
// If not set, a default timeout will be used.
Member

Suggested change
// ResponseWriteTimeout is the maximum duration the gateway will wait for a response
// to be written before timing out and returning a 504 Gateway Timeout error.
// If not set, a default timeout will be used.
// ResponseWriteTimeout is the maximum duration the gateway will wait for
// new bytes to be retrieved from [gateway.IPFSBackend]. This timeout is
// reset on every [http.ResponseWriter] Write, which protects both the
// client and the server from wasting resources when parts of the requested
// file or DAG have no providers and would hang forever.
// Setting to 0 disables this timeout.
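
For illustration, with the semantics described in that suggested comment, an embedder might configure the field roughly like this (hypothetical snippet; only the field added by this PR is shown, the rest of the Config literal is elided):

conf := gateway.Config{
	// Abort a response when no new bytes arrive for 30 seconds;
	// 0 would disable the timeout entirely per the suggested docs.
	ResponseWriteTimeout: 30 * time.Second,
}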

Comment on lines +72 to +87
{
	name: "Timer reset with staggered writes",
	handler: func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < 3; i++ {
			select {
			case <-time.After(200 * time.Millisecond): // Each write within timeout window
				w.Write([]byte("chunk\n")) // Resets timer on each write
			case <-r.Context().Done():
				return
			}
		}
	},
	timeout:        300 * time.Millisecond,
	expectStatus:   http.StatusOK,
	expectedChunks: 3,
},
@lidel (Member) commented Feb 28, 2025

Missing a test like this, but one that writes two chunks and times out on the third.

We need to see what happens when headers were already sent with status 200 but the payload was truncated due to the timeout.

Perhaps it would be useful to w.Write([]byte("\n[TRUNCATED DUE TO gateway.IPFSBackend TIMEOUT]")) to make it easier for clients to understand what happened if they are debugging?
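
A test along these lines might cover it (field names match the table-driven test excerpt above; the values are hypothetical and the truncation assertion is left to the author):

{
	name: "Timeout after partial response",
	handler: func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < 3; i++ {
			delay := 100 * time.Millisecond
			if i == 2 {
				delay = 500 * time.Millisecond // third write exceeds the 300ms window
			}
			select {
			case <-time.After(delay):
				w.Write([]byte("chunk\n"))
			case <-r.Context().Done():
				return
			}
		}
	},
	timeout:        300 * time.Millisecond,
	expectStatus:   http.StatusOK, // headers already sent before the timeout
	expectedChunks: 2,             // third chunk never reaches the client
},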

Comment on lines +74 to +81
// Use the configured timeout or fall back to a default value
timeout := c.ResponseWriteTimeout
if timeout == 0 {
	timeout = 30 * time.Second // Default timeout of 30 seconds
}

// Apply the timeout middleware
return WithResponseWriteTimeout(handler, timeout)
Member

Hm.. this makes disabling the timeout impossible.

Let's change this to no timeout by default (Kubo and Rainbow will set the default themselves), and skip the whole thing when no timeout is set.

Suggested change
// Use the configured timeout or fall back to a default value
timeout := c.ResponseWriteTimeout
if timeout == 0 {
	timeout = 30 * time.Second // Default timeout of 30 seconds
}
// Apply the timeout middleware
return WithResponseWriteTimeout(handler, timeout)
if c.ResponseWriteTimeout != 0 {
	handler = WithResponseWriteTimeout(handler, c.ResponseWriteTimeout)
}
// Apply the timeout middleware
return handler

ResponseWriter: origWriter,
timeout: timeout,
timer: time.NewTimer(timeout),
requestCtx: ctx,

Member
nit: run gofmt on this file

@lidel lidel removed their assignment Feb 28, 2025
@lidel lidel marked this pull request as draft March 4, 2025 15:57
Labels
P2 Medium: Good to have, but can wait until someone steps up

4 participants