Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

canceling a stream many times on the same client in quick succession triggers a Received RST_STREAM with code 2 error #2872

Closed
gunters63 opened this issue Dec 17, 2024 · 14 comments

Comments

@gunters63
Copy link

gunters63 commented Dec 17, 2024

Problem description

I am currently debugging an issue I have with a HTTP proxy failing after a certain number of canceled grpc streaming calls.
So I tried to write a small reproduction, but I have problems even when no proxy is involved.
If I call a streaming call which I cancel server side many times in quick succession, the call fails with a

❯ node test.mjs
server listening on 50052
1002
node:internal/modules/run_main:104
    triggerUncaughtException(
    ^
{
  code: 13,
  details: 'Received RST_STREAM with code 2 triggered by internal client error: Session closed with error code 2',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}

after roughly 1000 calls. If I insert a small delay between the calls it works.

Reproduction steps

comment out the the line await new Promise((resolve) => setTimeout(resolve, 5));
and run node test.mjs:

import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

const __dirname = import.meta.dirname;
const PROTO_PATH = __dirname + "/protos/echo.proto";

const packageDefinition = protoLoader.loadSync(PROTO_PATH, {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true,
});
const echoProto =
  grpc.loadPackageDefinition(packageDefinition).grpc.examples.echo;

function bidirectionalStreamingEcho(call) {
  call.on("data", (value) => {
    const message = value.message;
    call.write({ message: message });
  });
  // Either 'end' or 'cancelled' will be emitted when the call is cancelled
  call.on("end", () => {
    call.end();
  });
  call.on("cancelled", () => {
  });
}

const serviceImplementation = {
  bidirectionalStreamingEcho,
};

function startServer() {
  return new Promise((resolve, reject) => {
    const port = 50052;
    const server = new grpc.Server();
    server.addService(echoProto.Echo.service, serviceImplementation);
    server.bindAsync(
      `0.0.0.0:${port}`,
      grpc.ServerCredentials.createInsecure(),
      (error, p) => {
        if (error) {
          console.error(error, "could not start server");
          reject(error);
        } else {
          console.log(`server listening on ${p}`);
          resolve();
        }
      }
    );
  });
}

function callServer(client) {
  return new Promise((resolve, reject) => {
    const call = client.bidirectionalStreamingEcho();
    const EXPECTED_MESSAGES = 5;
    let receivedMessages = 0;
    call.on("data", (value) => {
      receivedMessages += 1;
      if (receivedMessages >= EXPECTED_MESSAGES) {
        call.cancel();
      }
    });
    call.on("status", (status) => {
      if (status.code === grpc.status.OK || status.code === grpc.status.CANCELLED) {
        resolve();
      } else {
        reject(status);
      }
    });
    call.on("error", () => {
      // Ignore error event
    });
    for (let i = 0; i < EXPECTED_MESSAGES; i++) {
      call.write({ message: `hello: ${i.toString()}` });
    }
    call.end();
  });
}

async function main() {
  await startServer();
  const client = new echoProto.Echo(
    "localhost:50052",
    grpc.credentials.createInsecure()
  );
  for (let i = 1; i <= 10000; i++) {
    process.stdout.write(`\r${i}`)
    await callServer(client);
    // comment this line to see the issue: 
    // 'Received RST_STREAM with code 2 triggered by internal client error: Session closed with error code 2'
    // after about 1000 iterations
    await new Promise((resolve) => setTimeout(resolve, 5));
  }
  console.log("\ndone");
}

await main();
process.exit(0);

Environment

  • OS name, version and architecture: [e.g. Arch Linux amd64]
  • Node version [e.g. 23.4.0]
  • Node installation method [e.g. arch package installer]
  • Package name and version [e.g. [email protected]]
@gunters63
Copy link
Author

It seems that the server side sent a go away after approx. 1000 canceled calls:

image

Am I doing something wrong with calling the server?

@gunters63
Copy link
Author

gunters63 commented Dec 18, 2024

I made a repository with the reproduction here: https://github.com/gunters63/reply-from-bug

I added a variation of the reproduction using server streaming calls, this time it always fails after about 1450-1500 calls.

Maybe this is a bug in Node or nghttp2 (the http/2 C lib Node is using)?

@gunters63
Copy link
Author

I reproduced the problem also on Ubuntu 24.04 with Node v20.16

@gunters63
Copy link
Author

I converted the test to a pure http/2 equivalent (see pure-http2.js in the repo) and I still get the same error, this time after 1200 calls. So this is maybe an upstream problem in NodeJs.

@murgatroid99
Copy link
Member

I think the behavior you are seeing is not a bug at all, but rather Node's mitigation for the HTTP/2 rapid reset vulnerability. You are likely seeing the GOAWAY after a different number of requests in each test because the limit depends on the exact rate of RST_STREAM frames, as explained in a comment on the change.

The best way to avoid this problem is for your client to not cancel requests that it is handling normally. The intended use of cancellation is to handle unexpected situations on the client, when it decides after starting a call that it no longer needs the response. If the client knows in advance how many messages it wants from the server, it should communicate that information to the server in advance. Alternatively, it could half close on the client (call.end()) to signal to the server that it should complete the stream.

@gunters63
Copy link
Author

Thank you for that valuable information.

I am not sure how I can handle this im my application:

I have a UI running in a browser with dozens of React components on each screen. Each component wants to display a handful of sensor values in real time. These values are changing all the time and generate an endless stream of value change events.

Because the application runs in a web browser it has to use grpc-web which limits streaming to server-side only.

Each React component separately subscribes a handful of values in the grpc server backend through a server-side streaming call. The subscription pushes back the endless stream of value changes to the browser client. There can be dozens of streaming calls running concurrently at the same time. This works well because the browser and the grpc-web library automatically multiplexes the calls through a single HTTP/2 session.

When the user interacts with the UI he changes screens, panels etc, those React components unmount and the subscriptions should be terminated from the client side. In fact this happens in a useEffect in each React component. I didn´t see any other way yet than to cancel the streaming calls using an AbortController. Is there a better way to do this to prevent those mitigations to happen? I cannot tell the server to terminate connections from his side because it is not a bidi stream.

Until recently the browser application was talking directly to a gRPC on .NET Backend (.Net 8.0). The .NET backend never had a problem with this, everything was stable. But we wanted to route the grpc communication through a small HTTP forwarding proxy written in Node (using fastify and some plugins), that's when these problems showed up.

@gunters63
Copy link
Author

I tried my reproduction program now with a Node version prior to the RST mitigation patch, and indeed, it runs without a problem:

Node.js v23.4.0
❯ ~/Downloads/node-v20.6.1-linux-x64/bin/node -v
v20.6.1
❯ ~/Downloads/node-v20.6.1-linux-x64/bin/node pure-http2.js
HTTP/2 server is running on http://localhost:3000
10000Session closed

@gunters63
Copy link
Author

And yes, you can actually configure the rate limiting in the option object of Nodes createServer call:

streamResetBurst and streamResetRate. Sets the rate limit for the incoming stream reset (RST_STREAM frame). Both settings must be set to have any effect, and default to 1000 and 33 respectively.

And if I change my createServer line to this:

const server = http2.createServer({ streamResetBurst: Number.MAX_SAFE_INTEGER, streamResetRate: Number.MAX_SAFE_INTEGER });

the test runs to completion.

@murgatroid99
Copy link
Member

Because the application runs in a web browser it has to use grpc-web which limits streaming to server-side only.

This does change things. Your example was of a Node client making a bidirectional streaming request, so that is what I addressed in my response. You are correct that there is no way other than cancellation to stop an endless server-streaming call whenever the client wants to stop it.

It would probably be possible to expose configuration options for those http2 server options in the gRPC Server constructor, but that won't happen until at least after the new year.

Alternatively, you may just be able to have your client retry requests that fail this way. They could only be sent out on a new session, so they should succeed.

@gunters63
Copy link
Author

gunters63 commented Dec 18, 2024

My grpc Backend is not running under Node with grpc-js, so that wouldn't help with my problem, but maybe someone else has the same problem, so making this configurable surely won't hurt.

My problem is that I have a Node program proxy-forwarding grpc HTTP/2 requests from the browser to a grpc .Net backend, and the forwarding fails after a while because the server tells to client to go away.

Because I have to use HTTP/2 with TLS (browsers don't support plain) I have to use createSecureServer in the proxy:

const server = http2.createSecureServer({ streamResetBurst: Number.MAX_SAFE_INTEGER, streamResetRate: Number.MAX_SAFE_INTEGER });

but Nodes typings suddenly don't know about those properties (it works for plain createServer). I hope this is only bug in the typings :)

@murgatroid99
Copy link
Member

Are you using grpc-js at all in your production setup?

@gunters63
Copy link
Author

gunters63 commented Dec 19, 2024

I use grpc-js through the excellent protobuf-ts package (https://github.com/timostamm/protobuf-ts).

Hmm, now, that you mention it, while protobuf-ts has dependencies on grpc-js (didn´t see grpc-web), it maybe only uses it to import type definitions and enums when used in the browser through @protobuf-ts/grpcweb-transport. When implementing a Node grpc server with protobuf-ts you definitely need grpc-js at runtime.

I implement a mock grpc server running on Node to test the UI. This uses grpc-js . The production grpc server though is implemented in gRPC on .NET.

So, to answer your question, no, not at runtime.

@gunters63
Copy link
Author

gunters63 commented Dec 19, 2024

I tested the rate limiting parameters now with https createSecureServer, and they also work there:

https://github.com/gunters63/reply-from-bug/blob/main/pure-https.js

const server = http2.createSecureServer({
  key: fs.readFileSync("agent1-key.pem"),
  cert: fs.readFileSync("agent1-cert.pem"),
  streamResetBurst: Number.MAX_SAFE_INTEGER,
  streamResetRate: Number.MAX_SAFE_INTEGER,
});

They are not documented in the NodeJs documentation (only for http/2 createServer), and they are also missing in the Typescript definitions. I should report this as a bug in the Node repo I think.

@murgatroid99
Copy link
Member

I agree that this is a bug in Node.js itself, not in gRPC, so it would be good to file a bug for Node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants