Http upgrades #203

dhouse-js · 2021-04-26T15:08:55Z

This PR follows the same idea to implement the HTTP upgrade mechanism as #159, although does so based on the refactored request queues in #172.

This is a prelude to inhabitedtype#159 which introduces upgrade requests, with a few major changes in `Server_connection`. The goals here is to try to make queue management easier to reason about by folding bits of logic from `advance_request_queue_if_necessary` into `next_read_operation` and `next_write_operation` such that we only perform side-effects when the operation in question demands it. One of the ways I tried to make this easier to reason about was to make the `next_<read|write>_operation` functions very parallel. Getting the read operation starts out with a short-circuit for shutting down when the server can no longer make progress (reader is closed and queue is empty). This doesn't feel like it belongs here. Perhaps this check should be part of `advance_request_queue` with some extra logic triggering in `shutdown_reader`? After that, the next-operation functions use some very simple probing of the input/output state of `Reqd` to determine what to do next. Only in the case of `Complete` do we move into a separate function (to make it easier to read): `_final_<read|write>_operation`. In these functions, we decide if we should shutdown the respective reader/writer or consider the `reqd` complete and move it off the queue. What's happening is that we don't know if the write action or read action will be last, so each function checks the state of the other to see if they're both complete. When we do shift it off, we recursively ask for the next operation given the new queue state. In the case of the writer triggering the advancing, before we return the result, we wakeup the reader so that it can evaluate the next operation given the new queue state. Note that in the case of a non-persistent connection, the queue is never advanced and the connection is shut down when both sides are done. Though on the surface, these pieces feel fairly straightforward, there are still a slew of re-entrancy bugs to consider. I think there are two things that we can do to make this drastically easier to manage: 1. We call `t.request_handler` in two places, and this is mostly because we want to keep the invariant that the head of the request queue has already been passed off to the handler. I feel like splitting this up into a simple queue of unhandled requests and a [Reqd.t option] that represents the current request would be easier to manage. 2. It would be nice to schedule calls. Things like waking up the writer before you let the read loop know its next operation just immediately makes my mind fall apart and lose track of state. There's a fairly obvious solution of asking for a `schedule : (unit -> unit) -> unit` function from the runtime that promises to not call the thunk synchronously, but rather waits until it is outside of the read and write loops. But maybe we can solve it using what we have now, like establishing a contract that when the reader/writer is woken up, they must schedule their work for a fresh call stack and not immediately ask for operations.

This is because the writer is always woken by the appropriate calls that push chunks onto the body or writer or calls that close the body. Had to import an additional line from a recent band-aid fix regarding setting the flag on non-chunked streaming responses. It feels like we should find an alternative means of maintaining this piece of information.

We basically never want to call `Queue.clear` because the head of the queue has special semantic meaning. Instead, we never try to clear the queue and rely on the fact that the queue will never be advanced. This is easy to reason about because the only time we advance the request queue is when the current request is not persistent. I added an explicit test of this situation to build confidence. Additionally, there was an incorrect assertion that you couldn't finish a write with reads still pending. A test was added upstream and it no longer fails with this fix. The final change was some subtle but unused code. In the write loop, we have something that decides to shutdown the connection if the reader is closed, parallel to the next read operation. But this felt weird, the reader should always be awake in the case that it is closed, which means that either 1) it will shutdown the connection or 2) it will wait for the writer, which will wake the reader once it's advanced the request queue, and then it will shutdown the connection.

…efactor-request-queue

…ines. In these cases, the correct behaviour is to send the response back to the client and then close the connection. The client may have sent us bytes after the request header which are not HTTP, so we have no hope of recovering this connection. This is achieved by modifying the parser to stop if it detects that the client has requested an upgrade. This ensures that, regardless of whether or not [Reqd.input_state] is [Upgrade], the parser will not attempt to consume bytes that might not be HTTP.

The previous commit asserted that the correct response to the server declining to upgrade was to close the connection. This isn't right. For instance, if the web server is using the Negotiate authentication scheme (https://tools.ietf.org/html/rfc4559), the workflow looks like this: - The client sends a request - The server responds with Not_authorized and 'WWW-Authenticate: Negotiate' - The client re-sends the same request with a 'Authorization: Negotiate <creds>' header. So, if someone wants to do websockets on a kerberos-authenticated server (for instance), then we need to allow requests after a failed upgrade. I guess if the client sends an upgrade request and immediately starts speaking the new protocol without waiting for the response, we'll get a parse error, but there isn't much we can do. This commit achieves this by unwinding the changes to the parser. The declined-upgrade test is flipped round so that it proves subsequent requests *are* possible, rather than *are not* possible.

dhouse-js · 2021-05-24T11:01:22Z

Now that #172 is merged into master, I updated this to be based off of the master branch. There is no longer any spurious diff in this PR.

…ines. In these cases, the correct behaviour is to send the response back to the client and then close the connection. The client may have sent us bytes after the request header which are not HTTP, so we have no hope of recovering this connection. This is achieved by modifying the parser to stop if it detects that the client has requested an upgrade. This ensures that, regardless of whether or not [Reqd.input_state] is [Upgrade], the parser will not attempt to consume bytes that might not be HTTP.

The previous commit asserted that the correct response to the server declining to upgrade was to close the connection. This isn't right. For instance, if the web server is using the Negotiate authentication scheme (https://tools.ietf.org/html/rfc4559), the workflow looks like this: - The client sends a request - The server responds with Not_authorized and 'WWW-Authenticate: Negotiate' - The client re-sends the same request with a 'Authorization: Negotiate <creds>' header. So, if someone wants to do websockets on a kerberos-authenticated server (for instance), then we need to allow requests after a failed upgrade. I guess if the client sends an upgrade request and immediately starts speaking the new protocol without waiting for the response, we'll get a parse error, but there isn't much we can do. This commit achieves this by unwinding the changes to the parser. The declined-upgrade test is flipped round so that it proves subsequent requests *are* possible, rather than *are not* possible.

The removed indirection lets you trace responses to their conditions more easily.

dinosaure · 2021-07-05T19:07:16Z

@seliopou do you have any plan for this PR?

dhouse-js · 2021-07-08T14:03:16Z

@seliopou do you have any plan for this PR?

I've been working with @seliopou and @dpatti to get this merged. We need to land a couple of other PRs first, but it's being worked on. I think hopefully within a couple of weeks.

dinosaure · 2021-09-06T08:39:44Z

Any news? If you want some helps, I will happy to spend some times to test/participate 👍.

dinosaure · 2021-11-17T15:01:38Z

Gently ping about this PR. Do you have any news?

dhouse-js · 2021-11-17T19:29:39Z

Hi @dinosaure ! Sorry for the radio silence here. I have discussed this PR with @seliopou and @dpatti and we have agreed a way forward, but unfortunately I have been pretty slammed for time in the last few months. The good news is that we're using my patch internally at Jane Street and it seems to work well. I need to tidy it up a bit and then get it merged up into httpaf. The bad news is that I'm unlikely to have the time to do so until late December or early January. But I'll report back here as soon as I have progress :)

dinosaure · 2021-11-22T10:24:04Z

Thanks you 👍 I will very happy to see a release with this PR to be able then to homogenize my MirageOS stack. Again, if you need some fuels (ping @seliopou), I'm able to work a bit on this PR. Thanks again.

No intended change to functionality

async/httpaf_async.ml

lib/reqd.ml

lib/server_connection.ml

We don't want to invoke the handler separately in the read loop and write loop, we want to wait until both are upgraded and then invoke. I included a script that wraps `nc` as a way of testing. We don't have upgrade support on the clients yet, and we can't use something like `wscat` without also implementing the entire websocket stack, which is more trouble than it seems worth right now. This still has an issue that you can easily demonstrate with the included script -- the "hello" message gets read into the `Buffer` along with the headers and never passed along to the client. I don't know how we should really address this, but we can't put data back onto the `fd` that we intend to pass on to the handler, so we have to surface the buffered data somehow.

dpatti · 2022-01-17T17:11:40Z

@seliopou I pushed what I think is a more correct implementation for the async library. If you read the commit message, you'll see it has a buffering issue that I don't know how to resolve other than using a Reader.t throughout the lifetime of the connection. Maybe you have some ideas.

I think we need a similar treatment for the lwt side of things, but other than trying to translate the thing I did in async 1:1, I'm not really sure how to approach it.

Give it the same treatment as the async implementation in the parent commit.

We weren't handling empty reads correctly, so that meant it would just spin at 100% once it got EOF.

dpatti · 2022-03-12T17:49:50Z

There's one issue with the two runtime implementations: we can potentially pull more bytes off the socket than we need to parse the headers, at which point you drop those bytes on the floor when you hand the socket off to the user.

This is purely so that we can attach the big comment that describes why this is weird. The "improvements" that I mentioned in one of the comments are lost to time, but I believe it involved allowing the reader to be put into a yielded state once the parsing was complete and resumed once more input was requested. Ultimately, we won't spend the time trying to implement such a thing.

dpatti · 2022-06-22T23:33:57Z

Just needs the fixes to the runtimes (or assertions saying it's a known issue) and we're good to merge.

rawleyfowler · 2022-11-16T21:41:54Z

Is this still being worked on?

dinosaure · 2023-01-09T18:26:56Z

A gently ping 🙂 , again, if you need any help for the release process (testing, etc.), I'm available.

dpatti and others added 5 commits April 22, 2021 15:27

Implement HTTP upgrades.

0bab909

Implement HTTP upgrades.

b761c7e

dhouse-js mentioned this pull request Apr 26, 2021

Support upgrades via I/O operations #159

Open

dpatti force-pushed the refactor-request-queue branch from cce55fd to 77f216b Compare April 26, 2021 19:58

dhouse-js marked this pull request as ready for review April 27, 2021 10:09

dhouse-js added 4 commits April 27, 2021 11:17

Merge remote-tracking branch 'upstream/refactor-request-queue' into r…

a49ea3c

…efactor-request-queue

Merge branch 'refactor-request-queue' into http-upgrades

949d7d9

refactor

004e128

dhouse-js mentioned this pull request Apr 29, 2021

Fix parse error at EOF #207

Merged

dhouse-js added 2 commits April 29, 2021 14:55

More fixes for when the upgrade request is declined

d1ff221

dhouse-js changed the base branch from refactor-request-queue to master May 24, 2021 10:57

Merge branch 'master' into http-upgrades

2a36491

dhouse-js and others added 9 commits May 24, 2021 12:16

Fix compilation of examples

d45b85c

Implement HTTP upgrades.

05c797d

Implement HTTP upgrades.

e6cda20

refactor

40ae33c

More fixes for when the upgrade request is declined

5fcb69a

Fix compilation of examples

90de89e

update reader coercion in client tests

203f06c

dpatti force-pushed the http-upgrades branch from d45b85c to 203f06c Compare June 3, 2021 00:26

dpatti added 2 commits June 5, 2021 15:44

http-upgrades: require no request body

cee330c

http-upgrades: refactor input_state logic

81b4808

The removed indirection lets you trace responses to their conditions more easily.

dpatti and others added 6 commits June 5, 2021 15:46

http-upgrades: add tests and fix read loop issue

9faf49c

resolve conflicts

a978744

merge with master

6167c51

fix build

d3377dd

improve comment

5264434

fix conflicts

052bdcf

dinosaure mentioned this pull request Jul 6, 2021

Create an esy branch with an esy.json and no vendored directories aantron/dream#111

Open

minor refactoring

31a6c1a

No intended change to functionality

dpatti reviewed Jan 12, 2022

View reviewed changes

async/httpaf_async.ml Outdated Show resolved Hide resolved

lib/reqd.ml Show resolved Hide resolved

lib/server_connection.ml Outdated Show resolved Hide resolved

dpatti added 3 commits January 12, 2022 18:06

expose is_upgrade and fix up documentation

0c2ede1

Merge remote-tracking branch 'origin/master' into http-upgrades

0da6f10

dpatti added 3 commits March 7, 2022 14:48

fix lwt implementation and add upgrade helper

5294105

Give it the same treatment as the async implementation in the parent commit.

switch async upgrade to pass the socket

c7eeda5

fix lwt_echo_upgrade example

71e0486

We weren't handling empty reads correctly, so that meant it would just spin at 100% once it got EOF.

hannesm mentioned this pull request Mar 10, 2023

Wrap vendored modules to avoid build conflicts aantron/dream#250

Merged

dinosaure mentioned this pull request Sep 11, 2023

Upstream Jane Street Changes #227

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Http upgrades #203

Http upgrades #203

dhouse-js commented Apr 26, 2021

dhouse-js commented May 24, 2021

dinosaure commented Jul 5, 2021

dhouse-js commented Jul 8, 2021

dinosaure commented Sep 6, 2021

dinosaure commented Nov 17, 2021

dhouse-js commented Nov 17, 2021

dinosaure commented Nov 22, 2021

dpatti commented Jan 17, 2022

dpatti commented Mar 12, 2022

dpatti commented Jun 22, 2022

rawleyfowler commented Nov 16, 2022

dinosaure commented Jan 9, 2023

Http upgrades #203

Are you sure you want to change the base?

Http upgrades #203

Conversation

dhouse-js commented Apr 26, 2021

dhouse-js commented May 24, 2021

dinosaure commented Jul 5, 2021

dhouse-js commented Jul 8, 2021

dinosaure commented Sep 6, 2021

dinosaure commented Nov 17, 2021

dhouse-js commented Nov 17, 2021

dinosaure commented Nov 22, 2021

dpatti commented Jan 17, 2022

dpatti commented Mar 12, 2022

dpatti commented Jun 22, 2022

rawleyfowler commented Nov 16, 2022

dinosaure commented Jan 9, 2023