Add Lwt UNIX adapter (httpaf-lwt-unix). #53

Closed
wants to merge 3 commits

Conversation

paurkedal

Testing is limited to manual use of the two provided examples (including with content greater than the buffer size).

@aantron
Contributor

aantron commented Jun 13, 2018

Nice. I have another Lwt adapter locally. As part of it, I improved Faraday's Lwt writev, which gave a significant performance improvement. I can post that.

@paurkedal
Author

@aantron That sounds good. Is there also a reason to prefer your httpaf adaptor over this PR, or would the two be equivalent given the Faraday improvement?

@aantron
Contributor

aantron commented Jun 13, 2018

I haven't compared the two deeply enough to say, but I would expect our two adapters to be equivalent, and I'm happy to defer to yours and help review it. I can post mine as a PR for comparison, if you wish. The Faraday improvement should help your PR just as well as mine; it just consists of using Lwt's actual writev instead of writing only the first iovec, so it's orthogonal to the adapters.
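
For readers following along, the improvement being discussed boils down to submitting all of Faraday's iovecs in a single Lwt_unix.writev call rather than only the first one. Below is a minimal sketch of that idea, assuming Faraday's iovec record and Lwt's IO_vectors API; it is an illustration, not the actual faraday-lwt-unix patch.

(* Sketch only, not the actual faraday-lwt-unix change: gather every iovec
   produced by Faraday into one IO_vectors value and submit it with a
   single writev, instead of writing just the first iovec. *)
let writev_all fd (iovecs : Lwt_bytes.t Faraday.iovec list) : int Lwt.t =
  let io_vectors = Lwt_unix.IO_vectors.create () in
  List.iter
    (fun { Faraday.buffer; off; len } ->
       Lwt_unix.IO_vectors.append_bigarray io_vectors buffer off len)
    iovecs;
  (* The caller reports the resulting byte count back to Faraday,
     e.g. via [Faraday.shift]. *)
  Lwt_unix.writev fd io_vectors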

@paurkedal
Author

A parallel PR is not necessary on my behalf then, but can you send your writev improvement to Faraday as a PR?

Member

@seliopou left a comment


The read loops in the client and server must check the return value of the read operation and buffer any unconsumed bytes for the next read. (I think this may actually be a problem in the async version as well. Gotta check that.)

Besides that, I think it looks good.

Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
(if len = 0
then Server_connection.shutdown_reader conn
else Server_connection.read conn buffer ~off:0 ~len |> ignore);
Member

The server connection might not consume the entire buffer here, for example if the input does not end on a token boundary. The code needs to check the return value of read and hold onto any bytes that were not consumed for the next call to read.
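
To make that concrete, here is a minimal sketch of a read loop that checks the value returned by Server_connection.read and carries any unconsumed bytes over to the next read; it assumes the same sock, buffer, and conn as in the snippet above and is not the adapter's final code.

(* Sketch only: feed the parser, check how much it consumed, and keep the
   unconsumed tail at the front of the buffer for the next read. *)
open Lwt.Infix

let read_loop sock buffer conn =
  let rec loop pending =
    Lwt_bytes.read sock buffer pending (Lwt_bytes.length buffer - pending)
    >>= fun n ->
    if n = 0 then begin
      Server_connection.shutdown_reader conn;
      Lwt.return_unit
    end else begin
      let len = pending + n in
      (* [read] returns the number of bytes the parser consumed. *)
      let consumed = Server_connection.read conn buffer ~off:0 ~len in
      let leftover = len - consumed in
      if leftover > 0 then
        (* Move the unconsumed tail to the start of the buffer;
           Bigarray.Array1.blit copes with the overlapping regions. *)
        Bigarray.Array1.(blit (sub buffer consumed leftover) (sub buffer 0 leftover));
      loop leftover
    end
  in
  loop 0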

Author

I'll fix that and update the benchmarks post.

Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
(if len = 0
then Client_connection.shutdown_reader conn
else Client_connection.read conn buffer ~off:0 ~len |> ignore);
Member

Same problem as in the server case.

@seliopou
Member

Ah, right. It's not a problem in the async runtime. See the Buffer module in there.

@aantron mentioned this pull request Jun 14, 2018
@aantron
Contributor

aantron commented Jun 14, 2018

Just posted the alternative adapter in #54, which does have a Buffer module. Not sure if it's actually right, though.

@paurkedal
Author

paurkedal commented Jun 16, 2018

The alternative adaptor PR was very helpful, thanks! I pushed a revised commit including:

  • A benchmark, and thanks for pointing to the wrk2 tool, which I had missed.
  • Missing Travis tests.
  • A fix to the concurrency of the Lwt echo server, so that it accepts more than one batch of connections at a time.

I compared the async adaptor and the Lwt adaptor of this PR using lwt-4.0.1, faraday-lwt-unix pinned to PR inhabitedtype/faraday#41, and the following parameters:

../wrk2/wrk \
    --rate 100K \
    --connections 1K \
    --timeout 5m \
    --duration 1m \
    --threads 4 \
    --latency \
    -H 'Connection: keep-alive' \
    http://127.0.0.1:8080

For async, which was left with 401 failed connections after wrk exited, I got:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 1000 connections
  Thread calibration: mean lat.: 1256.413ms, rate sampling interval: 5099ms
  Thread calibration: mean lat.: 1702.336ms, rate sampling interval: 6062ms
  Thread calibration: mean lat.: 1631.083ms, rate sampling interval: 5734ms
  Thread calibration: mean lat.: 1614.525ms, rate sampling interval: 5693ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.54s     4.30s   18.79s    58.03%
    Req/Sec    17.39k   109.73    17.58k    60.61%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   10.59s 
 75.000%   14.24s 
 90.000%   16.46s 
 99.000%   18.07s 
 99.900%   18.63s 
 99.990%   18.78s 
 99.999%   18.79s 
100.000%   18.81s 

  Detailed Percentile spectrum:
[...]

#[Mean    =    10538.543, StdDeviation   =     4301.066]
#[Max     =    18792.448, Total count    =      3390787]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  4111592 requests in 1.00m, 8.02GB read
Requests/sec:  68526.99
Transfer/sec:    136.85MB

and for Lwt (updated after fixing the reader threads), which was left with 402 failed connections after wrk exited, I got:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 1000 connections
  Thread calibration: mean lat.: 1474.271ms, rate sampling interval: 5402ms
  Thread calibration: mean lat.: 1434.490ms, rate sampling interval: 5238ms
  Thread calibration: mean lat.: 1436.178ms, rate sampling interval: 5238ms
  Thread calibration: mean lat.: 1071.930ms, rate sampling interval: 4370ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.14s     3.78s   16.28s    57.33%
    Req/Sec    18.41k   184.11    18.72k    60.53%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    9.14s 
 75.000%   12.44s 
 90.000%   14.34s 
 99.000%   15.79s 
 99.900%   16.24s 
 99.990%   16.28s 
 99.999%   16.29s 
100.000%   16.29s 

  Detailed Percentile spectrum:
[...]

#[Mean    =     9138.203, StdDeviation   =     3780.392]
#[Max     =    16277.504, Total count    =      3589723]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  4352011 requests in 1.00m, 8.49GB read
Requests/sec:  72533.64
Transfer/sec:    144.85MB

The lwt version fails with the following if it receives more than 1017 connections:

Fatal error: exception Unix.Unix_error(Unix.EINVAL, "select", "")
Raised by primitive operation at file "src/unix/lwt_engine.ml", line 419, characters 26-60
Called from file "src/unix/lwt_engine.ml", line 360, characters 8-19
Called from file "src/unix/lwt_main.ml", line 49, characters 4-78
Called from file "benchmarks/wrk_lwt_benchmark.ml", line 64, characters 2-50

I think this is due to the FD_SETSIZE limit of select(2).

@paurkedal force-pushed the lwt-unix branch 3 times, most recently from b935fd7 to 13a41a7 on June 16, 2018 13:45
@aantron
Contributor

aantron commented Jun 16, 2018

@paurkedal, you should install libev on your system and install conf-libev from opam to fix the fd limit.
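
For completeness: with the libev library and the conf-libev opam package installed, Lwt uses the libev engine, so the select(2) FD_SETSIZE ceiling no longer applies. If you want to be explicit about it, the engine can also be chosen at program start; a one-line sketch using Lwt's Lwt_engine API (assuming the libev bindings are present):

(* Optional: force the libev engine at startup instead of relying on the
   default engine selection. *)
let () = Lwt_engine.set (new Lwt_engine.libev ())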

@paurkedal
Author

Thanks, then I can present the same benchmark with --connections 10K. Async:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 178.965ms, rate sampling interval: 586ms
  Thread calibration: mean lat.: 180.601ms, rate sampling interval: 609ms
  Thread calibration: mean lat.: 177.598ms, rate sampling interval: 580ms
  Thread calibration: mean lat.: 179.642ms, rate sampling interval: 584ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   877.39ms    4.28s    0.86m    96.02%
    Req/Sec    14.44k   297.57    14.98k    75.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   99.26ms
 75.000%  102.59ms
 90.000%  107.78ms
 99.000%   24.97s 
 99.900%   40.93s 
 99.990%   49.81s 
 99.999%    0.86m 
100.000%    0.86m 

  Detailed Percentile spectrum:
[...]
#[Mean    =      877.390, StdDeviation   =     4279.343]
#[Max     =    51838.976, Total count    =      2163624]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3215609 requests in 1.00m, 6.27GB read
  Socket errors: connect 0, read 256, write 0, timeout 77366
Requests/sec:  53591.13
Transfer/sec:    107.02MB

Lwt:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 1446.249ms, rate sampling interval: 7196ms
  Thread calibration: mean lat.: 1445.937ms, rate sampling interval: 7192ms
  Thread calibration: mean lat.: 1446.922ms, rate sampling interval: 7192ms
  Thread calibration: mean lat.: 1447.244ms, rate sampling interval: 7196ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.67s     3.68s   17.14s    58.02%
    Req/Sec    16.47k   122.45    16.66k    55.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   10.68s 
 75.000%   13.87s 
 90.000%   15.75s 
 99.000%   16.97s 
 99.900%   17.14s 
 99.990%   17.15s 
 99.999%   17.15s 
100.000%   17.15s 

  Detailed Percentile spectrum:
[...]
#[Mean    =    10674.276, StdDeviation   =     3677.675]
#[Max     =    17137.664, Total count    =      2469712]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3672008 requests in 1.00m, 7.16GB read
Requests/sec:  61198.17
Transfer/sec:    122.21MB

Now there is actually a difference: the Lwt adaptor has a much more even latency distribution.

@paurkedal
Author

The last two pushes:

  • Recover from failure while handling an accept in the server loop of the benchmark.
  • Trigger a shutdown also for reader threads, and condition shutdowns on the fd being open (see the sketch below). This was already present in the async adaptor when I wrote the first sketch, but was omitted for no good reason. The details are similar to @aantron's Lwt adaptor.
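
A rough illustration of the second point, using a hypothetical helper name (the actual commit may differ): only issue the shutdown while the descriptor is still open, and tolerate ENOTCONN from a peer that has already gone away.

(* Hypothetical helper for illustration: shut down one direction of the
   socket only if the fd is still open, ignoring ENOTCONN when the remote
   end has already shut the connection down. *)
let shutdown_if_open fd cmd =
  match Lwt_unix.state fd with
  | Lwt_unix.Opened ->
    (try Lwt_unix.shutdown fd cmd
     with Unix.Unix_error (Unix.ENOTCONN, _, _) -> ())
  | Lwt_unix.Closed | Lwt_unix.Aborted _ -> ()

The reader side would call it as shutdown_if_open socket Unix.SHUTDOWN_RECEIVE, and the writer side with Unix.SHUTDOWN_SEND.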

I am worried, though, about the leaking connections after wrk is done. By repeated re-connects, I can drive up the number of connections reported by the benchmark server until it reaches the ulimit -n and accepts start failing. Such lingering connections are also reported by the async benchmark, though they do not seem to cause the async benchmark to run out of file descriptors.

- Exit and relaunch reader and writer threads on `Yield.
- Handle ENOTCONN, presumably needed when remote end shuts down first.
@paurkedal
Author

I pushed another commit which modifies the `Yield handling to restart the IO threads instead of using Lwt.wait; a rough sketch of the new shape follows the benchmark output below. This improves performance a bit. Here is a new benchmark using the same parameters as above:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 1347.739ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.357ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.356ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.387ms, rate sampling interval: 6643ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.88s     3.42s   15.98s    58.05%
    Req/Sec    17.08k   134.63    17.22k    60.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    9.93s 
 75.000%   12.83s 
 90.000%   14.59s 
 99.000%   15.73s 
 99.900%   15.90s 
 99.990%   15.96s 
 99.999%   15.97s 
100.000%   15.99s 

  Detailed Percentile spectrum:
[...]
#[Mean    =     9883.346, StdDeviation   =     3415.952]
#[Max     =    15982.592, Total count    =      2559610]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3795075 requests in 1.00m, 7.40GB read
Requests/sec:  63249.22
Transfer/sec:    126.31MB

This does not solve the connection leak issue though.
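
Here is the promised rough sketch of the restart-on-`Yield shape. It assumes the operation-based interface used by the async adaptor at the time (next_read_operation returning `Read, `Yield, or `Close, and yield_reader registering a wake-up callback); the actual commit may differ in the details, and buffer carry-over is elided here (see the earlier read-loop sketch).

(* Rough sketch only: rather than parking the reader on an Lwt.wait promise,
   let it return on `Yield and relaunch it from the yield_reader callback. *)
let rec reader_thread conn sock buffer =
  match Server_connection.next_read_operation conn with
  | `Read ->
    Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
    (if len = 0
     then Server_connection.shutdown_reader conn
     else ignore (Server_connection.read conn buffer ~off:0 ~len));
    reader_thread conn sock buffer
  | `Yield ->
    (* Exit this thread; restart it when the connection wants input again. *)
    Server_connection.yield_reader conn (fun () ->
      Lwt.async (fun () -> reader_thread conn sock buffer));
    Lwt.return_unit
  | `Close ->
    Lwt.return_unit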

@anmonteiro
Contributor

anmonteiro commented Jun 28, 2018

Just like in my comment in #54, I ran the benchmark for this PR locally with the following results:

./wrk2/wrk \
    --rate 100K \
    --connections 10K \
    --timeout 5m \
    --duration 1m \
    --threads 4 \
    --latency \
    -H 'Connection: keep-alive' \
    http://127.0.0.1:8080

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 91.703ms, rate sampling interval: 340ms
  Thread calibration: mean lat.: 92.328ms, rate sampling interval: 344ms
  Thread calibration: mean lat.: 91.787ms, rate sampling interval: 340ms
  Thread calibration: mean lat.: 92.775ms, rate sampling interval: 346ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   563.58ms  393.20ms   1.67s    51.36%
    Req/Sec    11.80k     0.98k   13.84k    74.77%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  744.96ms
 75.000%  868.86ms
 90.000%    1.01s
 99.000%    1.26s
 99.900%    1.48s
 99.990%    1.67s
 99.999%    1.67s
100.000%    1.67s

  Detailed Percentile spectrum:
[...]

#[Mean    =      563.578, StdDeviation   =      393.199]
#[Max     =     1671.168, Total count    =      1766904]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  2706203 requests in 1.00m, 5.28GB read
  Socket errors: connect 5143, read 0, write 0, timeout 118289
Requests/sec:  45098.99
Transfer/sec:     90.06MB

@paurkedal
Author

Thanks. I'll also do some benchmarking across different parameters and present a plot in the afternoon.

@paurkedal
Author

Here are my benchmarks:

[latency and request-rate plots]

The colors are blue for async, green for this PR, and red for #54. The three lines per color represent the average, the average plus one standard deviation, and the maximum. I used the per-thread Req/Sec figures rather than the Requests/sec at the bottom of the outputs. The parameters are the same as cited above, apart from the varying --rate.

@aantron
Contributor

aantron commented Jun 29, 2018

@seliopou I think this PR and #54 have pretty much converged in terms of features and correctness, so I recommend merging this one.

I still sometimes see better performance from #54, but that could be because I am on WSL, and #54 is somehow implicitly over-optimized for my machine. Anyway, either PR can be easily tweaked later until it has the performance of the other, so this is not a big deal. Also, WSL is not important.

@paurkedal I suggest rewriting the echo server using Lwt_io.establish_server_with_client_socket, which was just released yesterday in Lwt 4.1.0.
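
For context, a minimal sketch of what that looks like. Only the positional arguments of the Lwt 4.1.0 function are used, and handle_connection is a hypothetical stand-in for the adapter's per-connection entry point:

open Lwt.Infix

(* Sketch: accept clients with Lwt_io.establish_server_with_client_socket and
   hand each raw socket to the per-connection handler. *)
let run ~port handle_connection =
  let listen_address = Unix.(ADDR_INET (inet_addr_loopback, port)) in
  Lwt_io.establish_server_with_client_socket listen_address
    (fun _client_address socket -> handle_connection socket)
  >>= fun _server ->
  (* Keep serving until the process is terminated. *)
  fst (Lwt.wait ())

(* Example use: Lwt_main.run (run ~port:8080 my_connection_handler) *)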

@paurkedal
Author

paurkedal commented Jul 1, 2018

I updated the echo server and benchmark to use Lwt_io.establish_server_with_client_socket and renamed the benchmark for consistency. I added a version constraint for lwt >= 4.1.0, which, as I can see from the CI, conflicts with the current faraday-lwt-unix constraint, so this must be sorted out before releasing the next httpaf. I am still running with faraday-lwt-unix pinned to bd1a932.

For good measure, here are the new benchmark results using the same setup as before:
[latency and request-rate plots]
Notably, the new version uses accept instead of accept_n, and as can be seen this has no noticeable impact on performance.

In case someone wants to reproduce the benchmarks, I published a gist containing the code. I noticed after writing it that wrk has scripting capabilities, so this could probably have been simplified.

@paurkedal
Author

In case you want to merge this variant, the CI tests should recover after a rerun now that faraday has been updated.

@seliopou
Member

Indeed it does build now. Given that #54 supports buffering the input, I'm gonna close this PR in favor of that one. Thanks for opening this up and getting the ball rolling on lwt support!

@seliopou closed this Jul 26, 2018
@paurkedal
Author

Sure!

Lupus pushed a commit to Lupus/httpaf that referenced this pull request Dec 8, 2020
In some cases (especially when receiving a response with a close
delimited body), the peer closes the connection before we attempt to SSL
shutdown, leading to unnecessary reporting of errors such as
`Unix.Unix_error(Unix.EBADF)`