Add Lwt UNIX adapter (httpaf-lwt-unix). #53

Closed
wants to merge 3 commits

Conversation

paurkedal

Testing is limited to manual use of the two provided examples (including with content greater than the buffer size).

@aantron
Contributor

aantron commented Jun 13, 2018

Nice. I have another Lwt adapter locally. As part of it, I improved Faraday's Lwt writev, which gave a significant performance improvement. I can post that.

@paurkedal
Author

@aantron That sounds good. Is there also a reason to prefer your httpaf adaptor over this PR, or would the two be equivalent given the Faraday improvement?

@aantron
Contributor

aantron commented Jun 13, 2018

I haven't compared the two deeply enough to say, but I would expect our two adapters to be equivalent, and I'm happy to defer to yours and help review it. I can post mine as a PR for comparison, if you wish. The Faraday improvement should help your PR just as well as mine; it just consists of using Lwt's actual writev instead of writing only the first iovec, so it's orthogonal to the adapters.
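
For readers following along, the improvement being discussed boils down to submitting all of Faraday's iovecs in a single Lwt_unix.writev call rather than only the first one. Below is a minimal sketch of that idea, assuming Faraday's iovec record and Lwt's IO_vectors API; it is an illustration, not the actual faraday-lwt-unix patch.

(* Sketch only, not the actual faraday-lwt-unix change: gather every iovec
   produced by Faraday into one IO_vectors value and submit it with a
   single writev, instead of writing just the first iovec. *)
let writev_all fd (iovecs : Lwt_bytes.t Faraday.iovec list) : int Lwt.t =
  let io_vectors = Lwt_unix.IO_vectors.create () in
  List.iter
    (fun { Faraday.buffer; off; len } ->
       Lwt_unix.IO_vectors.append_bigarray io_vectors buffer off len)
    iovecs;
  (* The caller reports the resulting byte count back to Faraday,
     e.g. via [Faraday.shift]. *)
  Lwt_unix.writev fd io_vectors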

@paurkedal
Author

A parallel PR is not necessary on my behalf then, but can you send your writev improvement to Faraday as a PR?

Member

@seliopou left a comment


The read loops in the client and server must check the return value of the read operation and buffer any unconsumed bytes for the next read. (I think this may actually be a problem in the async version as well. Gotta check that.)

Besides that, I think it looks good.

Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
(if len = 0
then Server_connection.shutdown_reader conn
else Server_connection.read conn buffer ~off:0 ~len |> ignore);
Member

The server connection might not consume the entire buffer here, for example if the input does not end on a token boundary. The code needs to check the return value of read and hold onto any bytes that were not consumed for the next call to read.
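
To make that concrete, here is a minimal sketch of a read loop that checks the value returned by Server_connection.read and carries any unconsumed bytes over to the next read; it assumes the same sock, buffer, and conn as in the snippet above and is not the adapter's final code.

(* Sketch only: feed the parser, check how much it consumed, and keep the
   unconsumed tail at the front of the buffer for the next read. *)
open Lwt.Infix

let read_loop sock buffer conn =
  let rec loop pending =
    Lwt_bytes.read sock buffer pending (Lwt_bytes.length buffer - pending)
    >>= fun n ->
    if n = 0 then begin
      Server_connection.shutdown_reader conn;
      Lwt.return_unit
    end else begin
      let len = pending + n in
      (* [read] returns the number of bytes the parser consumed. *)
      let consumed = Server_connection.read conn buffer ~off:0 ~len in
      let leftover = len - consumed in
      if leftover > 0 then
        (* Move the unconsumed tail to the start of the buffer;
           Bigarray.Array1.blit copes with the overlapping regions. *)
        Bigarray.Array1.(blit (sub buffer consumed leftover) (sub buffer 0 leftover));
      loop leftover
    end
  in
  loop 0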

Author

I'll fix that and update the benchmarks post.

Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
(if len = 0
then Client_connection.shutdown_reader conn
else Client_connection.read conn buffer ~off:0 ~len |> ignore);
Member

Same problem as in the server case.

@seliopou
Member

Ah, right. It's not a problem in the async runtime. See the Buffer module in there.

@aantron mentioned this pull request Jun 14, 2018
@aantron
Contributor

aantron commented Jun 14, 2018

Just posted the alternative adapter in #54, which does have a Buffer module. Not sure if it's actually right, though.

@paurkedal
Author

paurkedal commented Jun 16, 2018

The alternative adaptor PR was very helpful, thanks! I pushed a revised commit including:

  • A benchmark, and thanks for pointing to the wrk2 tool, which I had missed.
  • Missing Travis tests.
  • A fix to the concurrency of the Lwt echo server, so that it accepts more than one batch of connections at a time.

I compared the async adaptor and the Lwt adaptor of this PR using lwt-4.0.1, faraday-lwt-unix pinned to PR inhabitedtype/faraday#41, and the following parameters:

../wrk2/wrk \
    --rate 100K \
    --connections 1K \
    --timeout 5m \
    --duration 1m \
    --threads 4 \
    --latency \
    -H 'Connection: keep-alive' \
    http://127.0.0.1:8080

For async, which was left with 401 failed connections after wrk exited, I got:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 1000 connections
  Thread calibration: mean lat.: 1256.413ms, rate sampling interval: 5099ms
  Thread calibration: mean lat.: 1702.336ms, rate sampling interval: 6062ms
  Thread calibration: mean lat.: 1631.083ms, rate sampling interval: 5734ms
  Thread calibration: mean lat.: 1614.525ms, rate sampling interval: 5693ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.54s     4.30s   18.79s    58.03%
    Req/Sec    17.39k   109.73    17.58k    60.61%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   10.59s 
 75.000%   14.24s 
 90.000%   16.46s 
 99.000%   18.07s 
 99.900%   18.63s 
 99.990%   18.78s 
 99.999%   18.79s 
100.000%   18.81s 

  Detailed Percentile spectrum:
[...]

#[Mean    =    10538.543, StdDeviation   =     4301.066]
#[Max     =    18792.448, Total count    =      3390787]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  4111592 requests in 1.00m, 8.02GB read
Requests/sec:  68526.99
Transfer/sec:    136.85MB

and for Lwt (updated after fixing the reader threads), which was left with 402 failed connections after wrk exited, I got:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 1000 connections
  Thread calibration: mean lat.: 1474.271ms, rate sampling interval: 5402ms
  Thread calibration: mean lat.: 1434.490ms, rate sampling interval: 5238ms
  Thread calibration: mean lat.: 1436.178ms, rate sampling interval: 5238ms
  Thread calibration: mean lat.: 1071.930ms, rate sampling interval: 4370ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.14s     3.78s   16.28s    57.33%
    Req/Sec    18.41k   184.11    18.72k    60.53%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    9.14s 
 75.000%   12.44s 
 90.000%   14.34s 
 99.000%   15.79s 
 99.900%   16.24s 
 99.990%   16.28s 
 99.999%   16.29s 
100.000%   16.29s 

  Detailed Percentile spectrum:
[...]

#[Mean    =     9138.203, StdDeviation   =     3780.392]
#[Max     =    16277.504, Total count    =      3589723]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  4352011 requests in 1.00m, 8.49GB read
Requests/sec:  72533.64
Transfer/sec:    144.85MB

The lwt version fails with the following if it receives more than 1017 connections:

Fatal error: exception Unix.Unix_error(Unix.EINVAL, "select", "")
Raised by primitive operation at file "src/unix/lwt_engine.ml", line 419, characters 26-60
Called from file "src/unix/lwt_engine.ml", line 360, characters 8-19
Called from file "src/unix/lwt_main.ml", line 49, characters 4-78
Called from file "benchmarks/wrk_lwt_benchmark.ml", line 64, characters 2-50

I think this is due to the FD_SETSIZE limit of select(2).

@paurkedal force-pushed the lwt-unix branch 3 times, most recently from b935fd7 to 13a41a7 on June 16, 2018 13:45
@aantron
Contributor

aantron commented Jun 16, 2018

@paurkedal, you should install libev on your system and install conf-libev from opam to fix the fd limit.
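
For completeness: with the libev library and the conf-libev opam package installed, Lwt uses the libev engine, so the select(2) FD_SETSIZE ceiling no longer applies. If you want to be explicit about it, the engine can also be chosen at program start; a one-line sketch using Lwt's Lwt_engine API (assuming the libev bindings are present):

(* Optional: force the libev engine at startup instead of relying on the
   default engine selection. *)
let () = Lwt_engine.set (new Lwt_engine.libev ())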

@paurkedal
Author

Thanks, then I can present the same benchmark with --connections 10K. Async:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 178.965ms, rate sampling interval: 586ms
  Thread calibration: mean lat.: 180.601ms, rate sampling interval: 609ms
  Thread calibration: mean lat.: 177.598ms, rate sampling interval: 580ms
  Thread calibration: mean lat.: 179.642ms, rate sampling interval: 584ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   877.39ms    4.28s    0.86m    96.02%
    Req/Sec    14.44k   297.57    14.98k    75.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   99.26ms
 75.000%  102.59ms
 90.000%  107.78ms
 99.000%   24.97s 
 99.900%   40.93s 
 99.990%   49.81s 
 99.999%    0.86m 
100.000%    0.86m 

  Detailed Percentile spectrum:
[...]
#[Mean    =      877.390, StdDeviation   =     4279.343]
#[Max     =    51838.976, Total count    =      2163624]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3215609 requests in 1.00m, 6.27GB read
  Socket errors: connect 0, read 256, write 0, timeout 77366
Requests/sec:  53591.13
Transfer/sec:    107.02MB

Lwt:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 1446.249ms, rate sampling interval: 7196ms
  Thread calibration: mean lat.: 1445.937ms, rate sampling interval: 7192ms
  Thread calibration: mean lat.: 1446.922ms, rate sampling interval: 7192ms
  Thread calibration: mean lat.: 1447.244ms, rate sampling interval: 7196ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.67s     3.68s   17.14s    58.02%
    Req/Sec    16.47k   122.45    16.66k    55.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   10.68s 
 75.000%   13.87s 
 90.000%   15.75s 
 99.000%   16.97s 
 99.900%   17.14s 
 99.990%   17.15s 
 99.999%   17.15s 
100.000%   17.15s 

  Detailed Percentile spectrum:
[...]
#[Mean    =    10674.276, StdDeviation   =     3677.675]
#[Max     =    17137.664, Total count    =      2469712]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3672008 requests in 1.00m, 7.16GB read
Requests/sec:  61198.17
Transfer/sec:    122.21MB

Now there is actually a difference: the Lwt adaptor has a much more even latency distribution.

@paurkedal
Author

The last two pushes:

  • Recover from failure while handling an accept in the server loop of the benchmark.
  • Trigger a shutdown also for reader threads, and condition shutdowns on the fd being open (see the sketch below). This was already present in the async adaptor when I wrote the first sketch, but was omitted for no good reason. The details are similar to @aantron's Lwt adaptor.
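
A rough illustration of the second point, using a hypothetical helper name (the actual commit may differ): only issue the shutdown while the descriptor is still open, and tolerate ENOTCONN from a peer that has already gone away.

(* Hypothetical helper for illustration: shut down one direction of the
   socket only if the fd is still open, ignoring ENOTCONN when the remote
   end has already shut the connection down. *)
let shutdown_if_open fd cmd =
  match Lwt_unix.state fd with
  | Lwt_unix.Opened ->
    (try Lwt_unix.shutdown fd cmd
     with Unix.Unix_error (Unix.ENOTCONN, _, _) -> ())
  | Lwt_unix.Closed | Lwt_unix.Aborted _ -> ()

The reader side would call it as shutdown_if_open socket Unix.SHUTDOWN_RECEIVE, and the writer side with Unix.SHUTDOWN_SEND.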

I am worried, though, about the leaking connections after wrk is done. By repeated re-connects, I can drive up the number of connections reported by the benchmark server until it reaches the ulimit -n and accepts start failing. Such lingering connections are also reported by the async benchmark, though they do not seem to cause the async benchmark to run out of file descriptors.

- Exit and relaunch reader and writer threads on `Yield.
- Handle ENOTCONN, presumably needed when remote end shuts down first.
@paurkedal
Author

I pushed another commit which modifies the `Yield handling to restart the IO threads instead of using Lwt.wait; a rough sketch of the new shape follows the benchmark output below. This improves performance a bit. Here is a new benchmark using the same parameters as above:

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 1347.739ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.357ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.356ms, rate sampling interval: 6643ms
  Thread calibration: mean lat.: 1348.387ms, rate sampling interval: 6643ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.88s     3.42s   15.98s    58.05%
    Req/Sec    17.08k   134.63    17.22k    60.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    9.93s 
 75.000%   12.83s 
 90.000%   14.59s 
 99.000%   15.73s 
 99.900%   15.90s 
 99.990%   15.96s 
 99.999%   15.97s 
100.000%   15.99s 

  Detailed Percentile spectrum:
[...]
#[Mean    =     9883.346, StdDeviation   =     3415.952]
#[Max     =    15982.592, Total count    =      2559610]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  3795075 requests in 1.00m, 7.40GB read
Requests/sec:  63249.22
Transfer/sec:    126.31MB

This does not solve the connection leak issue though.
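
Here is the promised rough sketch of the restart-on-`Yield shape. It assumes the operation-based interface used by the async adaptor at the time (next_read_operation returning `Read, `Yield, or `Close, and yield_reader registering a wake-up callback); the actual commit may differ in the details, and buffer carry-over is elided here (see the earlier read-loop sketch).

(* Rough sketch only: rather than parking the reader on an Lwt.wait promise,
   let it return on `Yield and relaunch it from the yield_reader callback. *)
let rec reader_thread conn sock buffer =
  match Server_connection.next_read_operation conn with
  | `Read ->
    Lwt_bytes.read sock buffer 0 (Lwt_bytes.length buffer) >>= fun len ->
    (if len = 0
     then Server_connection.shutdown_reader conn
     else ignore (Server_connection.read conn buffer ~off:0 ~len));
    reader_thread conn sock buffer
  | `Yield ->
    (* Exit this thread; restart it when the connection wants input again. *)
    Server_connection.yield_reader conn (fun () ->
      Lwt.async (fun () -> reader_thread conn sock buffer));
    Lwt.return_unit
  | `Close ->
    Lwt.return_unit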

@anmonteiro
Contributor

anmonteiro commented Jun 28, 2018

Just like in my comment in #54, I ran the benchmark for this PR locally with the following results:

./wrk2/wrk \
    --rate 100K \
    --connections 10K \
    --timeout 5m \
    --duration 1m \
    --threads 4 \
    --latency \
    -H 'Connection: keep-alive' \
    http://127.0.0.1:8080

Running 1m test @ http://127.0.0.1:8080
  4 threads and 10000 connections
  Thread calibration: mean lat.: 91.703ms, rate sampling interval: 340ms
  Thread calibration: mean lat.: 92.328ms, rate sampling interval: 344ms
  Thread calibration: mean lat.: 91.787ms, rate sampling interval: 340ms
  Thread calibration: mean lat.: 92.775ms, rate sampling interval: 346ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   563.58ms  393.20ms   1.67s    51.36%
    Req/Sec    11.80k     0.98k   13.84k    74.77%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  744.96ms
 75.000%  868.86ms
 90.000%    1.01s
 99.000%    1.26s
 99.900%    1.48s
 99.990%    1.67s
 99.999%    1.67s
100.000%    1.67s

  Detailed Percentile spectrum:
[...]

#[Mean    =      563.578, StdDeviation   =      393.199]
#[Max     =     1671.168, Total count    =      1766904]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  2706203 requests in 1.00m, 5.28GB read
  Socket errors: connect 5143, read 0, write 0, timeout 118289
Requests/sec:  45098.99
Transfer/sec:     90.06MB

@paurkedal
Author

Thanks. I'll also do some benchmarking across different parameters and present a plot in the afternoon.

@paurkedal
Author

Here are my benchmarks:

[latency and request-rate plots]

The colors are blue for async, green for this PR, and red for #54. The three lines per color represent the average, the average plus one standard deviation, and the maximum. I used the per-thread Req/Sec figures rather than the Requests/sec at the bottom of the outputs. The parameters are the same as cited above, apart from the varying --rate.

@aantron
Contributor

aantron commented Jun 29, 2018

@seliopou I think this PR and #54 have pretty much converged in terms of features and correctness, so I recommend merging this one.

I still sometimes see better performance from #54, but that could be because I am on WSL, and #54 is somehow implicitly over-optimized for my machine. Anyway, either PR can be easily tweaked later until it has the performance of the other, so this is not a big deal. Also, WSL is not important.

@paurkedal I suggest rewriting the echo server using Lwt_io.establish_server_with_client_socket, which was just released yesterday in Lwt 4.1.0.
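
For context, a minimal sketch of what that looks like. Only the positional arguments of the Lwt 4.1.0 function are used, and handle_connection is a hypothetical stand-in for the adapter's per-connection entry point:

open Lwt.Infix

(* Sketch: accept clients with Lwt_io.establish_server_with_client_socket and
   hand each raw socket to the per-connection handler. *)
let run ~port handle_connection =
  let listen_address = Unix.(ADDR_INET (inet_addr_loopback, port)) in
  Lwt_io.establish_server_with_client_socket listen_address
    (fun _client_address socket -> handle_connection socket)
  >>= fun _server ->
  (* Keep serving until the process is terminated. *)
  fst (Lwt.wait ())

(* Example use: Lwt_main.run (run ~port:8080 my_connection_handler) *)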

@paurkedal
Author

paurkedal commented Jul 1, 2018

I updated the echo server and benchmark to use Lwt_io.establish_server_with_client_socket and renamed the benchmark for consistency. I added a version constraint for lwt >= 4.1.0, which, as I can see from the CI, conflicts with the current faraday-lwt-unix constraint, so this must be sorted out before releasing the next httpaf. I am still running with faraday-lwt-unix pinned to bd1a932.

For good measure, here are the new benchmark results using the same setup as before:
[latency and request-rate plots]
Notably, the new version uses accept instead of accept_n, and as can be seen this has no noticeable impact on performance.

In case someone wants to reproduce the benchmarks, I published a gist containing the code. I noticed after writing it that wrk has scripting capabilities, so this could probably have been simplified.

@paurkedal
Author

In case you want to merge this variant, the CI tests should recover after a rerun now that faraday has been updated.

@seliopou
Member

Indeed it does build now. Given that #54 supports buffering the input, I'm gonna close this PR in favor of that one. Thanks for opening this up and getting the ball rolling on lwt support!

@seliopou closed this Jul 26, 2018
@paurkedal
Author

Sure!

Lupus pushed a commit to Lupus/httpaf that referenced this pull request Dec 8, 2020
In some cases (especially when receiving a response with a close
delimited body), the peer closes the connection before we attempt to SSL
shutdown, leading to unnecessary reporting of errors such as
`Unix.Unix_error(Unix.EBADF)`