Fsharp giraffe fortunes #3863
Separated custom implementation from stock.
Would appreciate a review @benaadams @dustinmoris @isaacabraham @gerardtoconnor
"approach": "Realistic",
"classification": "fullstack",
"database": "None",
"approach": "Stripped",
It's still Realistic, and likely to become the de facto method once Utf8String becomes mainline dotnet/corefxlab#2350
In a recent run it's 1,347,868.6 rps for giraffe-utf8plaintext vs 1,220,305.2 rps for giraffe.
Also, stripped won't show up in the results by default, so choose it carefully...
@@ -0,0 +1,20 @@
<Project Sdk="Microsoft.NET.Sdk">
Benchmarks in the benchmarks? 😄
"notes": "",
"versus": "aspcore"
}
},
{
"utf8plaintext": {
"stripped": {
utf8direct?
Looks good for the most part. The biggest thing to probably look into is the use of a temporary MemoryStream for rendering; try writing to the response body stream instead if possible.
| ParentNode (e, nodes) -> do writeParentNode target e nodes
| VoidElement (n, attrs) -> do writeStartElement target n attrs

let renderHtmlToStream (encoding:Encoding) node =
Why is there a separate memory stream being created rather than using the ASP.NET Core buffers in the response? The MemoryStream is allocating a resizing byte array internally, so writing into it only to later flush into the response is just waste. We should look to write directly into the response body stream; an example of a similar implementation to the above is ByteViewEngine, which outperforms most due to writing directly to the body stream. Is there an important reason the MemoryStream is being used that I have overlooked?
Writing to the HTTP output would mean I need to use the task API for every write, which, assuming a recursive walk, would result in more allocations for tasks and binds than a memory stream, where I can write without the need for async in the first place.
The other bit is that I am not entirely sure about the consequences of writing to the HTTP output stream without knowing the content length upfront; I think Kestrel would fall back to chunked content encoding. @benaadams might give a clue here.
This is why I provided the example of my ByteViewEngine: it does not use the async API and therefore makes only a small compromise on blocking. The writing of the small chunks will be a hot path into the Kestrel buffers, with no blocking until the buffer fills and needs to flush; the buffer is big enough and the pages small enough that there will be minimal buffer flushes, and on the flushes, unlike a DB round trip or file read request, there is no waiting on the return, so minimal blocking. The whole operation can be a task, but it is performant on a single sync thread.
On the content length, I'm not sure about the upfront content-length omission; that is a little trickier but doable if needed.
We can pool & reuse the memory streams; .SetLength(0) will reset one once done. A more complex, concise but painful way: if we pre-compute & cache the template length with no variables at start, we can iterate the variables to calculate their additional size each time before writing out to the buffer. The double iteration won't add much more time, as the array (List) will already be CPU-cached from the prior run and was just loaded anyway from the DB call; importantly, no extra big MS array allocs.
Either way, upfront content-length setting is not simple, so if anyone has insight into the chunked content encoding, it would save us wasting time on any complex workarounds.
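To make the trade-off concrete, here is a hedged sketch of setting the length upfront before writing, assuming ASP.NET Core's `HttpContext` and a TaskBuilder-style `task` computation expression; `writeWithLength` and `bytes` are hypothetical names, not code from this PR:

```fsharp
open System
open Microsoft.AspNetCore.Http
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

// Sketch: if Content-Length is set before the first body write, Kestrel can
// emit a fixed-length response instead of falling back to chunked transfer
// encoding. "bytes" stands in for the fully rendered page.
let writeWithLength (ctx : HttpContext) (bytes : byte[]) =
    task {
        ctx.Response.ContentLength <- Nullable(int64 bytes.Length)
        do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    }
```

The catch, as discussed above, is knowing `bytes.Length` before rendering is complete, which is what the pooled-MemoryStream approach gives for free.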
The aspnetcore platform test does a straight write into the underlying Kestrel pipeline (bypassing most of the aspnetcore stack).
content-length => 294,934 rps, ~3.12Gb/s
The middleware test writes to a (cached) StringBuilder, then does a single async write to the body.
content-length => 238,135 rps
The mvc test goes via a controller, then renders via a view, so it is the full kitchen sink.
chunked => 105,687 rps
With h2o at 421,972 rps and doing interesting things with its database driver (which is being investigated in aspnet/DataAccessPerformance#42), a lot may be lost in the DB driver currently.
I assume that even if Giraffe went chunked, it should be faster than MVC, as it's closer to middleware in terms of layers of API stack involved?
Thanks @benaadams, this is helpful. Fully agree: more often than not, DB calls/driver are the bottleneck and can make these other micro-optimisations immaterial.
I think we should try to avoid the chunking in principle; (despite being a thin layer) it's a bit of a lazy cheat when pooling a MemoryStream isn't too difficult. I prefer the pooled MS over SB, given there is wasted CPU & memory on converting to UTF-16 strings when the ultimate output is UTF-8. Best to write & store as UTF-8: the C# middleware test inefficiently has to iterate & convert the SB string (UTF-16) twice, first to count the bytes (UTF-8), and then to write the string (UTF-16) out as UTF-8.
The idea of pre-computing the base template size and then at runtime only calculating the incremental GetByteCount sum of the variables is probably a level of runtime complexity that will be slower than simply pooling buffers, so it can be ignored.
@dv00d00 do you think MemoryStream pooling is the way to go then? It means you can leave the current templating engine as is, just getting/returning an MS from the pool. It's probably reasonable to pre-allocate the pool on startup (50 MemoryStreams of 1024 bytes?).
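A minimal sketch of the kind of pool being floated here (names and sizes are illustrative, not this PR's code; a ConcurrentBag keeps rent/return thread-safe):

```fsharp
open System.IO
open System.Collections.Concurrent

// Illustrative sketch, not the PR's implementation: pre-allocate streams at
// startup, rent one per request, reset with SetLength(0) and return when done.
type MemoryStreamPool(count : int, capacity : int) =
    let pool =
        ConcurrentBag<MemoryStream>(
            Seq.init count (fun _ -> new MemoryStream(capacity)))
    member _.Rent() =
        match pool.TryTake() with
        | true, ms -> ms
        | _ -> new MemoryStream(capacity)  // pool exhausted: fall back to alloc
    member _.Return(ms : MemoryStream) =
        ms.SetLength(0L)                   // resets length (and clamps position) for reuse
        pool.Add(ms)

// e.g. pre-allocate on startup: 50 MemoryStreams of 1024 bytes
let sharedPool = MemoryStreamPool(50, 1024)
```

Once the page is rendered into a rented stream, its `Length` also gives the Content-Length for free before the single write to the response body.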
fully agree, more often than not, DB calls/driver are the bottleneck and can make these other micro optimisations immaterial.
aspnetcore can still output ~3.12Gb/s, which is 30% of network capacity; h2o is at ~4.46Gb/s, or 45%; assuming the DB transfer is similar, they are both at 60-90% of the network. Possibly for the next round that's bumping up to a 100Gb/s network, so that may all change.
Also, aspnetcore platform -> aspnetcore mvc on the same DB driver is a 64% drop in perf; so it's probably worth tweaking that last little bit out 😉
Also, aspnetcore platform -> aspnetcore mvc on the same DB driver is a 64% drop in perf; so it's probably worth tweaking that last little bit out 😉
That's what we did with Giraffe, MVC being the last little bit out 😉.
With Giraffe we will continue to micro-optimise, don't worry, in addition to utilizing all the biggest wins in aspnetcore; it's the best of both worlds for us, you're doing all the heavy lifting!
Results for the current (merged) version of Giraffe are available at https://twitter.com/ben_a_adams/status/1008406784212656130
let view =
    data
    |> Seq.append extra'
Using Seq module operations is poor for efficiency, as they involve a lot of temporary structures; the custom implementation of adding the extra fortune is better (using lists). Why isn't it used here? Is it to be the most "intuitive" F# implementation?
This is probably how I would write it without thinking of perf; will replace with the list version.
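For illustration, a sketch of the list-based version being suggested (the `Fortune` record and messages here are stand-ins, not this PR's exact types):

```fsharp
type Fortune = { Id : int; Message : string }

// Sketch: cons the runtime-added fortune onto the list and sort with the
// List module; one cons cell instead of Seq's enumerator/state machinery.
let extra = { Id = 0; Message = "Additional fortune added at request time." }

let view (data : Fortune list) =
    (extra :: data)
    |> List.sortBy (fun f -> f.Message)
```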
return! ctx.WriteBytesAsync bytes
}

GET >=> choose [
The "GET" method filter is more correct, but it adds unneeded bloat to the pipeline when the method is not a test condition; it is just wasted resource. I would recommend leaving the method filter out to squeeze a tiny bit more perf.
thx for the spot, it should go
GET >=> choose [
    route "/plaintext" >=> text "Hello, World!"
    route "/json" >=> json { JsonMessage.message = "Hello, World!" }
    route "/fortunes" >=> fortunes
The standard router tests each route one by one until it finds a match, so this is inefficient for routes later in the list like "/fortunes". It would be better to use TokenRouter (which needs to be pushed out of beta to be used, I believe), or, as this is custom, create a custom routing function with a match/Dictionary on the path, as no complicated parsing is required.
Will definitely take a look into it, thanks.
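A hedged sketch of the kind of direct dispatch being suggested, assuming Giraffe's `HttpHandler`/`HttpFunc` types (the handler bodies below are placeholders standing in for the PR's real handlers):

```fsharp
open Giraffe
open Microsoft.AspNetCore.Http

// Placeholder handlers standing in for the real ones in this PR:
let plaintextHandler : HttpHandler = text "Hello, World!"
let jsonHandler : HttpHandler = text "json placeholder"
let fortunesHandler : HttpHandler = text "fortunes placeholder"

// Sketch: one exact match on the path instead of probing each route in
// turn; no parsing is needed for these fixed paths.
let webApp : HttpHandler =
    fun (next : HttpFunc) (ctx : HttpContext) ->
        match ctx.Request.Path.Value with
        | "/plaintext" -> plaintextHandler next ctx
        | "/json"      -> jsonHandler next ctx
        | "/fortunes"  -> fortunesHandler next ctx
        | _            -> next ctx
```

The F# compiler can turn a string match like this into efficient dispatch, and the worst case is one comparison per route rather than a full handler invocation per route.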
ctx.Response.StatusCode <- 200
task {
    do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    return none
As mentioned above: I am guessing you are not using the "text" helper function so as to allow caching of the known string's bytes? Although "none" can be returned and will work, this is generally wrong usage, as the "choose" functions will continue to try to match routes until they get "Some ctx"; therefore, to prevent testing of further routes, it is best to return "Some ctx".
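A sketch of the suggested fix in isolation, assuming Giraffe's types and a TaskBuilder-style `task` computation expression (`writeBytes` is a hypothetical name, not this PR's code):

```fsharp
open Giraffe
open Microsoft.AspNetCore.Http
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

// Sketch: returning Some ctx tells "choose" the request was handled, so it
// stops probing the remaining routes.
let writeBytes (bytes : byte[]) : HttpHandler =
    fun (_ : HttpFunc) (ctx : HttpContext) ->
        ctx.Response.StatusCode <- 200
        task {
            do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
            return Some ctx   // not None: signals the route matched
        }
```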
ctx.Response.StatusCode <- 200
task {
    do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    return none
I am guessing you are not using the "json" helper function so as to allow caching of the known serialized data bytes? Although "none" can be returned and will work, this is generally wrong usage, as the "choose" functions will continue to try to match routes until they get "Some ctx"; therefore, to prevent testing of further routes, it is best to return "Some ctx".
fun (_ : HttpFunc) (ctx : HttpContext) ->

    let conn = new NpgsqlConnection(ConnectionString)
    ctx.Response.RegisterForDispose conn
Out of curiosity, why is this used rather than "task { use conn = new … }" for the disposable?
My reasoning was that the task builder will create a closure for every bind internally, so I am trying to reduce this as much as possible.
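For comparison, a sketch of the `use`-based variant asked about above, assuming Dapper's `QueryAsync` and Npgsql (the record, query, and function name are illustrative, not this PR's code):

```fsharp
open Npgsql
open Dapper
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

[<CLIMutable>]
type Fortune = { Id : int; Message : string }

// Sketch: "use" inside the task CE disposes the connection when the block
// completes, at the cost of the per-bind closure allocations that the
// comment above is trying to avoid.
let loadFortunes (connectionString : string) =
    task {
        use conn = new NpgsqlConnection(connectionString)
        do! conn.OpenAsync()
        let! rows = conn.QueryAsync<Fortune>("SELECT id, message FROM fortune")
        return List.ofSeq rows
    }
```

`RegisterForDispose` instead defers disposal to the end of the response, keeping the handler body free of the extra binds.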
match attributes with
| [||] ->
    do target
Why are there so many explicit "do" statements used in this module? Is there an error if they are not explicitly included? Usually "do" is only required for iteration body statements and constructor initialization code.
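For reference, a small illustrative snippet (not from this PR) showing that the explicit `do` is optional for ordinary statements:

```fsharp
open System.Text

// Both statements compile identically; "do" is only mandatory in a few
// places, e.g. standalone initialization code in a class constructor.
let writeAttr (target : StringBuilder) (name : string) (value : string) =
    target.Append(' ').Append(name) |> ignore       // no "do" needed
    do target.Append('=').Append(value) |> ignore   // explicit "do", same meaning
```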
Added possibility to pass an output stream to the HTML renderer; simplified routing for stock and custom cases.
Thanks for the review guys, will continue to work on it during the next week. I've benchmarked a pooled impl as in https://www.nuget.org/packages/Microsoft.IO.RecyclableMemoryStream/ The results are:
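For context, a hedged sketch of how the RecyclableMemoryStream package is typically used (the helper name is hypothetical):

```fsharp
open Microsoft.IO

// One manager for the whole app; streams are rented per request and go
// back to the pool when disposed.
let manager = RecyclableMemoryStreamManager()

// Hypothetical helper: render into a pooled stream, copy the bytes out
// before Dispose returns the buffers to the pool.
let renderToPooledBytes (render : System.IO.Stream -> unit) =
    use ms = manager.GetStream("fortunes")   // tag aids leak diagnostics
    render ms
    ms.ToArray()
```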
Could use a
Updates look good. Thanks for doing the initial benchmark; I'm not too surprised, there is usually a large tipping point where these pools pay off. The heavy thread locking/syncing on this is a bottleneck, so Ben's suggestion is good: we need frequent single small buffers per thread rather than massive irregular ones. It seems the standard BenchmarkDotNet run will have fewer allocs & less GC than TechEmpower; we can also tweak the pool configuration for block size etc. When the full aspnetcore stack is running with its memory footprint, there is less free to alloc before GC, so in micro benchmarks like this only minimal GC is needed; it will be higher in the full application test. I actually remembered one of my test view engines might work better in this case, TemplateViewEngine, as I could tweak it to do my original complex suggestion of counting the pre-compiled elements and using a GetByteCount summation on the model, adding it to the pre-computed template length. Bit of a radical one, but just throwing it out there: all static elements of the template are built into a rendering tree of byte arrays, with the model rendered into the gaps, writing directly to the body.
Ported @benaadams cache
Win win, will adopt it.
@gerardtoconnor I would really like to try out your TemplateViewEngine. Is it published on NuGet?
Not sure if we really need stock samples for plaintext and json.
I think it is ready. Also, I'm not entirely sure about the "stripped" approach for the custom fortunes test; would the "micro" classification be enough to justify custom HTML rendering? The rest of the code does not feel like something too wild.
/cc @msmith-techempower @nbrady-techempower
@dv00d00 I think the "stripped" approach makes sense here. The micro classification says you're providing just enough middleware or whatever to make the implementation work. The stripped approach says you're writing a lot more custom code specific to this use case, either because the framework doesn't provide the utilities or because you're testing out different methods closer to the metal that you might want to incorporate into the framework. What you have now looks right to me. @msmith-techempower anything to add?
@dv00d00 I have not got that TemplateViewEngine on NuGet, it was only a POC. Without the pre-counting of the model bytes to set the content length, as is, it'll chunk, so it's not much use; I will have a look at tweaking it to be able to count & set the content length. For now I think your adapted MS pool from Ben's design is a great solution.
"platform": ".NET",
"flavor": "CoreCLR",
"webserver": "Kestrel",
"os": "Linux",
"database_os": "Linux",
"display_name": "Giraffe",
"display_name": "Giraffe, Dapper",
Not a big fan of this, but since we don't have a key for the name of your database approach/library, I guess it will do.
Added fortunes test via default and custom HTML rendering.
ORM is Dapper.
Implemented custom renderer to write HTML into a memory stream.
HTML renderers comparison