Fsharp giraffe fortunes #3863
Separated custom implementation from stock.
Would appreciate a review @benaadams @dustinmoris @isaacabraham @gerardtoconnor
"approach": "Realistic",
"classification": "fullstack",
"database": "None",
"approach": "Stripped",
It's still Realistic, and likely to become the de facto method once Utf8String becomes mainline dotnet/corefxlab#2350
In a recent run it's 1,347,868.6 rps for giraffe-utf8plaintext vs 1,220,305.2 rps for giraffe.
Also, stripped won't show up in the results by default, so choose it carefully...
@@ -0,0 +1,20 @@
<Project Sdk="Microsoft.NET.Sdk">
Benchmarks in the benchmarks? 😄
"notes": "",
"versus": "aspcore"
}
},
{
"utf8plaintext": {
"stripped": {
utf8direct?
Looks good for the most part. The biggest thing to probably look into is the use of a temporary MemoryStream for rendering; try writing to the response body stream instead if possible.
| ParentNode (e, nodes) -> do writeParentNode target e nodes
| VoidElement (n, attrs) -> do writeStartElement target n attrs

let renderHtmlToStream (encoding:Encoding) node =
Why is there a separate memory stream being created rather than using the ASP.NET Core buffers in the response? The MemoryStream is allocating a resizing byte array internally, so writing into it only to later flush into the response is just waste. We should look to write directly into the response body stream; an example of a similar implementation to the above is ByteViewEngine, which outperforms most due to writing directly to the body stream. Is there an important reason the MemoryStream is being used that I have overlooked?
Writing to the HTTP output would mean I need to use the task API for every write, which, assuming a recursive walk, would result in more allocations for tasks and binds than a memory stream, where I can write without the need for async in the first place.
The other bit is that I am not entirely sure about the consequences of writing to the HTTP output stream without knowing the content length upfront; I think Kestrel would fall back to chunked content encoding. @benaadams might give a clue here.
This is why I provided the example of my ByteViewEngine: it does not use the async API and therefore makes only a small compromise on blocking. The writing of the small chunks will be a hot path into the Kestrel buffers, with no blocking until the buffer fills and needs to flush; the buffer is big enough and the pages small enough that there will be minimal buffer flushes, and on the flushes, unlike a DB round trip or file read request, there is no waiting on the return, so minimal blocking. The whole operation can be a task, but it is performant on a single sync thread.
On the content length, I'm not sure about the upfront content-length omission; that is a little trickier but doable if needed.
We can pool & reuse the memory streams; .SetLength(0) will reset one once done. A more complex, concise but painful way: if we pre-compute & cache the template length with no variables at start, we can iterate the variables to calculate their additional size each time before writing out to the buffer. The double iteration won't add much more time, as the array (List) will already be CPU-cached from the prior run and was just loaded anyway from the DB call; importantly, no extra big MS array allocs.
Either way, upfront content-length setting is not simple, so if anyone has insight into the chunked content encoding, it would save us wasting time on any complex workarounds.
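To make the trade-off concrete, here is a hedged sketch of setting the length upfront before writing, assuming ASP.NET Core's `HttpContext` and a TaskBuilder-style `task` computation expression; `writeWithLength` and `bytes` are hypothetical names, not code from this PR:

```fsharp
open System
open Microsoft.AspNetCore.Http
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

// Sketch: if Content-Length is set before the first body write, Kestrel can
// emit a fixed-length response instead of falling back to chunked transfer
// encoding. "bytes" stands in for the fully rendered page.
let writeWithLength (ctx : HttpContext) (bytes : byte[]) =
    task {
        ctx.Response.ContentLength <- Nullable(int64 bytes.Length)
        do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    }
```

The catch, as discussed above, is knowing `bytes.Length` before rendering is complete, which is what the pooled-MemoryStream approach gives for free.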
The aspnetcore platform test does a straight write into the underlying Kestrel pipeline (bypassing most of the aspnetcore stack).
content-length => 294,934 rps, ~3.12Gb/s
The middleware test writes to a (cached) StringBuilder, then does a single async write to the body.
content-length => 238,135 rps
The mvc test goes via a controller, then renders via a view, so it is the full kitchen sink.
chunked => 105,687 rps
With h2o at 421,972 rps and doing interesting things with its database driver (which is being investigated in aspnet/DataAccessPerformance#42), a lot may be lost in the DB driver currently.
I assume that even if Giraffe went chunked, it should be faster than MVC, as it's closer to middleware in terms of layers of API stack involved?
Thanks @benaadams, this is helpful. Fully agree: more often than not, DB calls/driver are the bottleneck and can make these other micro-optimisations immaterial.
I think we should try to avoid the chunking in principle; (despite being a thin layer) it's a bit of a lazy cheat when pooling a MemoryStream isn't too difficult. I prefer the pooled MS over SB, given there is wasted CPU & memory on converting to UTF-16 strings when the ultimate output is UTF-8. Best to write & store as UTF-8: the C# middleware test inefficiently has to iterate & convert the SB string (UTF-16) twice, first to count the bytes (UTF-8), and then to write the string (UTF-16) out as UTF-8.
The idea of pre-computing the base template size and then at runtime only calculating the incremental GetByteCount sum of the variables is probably a level of runtime complexity that will be slower than simply pooling buffers, so it can be ignored.
@dv00d00 do you think MemoryStream pooling is the way to go then? It means you can leave the current templating engine as is, just getting/returning an MS from the pool. It's probably reasonable to pre-allocate the pool on startup (50 MemoryStreams of 1024 bytes?).
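A minimal sketch of the kind of pool being floated here (names and sizes are illustrative, not this PR's code; a ConcurrentBag keeps rent/return thread-safe):

```fsharp
open System.IO
open System.Collections.Concurrent

// Illustrative sketch, not the PR's implementation: pre-allocate streams at
// startup, rent one per request, reset with SetLength(0) and return when done.
type MemoryStreamPool(count : int, capacity : int) =
    let pool =
        ConcurrentBag<MemoryStream>(
            Seq.init count (fun _ -> new MemoryStream(capacity)))
    member _.Rent() =
        match pool.TryTake() with
        | true, ms -> ms
        | _ -> new MemoryStream(capacity)  // pool exhausted: fall back to alloc
    member _.Return(ms : MemoryStream) =
        ms.SetLength(0L)                   // resets length (and clamps position) for reuse
        pool.Add(ms)

// e.g. pre-allocate on startup: 50 MemoryStreams of 1024 bytes
let sharedPool = MemoryStreamPool(50, 1024)
```

Once the page is rendered into a rented stream, its `Length` also gives the Content-Length for free before the single write to the response body.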
fully agree, more often than not, DB calls/driver are the bottleneck and can make these other micro optimisations immaterial.
aspnetcore can still output ~3.12Gb/s, which is 30% of network capacity; h2o is at ~4.46Gb/s, or 45%; assuming the DB transfer is similar, they are both at 60-90% of the network. Possibly for the next round that's bumping up to a 100Gb/s network, so that may all change.
Also, aspnetcore platform -> aspnetcore mvc on the same DB driver is a 64% drop in perf; so it's probably worth tweaking that last little bit out 😉
Also, aspnetcore platform -> aspnetcore mvc on the same DB driver is a 64% drop in perf; so it's probably worth tweaking that last little bit out 😉
That's what we did with Giraffe, MVC being the last little bit out 😉.
With Giraffe we will continue to micro-optimise, don't worry, in addition to utilizing all the biggest wins in aspnetcore; it's the best of both worlds for us, you're doing all the heavy lifting!
Results for the current (merged) version of Giraffe are available at https://twitter.com/ben_a_adams/status/1008406784212656130
let view =
    data
    |> Seq.append extra'
Using Seq module operations is poor for efficiency, as they involve a lot of temporary structures; the custom implementation of adding the extra fortune is better (using lists). Why isn't it used here? Is it to be the most "intuitive" F# implementation?
This is probably how I would write it without thinking of perf; will replace with the list version.
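For illustration, a sketch of the list-based version being suggested (the `Fortune` record and messages here are stand-ins, not this PR's exact types):

```fsharp
type Fortune = { Id : int; Message : string }

// Sketch: cons the runtime-added fortune onto the list and sort with the
// List module; one cons cell instead of Seq's enumerator/state machinery.
let extra = { Id = 0; Message = "Additional fortune added at request time." }

let view (data : Fortune list) =
    (extra :: data)
    |> List.sortBy (fun f -> f.Message)
```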
return! ctx.WriteBytesAsync bytes
}

GET >=> choose [
The "GET" method filter is more correct, but it adds unneeded bloat to the pipeline when the method is not a test condition; it is just wasted resource. I would recommend leaving the method filter out to squeeze a tiny bit more perf.
thx for the spot, it should go
GET >=> choose [
    route "/plaintext" >=> text "Hello, World!"
    route "/json" >=> json { JsonMessage.message = "Hello, World!" }
    route "/fortunes" >=> fortunes
The standard router tests each route one by one until it finds a match, so this is inefficient for routes later in the list like "/fortunes". It would be better to use TokenRouter (which needs to be pushed out of beta to be used, I believe), or, as this is custom, create a custom routing function with a match/Dictionary on the path, as no complicated parsing is required.
Will definitely take a look into it, thanks.
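A hedged sketch of the kind of direct dispatch being suggested, assuming Giraffe's `HttpHandler`/`HttpFunc` types (the handler bodies below are placeholders standing in for the PR's real handlers):

```fsharp
open Giraffe
open Microsoft.AspNetCore.Http

// Placeholder handlers standing in for the real ones in this PR:
let plaintextHandler : HttpHandler = text "Hello, World!"
let jsonHandler : HttpHandler = text "json placeholder"
let fortunesHandler : HttpHandler = text "fortunes placeholder"

// Sketch: one exact match on the path instead of probing each route in
// turn; no parsing is needed for these fixed paths.
let webApp : HttpHandler =
    fun (next : HttpFunc) (ctx : HttpContext) ->
        match ctx.Request.Path.Value with
        | "/plaintext" -> plaintextHandler next ctx
        | "/json"      -> jsonHandler next ctx
        | "/fortunes"  -> fortunesHandler next ctx
        | _            -> next ctx
```

The F# compiler can turn a string match like this into efficient dispatch, and the worst case is one comparison per route rather than a full handler invocation per route.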
ctx.Response.StatusCode <- 200
task {
    do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    return none
As mentioned above: I am guessing you are not using the "text" helper function so as to allow caching of the known string's bytes? Although "none" can be returned and will work, this is generally wrong usage, as the "choose" functions will continue to try to match routes until they get "Some ctx"; therefore, to prevent testing of further routes, it is best to return "Some ctx".
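A sketch of the suggested fix in isolation, assuming Giraffe's types and a TaskBuilder-style `task` computation expression (`writeBytes` is a hypothetical name, not this PR's code):

```fsharp
open Giraffe
open Microsoft.AspNetCore.Http
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

// Sketch: returning Some ctx tells "choose" the request was handled, so it
// stops probing the remaining routes.
let writeBytes (bytes : byte[]) : HttpHandler =
    fun (_ : HttpFunc) (ctx : HttpContext) ->
        ctx.Response.StatusCode <- 200
        task {
            do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
            return Some ctx   // not None: signals the route matched
        }
```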
ctx.Response.StatusCode <- 200
task {
    do! ctx.Response.Body.WriteAsync(bytes, 0, bytes.Length)
    return none
I am guessing you are not using the "json" helper function so as to allow caching of the known serialized data bytes? Although "none" can be returned and will work, this is generally wrong usage, as the "choose" functions will continue to try to match routes until they get "Some ctx"; therefore, to prevent testing of further routes, it is best to return "Some ctx".
fun (_ : HttpFunc) (ctx : HttpContext) ->

    let conn = new NpgsqlConnection(ConnectionString)
    ctx.Response.RegisterForDispose conn
Out of curiosity, why is this used rather than "task { use conn = new … }" for the disposable?
My reasoning was that the task builder will create a closure for every bind internally, so I am trying to reduce this as much as possible.
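For comparison, a sketch of the `use`-based variant asked about above, assuming Dapper's `QueryAsync` and Npgsql (the record, query, and function name are illustrative, not this PR's code):

```fsharp
open Npgsql
open Dapper
open FSharp.Control.Tasks.ContextInsensitive // namespace varies by TaskBuilder.fs version

[<CLIMutable>]
type Fortune = { Id : int; Message : string }

// Sketch: "use" inside the task CE disposes the connection when the block
// completes, at the cost of the per-bind closure allocations that the
// comment above is trying to avoid.
let loadFortunes (connectionString : string) =
    task {
        use conn = new NpgsqlConnection(connectionString)
        do! conn.OpenAsync()
        let! rows = conn.QueryAsync<Fortune>("SELECT id, message FROM fortune")
        return List.ofSeq rows
    }
```

`RegisterForDispose` instead defers disposal to the end of the response, keeping the handler body free of the extra binds.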
match attributes with
| [||] ->
    do target
Why are there so many explicit "do" statements used in this module? Is there an error if they are not explicitly included? Usually "do" is only required for iteration body statements and constructor initialization code.
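For reference, a small illustrative snippet (not from this PR) showing that the explicit `do` is optional for ordinary statements:

```fsharp
open System.Text

// Both statements compile identically; "do" is only mandatory in a few
// places, e.g. standalone initialization code in a class constructor.
let writeAttr (target : StringBuilder) (name : string) (value : string) =
    target.Append(' ').Append(name) |> ignore       // no "do" needed
    do target.Append('=').Append(value) |> ignore   // explicit "do", same meaning
```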
Added possibility to pass an output stream to the HTML renderer; simplified routing for stock and custom cases.
Thanks for the review guys, will continue to work on it during the next week. I've benchmarked a pooled impl as in https://www.nuget.org/packages/Microsoft.IO.RecyclableMemoryStream/ The results are:
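For context, a hedged sketch of how the RecyclableMemoryStream package is typically used (the helper name is hypothetical):

```fsharp
open Microsoft.IO

// One manager for the whole app; streams are rented per request and go
// back to the pool when disposed.
let manager = RecyclableMemoryStreamManager()

// Hypothetical helper: render into a pooled stream, copy the bytes out
// before Dispose returns the buffers to the pool.
let renderToPooledBytes (render : System.IO.Stream -> unit) =
    use ms = manager.GetStream("fortunes")   // tag aids leak diagnostics
    render ms
    ms.ToArray()
```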
Could use a
Updates look good. Thanks for doing the initial benchmark; I'm not too surprised, there is usually a large tipping point where these pools pay off. The heavy thread locking/syncing on this is a bottleneck, so Ben's suggestion is good: we need frequent single small buffers per thread rather than massive irregular ones. It seems the standard BenchmarkDotNet run will have fewer allocs & less GC than TechEmpower; we can also tweak the pool configuration for block size etc. When the full aspnetcore stack is running with its memory footprint, there is less free to alloc before GC, so in micro benchmarks like this only minimal GC is needed; it will be higher in the full application test. I actually remembered one of my test view engines might work better in this case, TemplateViewEngine, as I could tweak it to do my original complex suggestion of counting the pre-compiled elements and using a GetByteCount summation on the model, adding it to the pre-computed template length. Bit of a radical one, but just throwing it out there: all static elements of the template are built into a rendering tree of byte arrays, with the model rendered into the gaps, writing directly to the body.
Ported @benaadams cache
Win win, will adopt it.
@gerardtoconnor I would really like to try out your TemplateViewEngine. Is it published on NuGet?
Not sure if we really need stock samples for plaintext and json.
I think it is ready. Also, I'm not entirely sure about the "stripped" approach for the custom fortunes test; would the "micro" classification be enough to justify custom HTML rendering? The rest of the code does not feel like something too wild.
/cc @msmith-techempower @nbrady-techempower
@dv00d00 I think the "stripped" approach makes sense here. The micro classification says you're providing just enough middleware or whatever to make the implementation work. The stripped approach says you're writing a lot more custom code specific to this use case, either because the framework doesn't provide the utilities or because you're testing out different methods closer to the metal that you might want to incorporate into the framework. What you have now looks right to me. @msmith-techempower anything to add?
@dv00d00 I have not got that TemplateViewEngine on NuGet, it was only a POC. Without the pre-counting of the model bytes to set the content length, as is, it'll chunk, so it's not much use; I will have a look at tweaking it to be able to count & set the content length. For now I think your adapted MS pool from Ben's design is a great solution.
"platform": ".NET",
"flavor": "CoreCLR",
"webserver": "Kestrel",
"os": "Linux",
"database_os": "Linux",
"display_name": "Giraffe",
"display_name": "Giraffe, Dapper",
Not a big fan of this, but since we don't have a key for the name of your database approach/library, I guess it will do.
Added fortunes test via default and custom HTML rendering.
ORM is Dapper.
Implemented custom renderer to write HTML into a memory stream.
HTML renderers comparison