Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Initial WebSocket protocol implementation #73

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mattnischan
Copy link
Contributor

@mattnischan mattnischan commented Jan 30, 2020

Addresses #62. This is not fully complete yet. The read side API looks like how I want it, but the write side is still very basic and not the shape I would like yet. Needs a bunch more tests and some final protocol details.

  • Initial read-side API
  • Read-side smoke tests
  • Create pipe style write API
  • Write-side smoke tests
  • Final protocol details (close handling tests, UTF-8 validation)
  • Expanded test suite

@mattnischan
Copy link
Contributor Author

mattnischan commented Jan 30, 2020

Initial benchmarks on the read side without digging into optimization:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen Threadripper 1950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT


|                      Method |       Mean |    Error |  StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |-----------:|---------:|--------:|------:|--------:|-------:|------:|------:|----------:|
|         WebSocketReadMasked |   597.4 ns |  7.14 ns | 6.68 ns |  1.00 |    0.00 | 0.0277 |     - |     - |     120 B |
| WebSocketProtocolReadMasked | 1,210.5 ns | 10.04 ns | 9.39 ns |  2.03 |    0.03 | 0.0153 |     - |     - |      64 B |

@mattnischan
Copy link
Contributor Author

mattnischan commented Jan 30, 2020

Got rid of SequenceReader in WebSocketFrameReader and optimized some of the sync ValueTask paths in WebSocketMessageReader to avoid some state machine creation. More to do there.

Did a little work on lowering the garbage. Was getting a bunch of boxing of WebSocketFrameReader since ProtocolReader.ReadAsync() takes an interface. No sense in keeping those structs if they're just gonna get boxed. I can probably do that too with the payload reader, but will need to add methods to it to reset its state so it can be shared.

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen Threadripper 1950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT


|                      Method |     Mean |    Error |   StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|---------:|---------:|------:|--------:|-------:|------:|------:|----------:|
|         WebSocketReadMasked | 617.6 ns | 12.34 ns | 16.48 ns |  1.00 |    0.00 | 0.0286 |     - |     - |     120 B |
| WebSocketProtocolReadMasked | 969.3 ns | 14.48 ns | 12.84 ns |  1.57 |    0.05 | 0.0095 |     - |     - |      40 B |

@davidfowl
Copy link
Owner

Got rid of SequenceReader in WebSocketFrameReader and optimized some of the sync ValueTask paths in WebSocketMessageReader to avoid some state machine creation. More to do there.

Would be good to understand why #69

@mattnischan
Copy link
Contributor Author

Creating a SequenceReader was showing up as a hot path. Just getting rid of creating an instance of it and moving to parsing via Span got me a 20% win by itself.

If we're forced to take that overload and always create a SequenceReader it's going to be hard to hit BCL numbers. As is, right now, the BCL version is just so much more at the raw level that even after mangling my code such that all the sync paths don't allocate any more async state machines (except in ProtocolReader, which I haven't touched), that's only getting me an additional 5-10% over what you see in the latest bench.

@mattnischan
Copy link
Contributor Author

Here's the latest with a quiet machine: looks like maybe not even that 5-10%. Comes up allocation free, now, though (although if you actually profile, ProtocolReader.Read allocates a state machine still).

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen Threadripper 1950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT


|                      Method | Categories |     Mean |   Error |  StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |---------:|--------:|--------:|------:|--------:|-------:|------:|------:|----------:|
|               WebSocketRead |   Unmasked | 476.0 ns | 4.86 ns | 4.55 ns |  1.00 |    0.00 | 0.0286 |     - |     - |     120 B |
|       WebSocketProtocolRead |   Unmasked | 744.0 ns | 4.38 ns | 4.10 ns |  1.56 |    0.02 |      - |     - |     - |         - |
|                             |            |          |         |         |       |         |        |       |       |           |
|         WebSocketReadMasked |     Masked | 559.8 ns | 3.30 ns | 2.76 ns |  1.00 |    0.00 | 0.0286 |     - |     - |     120 B |
| WebSocketProtocolReadMasked |     Masked | 882.4 ns | 4.02 ns | 3.76 ns |  1.58 |    0.01 |      - |     - |     - |         - |

@davidfowl
Copy link
Owner

#20

@mattnischan
Copy link
Contributor Author

Yeah, I was thinking on how to eliminate that. I can do it for the sync path just by doing what I've done here and check IsCompletedSuccessfully and have a non-state machine path, but that won't help the async path (nor does it for my code currently). I have another idea but I'll throw it on that issue.

//We need to at least be able to read the start of frame header
if (input.Length < 2)
{
message = default;
return false;
}

reader.TryRead(out var finOpcodeByte);
reader.TryRead(out var maskLengthByte);
if (input.IsSingleSegment || input.FirstSpan.Length >= 14)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to encapsulate this logic in a helper (it's impossible with stackalloc though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following you, what's a helper in this context and why do we need to?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A method in the framework that can get you a span of bytes given a minimum length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I gotcha. Yeah, it's hard to picture a way of implementing that without having to either go to the pool or allocate a byte array, because you can't leak the stackalloc'd span.

@mattnischan mattnischan force-pushed the features/websocket-protocol branch from c373ed0 to 5e64028 Compare February 17, 2020 20:50
@mattnischan mattnischan force-pushed the features/websocket-protocol branch from 5e64028 to 925e8c7 Compare February 17, 2020 20:52
@mattnischan mattnischan force-pushed the features/websocket-protocol branch from 925e8c7 to b55d182 Compare February 17, 2020 21:01
@mattnischan
Copy link
Contributor Author

Got the shape of the write API in a decent place finally and got some smoke tests implemented. Been taking longer than I wanted.

Should be getting some benchmark numbers in the next day or two.

@davidfowl
Copy link
Owner

Next year or 2 😄

@mattnischan
Copy link
Contributor Author

😬 I know...this year got a little crazy...

Are you thinking of bringing this project back into the forefront?

Span<byte> tempSpan = stackalloc byte[14];

var bytesToCopy = Math.Min(input.Length, tempSpan.Length);
input.Slice(0, bytesToCopy).CopyTo(tempSpan);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this fail with ArgumentException that bytesToCopy is outside the range if input has length of less than 14 bytes?
I think there should be a check somewhere right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants