Feature suggestion: zero-allocation "params" via Span<T> / ReadOnlySpan<T> #6489

mgravell · 2017-05-04T09:04:40Z

mgravell
May 4, 2017
Collaborator

This is broadly related to #178 and #179

C# has the params keyword that allows convenient varadic signatures, for example:

    var tokens = line.Split(',',':');

Here, the signature is string[] string.Split(params char[] separator).

This usage is awkward because the compiler emits a new char[] every time it is invoked; this might be GEN-0 collected if you get lucky, but it still has overhead.

The advent of Span<T> offers new possibilities. For example, what if the following signature was possible:

    SomeReturnType SomeMethod(params Span<int> values) // should also work with ReadOnlySpan<T>

Then:

    var foo = SomeMethod(1, 2, 3);

Could be compiled as the equivalent of:

    SomeType foo;
    unsafe
    {
        var vals = stackalloc int[3];
        vals[0] = 1;
        vals[1] = 2;
        vals[2] = 3;
        foo = SomeMethod(new Span<int>(vals, 3));
    }

Since this is a compiler generated implementation, it should bypass project-level unsafe restrictions
and use the Span<T> type guarantees to ensure stack-referring safety, etc.

This makes it possible to have zero-allocation params varadic usage. Obviously the usual "last parameter" etc rules would apply.

If unsafe is not available (because of iterator blocks, async, etc), it can be implemented as a span over an array instead, just like regular params might have emitted:

    var foo = SomeMethod(new Span<int>(new int[] { 1, 2, 3 }));

although actually, I'm struggling to think of a scenario involving primitives where this would be needed, even in an iterator block etc, since the call-site is so localized; it might be possible to avoid this entirely. If the target method can't use spans because of async etc, then the target method won't have span in the signature, so: bullet dodged. Note that if this array version is required, and if the values are constant / literals and the target is ReadOnlySpan<T> (not Span<T>), then this array could be cached as a static:

    static int[] __backingField; // generated unpronounceable name

    var foo = SomeMethod(new ReadOnlySpan<int>(
        __backingField ?? (__backingField = new int[] { 1, 2, 3 })));

Thoughts....?

Additional idea for extending pre-existing APIs: if (haven't thought it through) it is possible to have two params overloads for params Foo[] and params Span<Foo> against the same target, then the params Span<Foo> should take precedence if the caller is using compiler-provided params support, since it can be called in a lower overhead way. Meaning, given:

    SomeType Foo(params int[] values);
    SomeType Foo(params Span<int> values);

then:

    var x = Foo(1, 2, 3);

would call the second method. However, if the caller is more explicit with an int[] local / expression, then the first method is called.

Additional unknown: what to do if the T is not one that can normally be stackalloc'd (so: not a known primitive), for example string? Just use the array version? or use something more subtle like:

    var vals = stackalloc byte[{len} * sizeof(string)]; // sizeof here not valid C#, but is valid IL
    var span = new Span<sting>(vals, {len});
    span[0] = ...

mgravell · 2017-05-04T13:49:24Z

mgravell
May 4, 2017
Collaborator Author

Quick bit of exploration; at the moment the Span<T> version is marginally slower, but zero allocations:

           Method |      Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
----------------- |----------:|----------:|----------:|--------------:|-------:|-------:|----------:|
      ParamsArray |  9.178 ns | 0.1797 ns | 0.3002 ns | 108,957,295.6 | 0.0076 |      - |      48 B |
    ExplicitArray |  9.116 ns | 0.1726 ns | 0.1530 ns | 109,700,882.9 | 0.0076 | 0.0000 |      48 B |
             Span | 10.123 ns | 0.2090 ns | 0.1955 ns |  98,789,162.3 |      - |      - |       0 B |
 StaticBackedSpan |  6.126 ns | 0.1170 ns | 0.1149 ns | 163,230,300.8 |      - |      - |       0 B |

(that is comparing Execute(1,2,3,4,5) vs Execute(new int[] {1,2,3,4,5}) vs Execute(new Span<int>(args, 5)) as per full gist here)

Update: these numbers are out of date now; there's a better way - see SpanTuple etc

0 replies

mikedn · 2017-05-04T13:57:35Z

mikedn
May 4, 2017

Additional unknown: what to do if the T is not one that can normally be stackalloc'd (so: not a known primitive), for example string? Just use the array version? or use something more subtle like:

Create a struct containing the necessary number of T fields on the stack and somehow (DangerousCreate?) build a span pointing to the first field of the struct.

0 replies

mgravell · 2017-05-04T14:02:02Z

mgravell
May 4, 2017
Collaborator Author

@mikedn that would definitely work, but I'm not sure that it is necessary to add the additional struct, when the raw data can also just be stackalloc'd - certainly a valid consideration, though

0 replies

mikedn · 2017-05-04T14:13:07Z

mikedn
May 4, 2017

when the raw data can also just be stackalloc'd - certainly a valid consideration, though

For primitive types yes, you'd probably keep using stackalloc. The struct is only needed for reference types. Or who knows, maybe the struct would be also good for primitive types, it depends which construct is best for the JIT. Probably neither since both imply taking the address of a local :)

0 replies

mattnischan · 2017-05-04T14:19:52Z

mattnischan
May 4, 2017

Yeah, and in these cases you're really just stackalloc'ing the references, which are a fixed size (sizeof(IntPtr)). The struct would be an interesting take, but I'm not sure what it gains you. It is a cool idea, though.

0 replies

mikedn · 2017-05-04T14:25:37Z

mikedn
May 4, 2017

Yeah, and in these cases you're really just stackalloc'ing the references, which are a fixed size (sizeof(IntPtr)).

The GC needs to be able to track those references so they can't be in stackallocated memory, the GC doesn't track that.

Anyway, all this sounds very tempting but it looks like the main problem is that the resulting IL code is unverifiable/unsafe/unanyhting.

0 replies

mgravell · 2017-05-04T14:30:57Z

mgravell
May 4, 2017
Collaborator Author

added benchmark for static-backed ReadOnlySpan<int> version for literals etc; is now faster than params int[] version - I see this as the idea for when fixed values are used as the args

for the stackalloc case, would need to think about when the stackalloc space is recovered, especially in the case of a loop like above; is that allocating on the stack once? or many times separately? A quick test done by bumping REPEATS_PER_ITEM up to 1M shows that it would currently blow the stack if used naively.

0 replies

mgravell · 2017-05-04T14:35:03Z

mgravell
May 4, 2017
Collaborator Author

@mikedn once they're inside the span, they should be essentially ref T; and once they're ref T accessible via ref T on the stack (not the stackalloc part), they should be fully reachable from the root, no?. This does show that we need to get the span into a formal local before populating it, though (in the case of reference types).

0 replies

mikedn · 2017-05-04T14:44:33Z

mikedn
May 4, 2017

for the stackalloc case, would need to think about when the stackalloc space is recovered, especially in the case of a loop like above; is that allocating on the stack once? or many times separately?

Stackallocated space cannot be freed before the method returns. That is, if you do allocate in a loop you'll end up with stack overflow. That's actually one reason to stick to struct because the same local could be reused. At the risk of getting some funny side effects if the called method somehow manages to persist the span for later use.

once they're on the span, they should be essentially ref T; and once they're ref T via a parameter on the stack (not the stackalloc part), they should be fully reachable from the root, no?.

Hmm, I doubt that. That would imply that GC understands span and scans it but AFAIK that's not the case. The way things work is that the GC scans the array/object the span points to. But in our case the span doesn't point to a GC object, it points to random memory so the GC can't do anything. Sure, random memory in this case is actually stack memory and the GC scans the stack but in a different manner.

0 replies

benaadams · 2017-05-04T14:53:19Z

benaadams
May 4, 2017

ValueTuple?

0 replies

mattnischan · 2017-05-04T15:04:03Z

mattnischan
May 4, 2017

@benaadams I was thinking that too. But you'd essentially need some kind of stand in type in the method signature to tell the compiler to make a ValueTuple of the right length. You would probably also want some struct based ValueTuple iterator for consumption on the receiving end.

0 replies

mgravell · 2017-05-04T15:24:05Z

mgravell
May 4, 2017
Collaborator Author

@mattnischan it is simpler than that, I think - just something like (this syntax won't compile currently - is purely for indication):

var _ = (1,2,3,4,5);
SomeMethod(new Span<int>(ref _.Item1, 5));

This does assume that ValueTuple<...> has specific layout; probably OK, but...

Testing with SpanTuple doing:

var args = (1,2,3,4,5);
total += Execute(new Span<int>(Unsafe.AsPointer(ref args.Item1), 5));

gives:

           Method |     Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
----------------- |---------:|----------:|----------:|--------------:|-------:|-------:|----------:|
      ParamsArray | 8.225 ns | 0.1822 ns | 0.1705 ns | 121,579,023.0 | 0.0076 | 0.0000 |      48 B |
    ExplicitArray | 8.421 ns | 0.1631 ns | 0.1812 ns | 118,748,775.0 | 0.0076 |      - |      48 B |
             Span | 9.736 ns | 0.1167 ns | 0.1092 ns | 102,707,992.8 |      - |      - |       0 B |
        SpanTuple | 5.812 ns | 0.1052 ns | 0.0984 ns | 172,067,653.3 |      - |      - |       0 B |
 StaticBackedSpan | 5.816 ns | 0.0776 ns | 0.0726 ns | 171,930,191.3 |      - |      - |       0 B |

which is really good

0 replies

mattnischan · 2017-05-04T15:33:53Z

mattnischan
May 4, 2017

I see what you mean. I was thinking SomeMethod(params ValueTuple<?> things).

I think that layout stuff gets tricky with Rest, maybe.

0 replies

HaloFour · 2017-05-04T15:35:21Z

HaloFour
May 4, 2017

Each tuple size is a completely different and incompatible type. You'd be forced to pass the tuple representing the maximum number of elements.

0 replies

mgravell · 2017-05-04T15:36:45Z

mgravell
May 4, 2017
Collaborator Author

@benaadams see SpanTuple perf above (in an edit); also cc @mikedn - if ValueTuple isn't an option, then the example above but with your idea of a per-call-site struct would also work great.

0 replies

mikedn · 2017-05-05T06:52:39Z

mikedn
May 5, 2017

Yep, as I suspected this stack-referring thing is rather infectious. It's cool and at the same time kind of scary, many C# developers will probably have trouble understanding all this wizardry. Sometimes I wonder what C# language turns into :)

0 replies

mgravell · 2017-05-05T07:47:18Z

mgravell
May 5, 2017
Collaborator Author

@mikedn I think of it very much like pointers: most c# devs barely know they are a thing in c#, but for certain areas they are heavily used. As long as the scenarios "most devs" are likely to see are clear and simple, the subtle depths don't worry me. Still better than pointers.

0 replies

omariom · 2017-05-06T11:02:41Z

omariom
May 6, 2017

Sometimes I wonder what C# language turns into :)

When C# was born MS said it was closer to C++ than to Java :)

0 replies

ygc369 · 2017-07-04T02:20:29Z

ygc369
Jul 4, 2017

I like this proposal.

0 replies

alrz · 2017-12-29T19:46:08Z

alrz
Dec 29, 2017

I think #179 cover this proposal (if we allow any type for params that is assignable from T[]), we just need to emit stackalloc where possible (as an optimization).

0 replies

mgravell · 2017-12-29T21:29:46Z

mgravell
Dec 29, 2017
Collaborator Author

#179 talks about IEnumerable-T, which is ref-type. It is absolutely impossible to store a span or a stackalloc in a ref-type, so this is not something that can added as simply an optimization - this is a fundamentally different scenario. Additionally, the key point of this suggestion is to not allocate - and by definition an IEnumerable-T requires an allocation (unless it is a constained-T variant). So: I don't think the overlap is as big as it might appear.

…

On 29 Dec 2017 7:46 pm, "Alireza Habibi" ***@***.***> wrote: I think #179 <#179> cover this proposal (if we allow any type for params that is assignable from T[]), we just need to emit stackalloc where possible (as an optimization). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#535 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABDsDaKubEG8vIt0DrxFCtphG22-KCRks5tFUGJgaJpZM4NQZgf> .

0 replies

alrz · 2017-12-29T21:37:56Z

alrz
Dec 29, 2017

I was talking about semantics where these are all permitted, but I missed the fact that Span is a ref struct and doesn't implement any interfaces. so it should be a special case anyways.

0 replies

bbarry · 2018-01-03T20:54:55Z

bbarry
Jan 3, 2018

Presumably a compiler generated nested struct would be safer than relying on the internals of ValueTuple right?

[Benchmark(OperationsPerInvoke = REPEATS_PER_ITEM)]
public unsafe int SpanGenerated()
{
    int total = 0;
    for (int i = 0; i < REPEATS_PER_ITEM; i++)
    {
        var args = new Generated5Int(1, 2, 3, 4, 5);
        total += Execute(new Span<int>(Unsafe.AsPointer(ref args.I1), 5));
    }
    return total;
}
[Benchmark(OperationsPerInvoke = REPEATS_PER_ITEM)]
public unsafe int ReadOnlySpanGenerated()
{
    int total = 0;
    for (int i = 0; i < REPEATS_PER_ITEM; i++)
    {
        var args = new Generated5Int(1, 2, 3, 4, 5);
        total += Execute(new ReadOnlySpan<int>(Unsafe.AsPointer(ref args.I1), 5));
    }
    return total;
}

[StructLayout(LayoutKind.Sequential)]
struct Generated5Int
{
    public int I1;
    int I2;
    int I3;
    int I4;
    int I5;
    public Generated5Int(int i1, int i2, int i3, int i4, int i5) {
        I1 = i1;
        I2 = i2;
        I3 = i3;
        I4 = i4;
        I5 = i5;
    }
}

                Method |     Mean |     Error |    StdDev |          Op/s |  Gen 0 |  Gen 1 | Allocated |
---------------------- |---------:|----------:|----------:|--------------:|-------:|-------:|----------:|
           ParamsArray | 6.000 ns | 0.0219 ns | 0.0205 ns | 166,673,417.6 | 0.0114 | 0.0000 |      48 B |
         ExplicitArray | 5.943 ns | 0.0159 ns | 0.0141 ns | 168,271,717.2 | 0.0114 | 0.0000 |      48 B |
                  Span | 9.500 ns | 0.0217 ns | 0.0203 ns | 105,262,022.6 |      - |      - |       0 B |
             SpanTuple | 2.182 ns | 0.0056 ns | 0.0052 ns | 458,245,714.4 |      - |      - |       0 B |
      StaticBackedSpan | 2.177 ns | 0.0056 ns | 0.0050 ns | 459,410,418.4 |      - |      - |       0 B |
         SpanGenerated | 2.181 ns | 0.0044 ns | 0.0041 ns | 458,545,682.8 |      - |      - |       0 B |
 ReadOnlySpanGenerated | 2.181 ns | 0.0040 ns | 0.0037 ns | 458,582,157.2 |      - |      - |       0 B |

0 replies

alrz · 2018-01-15T23:05:47Z

alrz
Jan 15, 2018

I started working on stackalloc initializers (dotnet/roslyn#24249), now we should be able to reuse its lowering for params Span. For reference types we fall back to array though.

Is there anyone from LDT interested to champion this proposal?

0 replies

bbarry · 2018-01-16T00:15:57Z

bbarry
Jan 16, 2018

I don't know if stackalloc initializers are worth doing with their performance cost relative to allocating an array (Span vs ParamsArray in various tables above).

A generated struct used for a Span pointer would be nice (relying on Tuple is suspect due to auto layout) but that has the same problem that #992, #1147 and https://github.com/dotnet/corefx/issues/26228 do, doesn't it?

This is valid code (and almost certainly buggy):

public unsafe Span<int> SpanGenerated()
{    
    var args = new Generated5Int(1, 2, 3, 4, 5);
    return Execute(new Span<int>(Unsafe.AsPointer(ref args.I1), 5));
}
Span<int> Execute(/*params*/ Span<int> input) => input;

Does stackalloc have that issue as well?

Edit: what I mean is supporting Span<T> would be nice for params in your patch but I don't know if it is worth waiting for zero allocation methods of doing it assuming you would be supporting other array like params things.

0 replies

benaadams · 2018-01-16T00:22:23Z

benaadams
Jan 16, 2018

Does stackalloc have that issue as well?

stackalloc is fixed memory so it doesn't have the problem; however you can't return it from the method as that stack space is then invalid; as it gets unwound - though you can call into other methods with it (as its still valid going deeper into the stack)

0 replies

alrz · 2018-01-16T05:30:05Z

alrz
Jan 16, 2018

@bbarry

supporting Span<T> would be nice for params in your patch but I don't know if it is worth waiting for zero allocation methods of doing it assuming you would be supporting other array like params things.

Not sure if I understand correctly, do you mean runtime helpers that would allow block init? I guess "other array like params things" refers to #179? and what it has to do with those methods?

0 replies

bbarry · 2018-01-16T15:43:45Z

bbarry
Jan 16, 2018

Sorry I was just saying #179 should be unrelated to this even if #179 were to support Foo(params Span<T>) signatures via a Foo(new Span<int>(new int[] {1, 2, 3})) calling convention.

As I am seeing it, safe code stackalloc seems pretty dead in the water, so do the other zero allocation strategies presented here, without some fundamental new feature for tracking lifetimes (like #1130) to enable #535 (this issue), #923, #992, #1147, and https://github.com/dotnet/corefx/issues/26228.

0 replies

alrz · 2018-01-19T17:12:45Z

alrz
Jan 19, 2018

This is how I imagine we can make this work:

Currently array initializers are lowered to a static field of a struct with a size of all elements combined, iff all values are constants (plus a few other conditions), for example, new int[3] {1, 2, 3} produces the following:

// Code size 19 (0x13)
.maxstack 8

IL_0000: ldc.i4.3
IL_0001: newarr [mscorlib]System.Int32
IL_0006: dup
IL_0007: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12' '<PrivateImplementationDetails>'::E429CCA3F703A39CC5954A6572FEC9086135B34E
IL_000c: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
IL_0011: pop
IL_0012: ret

stackalloc initializers can take advantage of the same mechanism but instead of RuntimeHelpers.InitializeArray we use memcpy (the cpblk instruction) to copy the content of that field directly to the stackalloc variable (this is implemented in the PR), for example stackalloc int[3] {1, 2, 3} produces the following:

// Code size       21 (0x15)
.maxstack  4
IL_0000:  ldc.i4.s   12
IL_0002:  conv.u
IL_0003:  localloc
IL_0005:  dup
IL_0006:  ldsflda    ""<PrivateImplementationDetails>.__StaticArrayInitTypeSize=12 <PrivateImplementationDetails>.E429CCA3F703A39CC5954A6572FEC9086135B34E""
IL_000b:  ldc.i4.s   12
IL_000d:  cpblk
IL_000f:  call       ""void C.Print(int*)""
IL_0014:  ret

Span<T> conversion just works out of the box since we're passing an already-initialized pointer to the constructor.

Finally, params on arrays is using the same lowering logic for array construction so we get all optimizations from there. I propose we use the above lowering for params Span<T> where possible, for reference types we can either fallback to the array variation or use the suggested lowering if feasible,

var vals = stackalloc byte[{len} * sizeof(string)]; // sizeof here not valid C#, but is valid IL
var span = new Span<sting>(vals, {len});
span[0] = ...

0 replies

333fred · 2022-09-26T21:50:59Z

333fred
Sep 26, 2022
Maintainer

Converting this issue to a discussion. The tracked champion for params Span<T> is #1757.

0 replies

Feature suggestion: zero-allocation "params" via Span<T> / ReadOnlySpan<T> #6489

mgravell May 4, 2017 Collaborator

Replies: 52 comments

mgravell May 4, 2017 Collaborator Author

mgravell May 4, 2017 Collaborator Author

mgravell May 4, 2017 Collaborator Author

mgravell May 4, 2017 Collaborator Author

mgravell May 4, 2017 Collaborator Author

mgravell May 4, 2017 Collaborator Author

mgravell May 5, 2017 Collaborator Author

mgravell Dec 29, 2017 Collaborator Author

333fred Sep 26, 2022 Maintainer

mgravell
May 4, 2017
Collaborator

mgravell
May 4, 2017
Collaborator Author

mgravell
May 4, 2017
Collaborator Author

mgravell
May 4, 2017
Collaborator Author

mgravell
May 4, 2017
Collaborator Author

mgravell
May 4, 2017
Collaborator Author

mgravell
May 4, 2017
Collaborator Author

mgravell
May 5, 2017
Collaborator Author

mgravell
Dec 29, 2017
Collaborator Author

333fred
Sep 26, 2022
Maintainer