Sve: Preliminary support for agnostic VL for JIT scenarios #115948
Draft: kunalspathak wants to merge 87 commits into dotnet:main from kunalspathak:variable-vl-3
+1,945 −252
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), NO-REVIEW (Experimental/testing PR, do NOT review it)
Overview
In .NET 9, we added SVE support that only works on hardware with a vector length (VL) of 16 bytes (16B). This prevents developers from using the SVE feature on hardware that supports other vector lengths, and it also limits NativeAOT scenarios, where a binary compiled for a particular VL needs recompilation to run on hardware with a different VL. This PR adds preliminary support for additional vector lengths (32 bytes and 64 bytes) for JIT scenarios. There will be follow-up PRs to add support for other vector lengths as well as for NativeAOT.
Vector<T> is .NET's vector-length-agnostic type, and we will leverage it to generate SVE instructions. Currently, the heuristic is set such that Vector<T> continues to generate NEON instructions if the underlying VL is 16B; only if VL > 16B do we start generating SVE instructions for it.
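To make the heuristic concrete, here is a minimal sketch of VL-agnostic code written against Vector<T>. The helper and its shape are illustrative only and not part of this PR; the point is that the same managed code lowers to NEON when Vector<T> is 16B wide and, with this change, to SVE when the hardware VL is wider.

```csharp
// Illustrative sketch (not code from this PR): a VL-agnostic loop.
// On 16B-VL hardware the JIT keeps emitting NEON for the Vector<T> ops;
// on hardware where VL > 16B this PR makes it emit SVE instead, with
// Vector<int>.Count reflecting the wider vector length.
using System;
using System.Numerics;

static class VectorAddSketch
{
    static void Add(ReadOnlySpan<int> a, ReadOnlySpan<int> b, Span<int> result)
    {
        int width = Vector<int>.Count;   // lanes per vector; VL-dependent
        int i = 0;

        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a.Slice(i, width));
            var vb = new Vector<int>(b.Slice(i, width));
            (va + vb).CopyTo(result.Slice(i, width));
        }

        for (; i < a.Length; i++)        // scalar tail
            result[i] = a[i] + b[i];
    }
}
```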
TYP_SIMD*
SVE has variable-length vectors ranging from 16B to 256B, and the length must be a power of 2, so the applicable vector lengths are 16B, 32B, 64B, 128B and 256B. This PR adds preliminary support for agnostic VL by reusing some of the existing xarch logic around TYP_SIMD32 and TYP_SIMD64, and it can be further expanded to TYP_SIMD128 and TYP_SIMD256. It was easier to port the logic in the various places using the existing higher vector length types than to create a type whose size is determined at runtime and then handle that new type throughout the code base, especially around value numbering.
Vector<T>
Today, the Vector<T> type is mapped to the corresponding Vector128<T> intrinsic methods to generate NEON instructions, because NEON instructions operate on 16B of data. With this change, we detect the vector length and, if it is > 16B, we use SVE instructions. To do that, we stop mapping Vector<T> -> Vector128<T> and instead introduce new intrinsics based on Vector<T>. These intrinsics correspond to the methods available on Vector<T>, and we propagate them throughout the code base. During codegen, when we see an intrinsic of Vector<T> type, we know that we need to generate an SVE instruction instead of a NEON instruction.
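As a rough illustration (hypothetical demo code, not from this PR) of why the old mapping only holds at 16B: a Vector<T> operation used to be rewritten into its fixed-width Vector128<T> counterpart, which assumes Vector<T> and Vector128<T> are the same size.

```csharp
// Hypothetical illustration of the old Vector<T> -> Vector128<T> mapping;
// it is only meaningful while Vector<byte>.Count == 16 (VL == 16B).
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class MappingDemo
{
    static void Main()
    {
        // The user writes a VL-agnostic add:
        Vector<int> a = new Vector<int>(1);
        Vector<int> b = new Vector<int>(2);
        Vector<int> sum = a + b;

        // Previously the JIT treated the add above as this fixed-width
        // Vector128 operation and emitted a NEON add:
        Vector128<int> sum128 = Vector128.Create(1) + Vector128.Create(2);

        // That equivalence breaks once Vector<T> is wider than 16B, which is
        // why separate Vector<T> intrinsics are introduced in this PR.
        Console.WriteLine($"Vector<int>.Count = {Vector<int>.Count}");
        Console.WriteLine(sum);
        Console.WriteLine(sum128);
    }
}
```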
Register allocation
In .NET 9, we adopted a custom ABI for SVE registers, and for now we continue to use it. At a call boundary, only the lower half of v8~v15 is callee-saved, and today we preserve the upper half of live SIMD registers in those registers. Since SVE registers are wider, we might need more than v8~v15 to preserve the upper portions of the killed registers, so I decided to just spill them to the stack. In the future, when we fine-tune our ABI, we will update this design.
Other optimizations
On xarch, there are several other optimizations, such as ReadUtf8 or Memmove, that take advantage of a higher VL. I tried to enable them for Arm64 with higher VL, but for some of them I could not find an optimal equivalent SVE instruction, and some need SVE2 instructions. Hence, I decided not to do any optimization around this for now; we will enable them incrementally in the future.
Testing
I have introduced a DEBUG flag, DOTNET_UseSveForVectorT. When it is set, we hardcode the VL to 32B in order to exercise the Vector<T>/SVE path mentioned above. This approach works for superpmi/jitstress testing; I still need to validate the behavior during actual execution on Cobalt machines, which only have a 16B VL. I thought about introducing a flag like DOTNET_MinVectorTLengthForSve, which would specify the minimum vector length needed to trigger SVE instructions so that during testing we could set it to 16B. However, I soon realized that a lot of code paths take a dependency on TYP_SIMD16 and generate NEON instructions, so having DOTNET_UseSveForVectorT felt like the better approach.
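For instance, under this flag one would expect the Vector<T> width observed from managed code to reflect the forced 32B VL. The probe below is a hypothetical way to eyeball that; the flag is DEBUG-only and the exact behavior is still being validated.

```csharp
// Hypothetical probe (not from this PR): with the DEBUG-only flag set,
// e.g. DOTNET_UseSveForVectorT=1, Vector<T> is expected to be treated as
// 32B wide, so Vector<byte>.Count should report 32 instead of 16.
using System;
using System.Numerics;

static class VlProbe
{
    static void Main()
    {
        Console.WriteLine($"Vector<byte>.Count = {Vector<byte>.Count}");
        Console.WriteLine($"Vector.IsHardwareAccelerated = {Vector.IsHardwareAccelerated}");
    }
}
```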
TODOs
There are several TODOs that I will address before marking the PR ready for review, but others might have to be done incrementally.
Reference: #115037
Examples