
Sve: Preliminary support for agnostic VL for JIT scenarios #115948


Draft
wants to merge 87 commits into main

Conversation

kunalspathak
Member

Overview

In .NET 9, we added SVE support that works on hardware with a vector length (VL) of 16 bytes (16B). This prevents developers from using the SVE feature on hardware that supports other vector lengths, and it does not help NativeAOT scenarios, where a binary compiled for a particular VL needs recompilation to run on hardware with a different VL. This PR adds preliminary support for additional vector lengths (32 bytes and 64 bytes) for the JIT scenario. There will be follow-up PRs to add support for other vector lengths as well as for NativeAOT.

Vector<T> is .NET's vector-length-agnostic type, and we leverage it to generate SVE instructions. Currently, the heuristic is set such that Vector<T> continues to generate NEON instructions when the underlying VL is 16B; only when VL > 16B do we start generating SVE instructions for it.
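
To make the scenario concrete, here is a minimal, hedged sketch (not taken from this PR) of the kind of VL-agnostic code this targets; it is written purely against Vector<T>.Count, so the same method works whether the heuristic picks NEON (16B) or SVE (32B/64B):

    using System.Numerics;

    public static class VectorSum
    {
        // Sums an int array with Vector<int>. The loop depends only on
        // Vector<int>.Count, so it covers a 16B VL (NEON, Count == 4) as
        // well as a 32B or 64B VL (SVE, Count == 8 or 16) without change.
        public static int Sum(int[] values)
        {
            var acc = Vector<int>.Zero;
            int i = 0;
            for (; i <= values.Length - Vector<int>.Count; i += Vector<int>.Count)
            {
                acc += new Vector<int>(values, i);
            }

            int total = Vector.Sum(acc);
            for (; i < values.Length; i++)
            {
                total += values[i]; // scalar tail
            }

            return total;
        }
    }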
 

TYP_SIMD*

SVE supports variable-length vectors ranging from 16B to 256B, and the length must be a power of 2, so the applicable vector lengths are 16B, 32B, 64B, 128B, and 256B. This PR adds preliminary support for agnostic VL by reusing some of the existing xarch logic around TYP_SIMD32 and TYP_SIMD64, and it can be further expanded to TYP_SIMD128 and TYP_SIMD256. It was easier to port the logic in various places using the existing higher-vector-length types than to create a type whose size is determined at runtime and then handle that new type throughout the code base, especially around value numbering.
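
As a rough orientation (my own assumption about how the sizes line up, not code from this PR), the developer-visible size of Vector<T> is what selects between the existing TYP_SIMD16 handling and the reused TYP_SIMD32/TYP_SIMD64 paths:

    using System;
    using System.Numerics;

    public static class VectorInfo
    {
        // Vector<byte>.Count is the VL in bytes: 16 corresponds to the
        // existing TYP_SIMD16/NEON handling, while 32 and 64 map onto the
        // TYP_SIMD32/TYP_SIMD64 logic reused from xarch.
        public static void Print()
        {
            Console.WriteLine($"Vector<T> is {Vector<byte>.Count} bytes wide; " +
                              $"hardware accelerated: {Vector.IsHardwareAccelerated}");
        }
    }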

Vector<T>

Today, the Vector<T> type is mapped to the corresponding Vector128<T> intrinsic methods to generate NEON instructions, because NEON instructions operate on 16B data. We now detect the vector length, and if it is > 16B we use SVE instructions. To do that, we stop mapping Vector<T> -> Vector128<T> and instead introduce new intrinsics based on Vector<T>. These intrinsics correspond to the methods available on Vector<T>, and they are propagated throughout the code base. During codegen, when we see an intrinsic of Vector<T> type, we know that we need to generate an SVE instruction instead of a NEON instruction.
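
As a hedged illustration of the kind of call affected (ordinary public Vector<T> APIs, not code from this PR's diff): previously such a call would be retyped to the Vector128<T> equivalent and emitted as NEON; with the new Vector<T>-based intrinsics it can stay VL-sized and be emitted as SVE when VL > 16B:

    using System.Numerics;

    public static class Clamp
    {
        // Vector.LessThan and Vector.ConditionalSelect are plain Vector<T>
        // methods; under this change they keep their own Vector<T> intrinsic
        // IDs instead of being rewritten to Vector128<T> intrinsics.
        public static Vector<float> ClampToLimit(Vector<float> value, Vector<float> limit)
        {
            Vector<int> isBelow = Vector.LessThan(value, limit);
            return Vector.ConditionalSelect(isBelow, value, limit);
        }
    }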

Register allocation

In .NET 9, we adopted a custom ABI for SVE registers, and for now we will continue to use it. At a call boundary, only the lower half of v8~v15 is callee-saved, and today we preserve the upper halves of live SIMD registers in those registers. Since SVE registers are wider, we might need more than v8~v15 to preserve the upper portions of the killed registers, so I decided to just spill them on the stack. In the future, when we fine-tune our ABI, we will update this design.
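
As a hedged sketch of the case this affects (SideEffect is a hypothetical helper, in the spirit of the examples below): a Vector<T> value that is live across a call can no longer rely on the callee preserving more than the lower 16B of v8~v15, so with a 32B or 64B VL the full-width value is spilled to the stack around the call:

    using System.Numerics;
    using System.Runtime.CompilerServices;

    public static class LiveAcrossCall
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void SideEffect() { }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static Vector<int> Test(Vector<int> a, Vector<int> b)
        {
            Vector<int> sum = a + b; // 'sum' is live across the call below
            SideEffect();            // callee only guarantees the lower 16B of v8~v15,
                                     // so the full SVE-width value is spilled/reloaded
            return sum * a;
        }
    }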

Other optimizations

On xarch, there are several other optimizations, such as ReadUtf8 and Memmove, that take advantage of a higher VL. I tried to enable them for Arm64 with a higher VL, but for some of them I was not able to find optimal equivalent SVE instructions, and some needed SVE2 instructions. Hence, I decided not to do any optimization around this for now; we will enable them incrementally in the future.

Testing

I have introduced a DEBUG flag, DOTNET_UseSveForVectorT. When it is set, we hardcode the VL to 32B in order to exercise the Vector<T>/SVE path mentioned above. This approach works for superpmi/jitstress testing; I still need to validate actual execution on Cobalt machines, which only have a 16B VL. I thought about introducing a flag like DOTNET_MinVectorTLengthForSve, which would specify the minimum vector length needed to trigger SVE instructions, and during testing we could have set it to 16B. However, I soon realized that there are a lot of code paths that take a dependency on TYP_SIMD16 and generate NEON instructions, so having DOTNET_UseSveForVectorT felt like the better approach.
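
For completeness, a hedged sketch of a trivial repro one could run under that flag (the flag name comes from this PR; DOTNET_-prefixed config values are normally supplied as environment variables for DEBUG JIT builds, and nothing else here is from the PR):

    using System;
    using System.Numerics;

    public static class SmokeTest
    {
        // Run with the DEBUG-only DOTNET_UseSveForVectorT config set (e.g. as an
        // environment variable) to force the 32B Vector<T>/SVE path described above.
        public static void Main()
        {
            var a = new Vector<int>(5);
            var b = new Vector<int>(7);
            Console.WriteLine(a + b);
        }
    }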

TODOs

There are several TODOs; some I will address before marking the PR ready for review, but others might have to be done incrementally.

Reference: #115037

Examples

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool Test2(Vector<int> a, Vector<int> b)
    {
        return Vector.LessThanAll(a, b);
    }


    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Test()
    {
        var a = GetVector<int>(5);
        var b = GetVector<int>(5);
        Vector<int> c = a + b;
        Consume(c);
    }


    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Test(Vector<int> a)
    {
        var b = GetVector<int>(5);
        Vector<int> c = a + b;
        Consume(c);
    }


    [MethodImpl(MethodImplOptions.NoInlining)]
    private static Vector<int> Test(Vector<int> a)
    {
        var b = GetVector<int>(5);
        var c = a << 8;
        Consume(c);
        return Cond() ? c : b;
    }


    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Test(Vector<int> a)
    {
        Vector<float> b = GetVector<float>(5.9f);
        Vector<float> c = GetVector<float>(5.9f);
        var result = Sve.CompareGreaterThan(b, c);
        Consume(result);
    }


kunalspathak added the NO-REVIEW (Experimental/testing PR, do NOT review it) label on May 23, 2025
github-actions bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label on May 23, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.
