Sve: Preliminary support for agnostic VL for JIT scenarios #115948
Draft: kunalspathak wants to merge 87 commits into dotnet:main from kunalspathak:variable-vl-3
+1,945 −252
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), NO-REVIEW (Experimental/testing PR, do NOT review it)
Overview
In .NET 9, we added SVE support that only works on hardware with a vector length (VL) of 16 bytes (16B). This prevents developers from using the SVE feature on hardware that supports other vector lengths, and it also limits NativeAOT scenarios, where a binary compiled for a particular VL needs recompilation to run on hardware with a different VL. This PR adds preliminary support for additional vector lengths (32 bytes and 64 bytes) for JIT scenarios. There will be follow-up PRs to add support for other vector lengths as well as for NativeAOT.
Vector<T> is .NET's vector-length-agnostic type, and we will leverage it to generate SVE instructions. Currently, the heuristic is set such that Vector<T> continues to generate NEON instructions if the underlying VL is 16B; only if VL > 16B do we start generating SVE instructions for it.
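To make the heuristic concrete, here is a minimal sketch of VL-agnostic code written against Vector<T>. The helper and its shape are illustrative only and not part of this PR; the point is that the same managed code lowers to NEON when Vector<T> is 16B wide and, with this change, to SVE when the hardware VL is wider.

```csharp
// Illustrative sketch (not code from this PR): a VL-agnostic loop.
// On 16B-VL hardware the JIT keeps emitting NEON for the Vector<T> ops;
// on hardware where VL > 16B this PR makes it emit SVE instead, with
// Vector<int>.Count reflecting the wider vector length.
using System;
using System.Numerics;

static class VectorAddSketch
{
    static void Add(ReadOnlySpan<int> a, ReadOnlySpan<int> b, Span<int> result)
    {
        int width = Vector<int>.Count;   // lanes per vector; VL-dependent
        int i = 0;

        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a.Slice(i, width));
            var vb = new Vector<int>(b.Slice(i, width));
            (va + vb).CopyTo(result.Slice(i, width));
        }

        for (; i < a.Length; i++)        // scalar tail
            result[i] = a[i] + b[i];
    }
}
```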
TYP_SIMD*
SVE has variable-length vectors ranging from 16B to 256B, and the length must be a power of 2, so the applicable vector lengths are 16B, 32B, 64B, 128B and 256B. This PR adds preliminary support for agnostic VL by reusing some of the existing xarch logic around TYP_SIMD32 and TYP_SIMD64, and it can be further expanded to TYP_SIMD128 and TYP_SIMD256. It was easier to port the logic in the various places using the existing higher vector length types than to create a type whose size is determined at runtime and then handle that new type throughout the code base, especially around value numbering.
Vector<T>
Today, the Vector<T> type is mapped to the corresponding Vector128<T> intrinsic methods to generate NEON instructions, because NEON instructions operate on 16B of data. With this change, we detect the vector length and, if it is > 16B, we use SVE instructions. To do that, we stop mapping Vector<T> -> Vector128<T> and instead introduce new intrinsics based on Vector<T>. These intrinsics correspond to the methods available on Vector<T>, and we propagate them throughout the code base. During codegen, when we see an intrinsic of Vector<T> type, we know that we need to generate an SVE instruction instead of a NEON instruction.
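As a rough illustration (hypothetical demo code, not from this PR) of why the old mapping only holds at 16B: a Vector<T> operation used to be rewritten into its fixed-width Vector128<T> counterpart, which assumes Vector<T> and Vector128<T> are the same size.

```csharp
// Hypothetical illustration of the old Vector<T> -> Vector128<T> mapping;
// it is only meaningful while Vector<byte>.Count == 16 (VL == 16B).
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class MappingDemo
{
    static void Main()
    {
        // The user writes a VL-agnostic add:
        Vector<int> a = new Vector<int>(1);
        Vector<int> b = new Vector<int>(2);
        Vector<int> sum = a + b;

        // Previously the JIT treated the add above as this fixed-width
        // Vector128 operation and emitted a NEON add:
        Vector128<int> sum128 = Vector128.Create(1) + Vector128.Create(2);

        // That equivalence breaks once Vector<T> is wider than 16B, which is
        // why separate Vector<T> intrinsics are introduced in this PR.
        Console.WriteLine($"Vector<int>.Count = {Vector<int>.Count}");
        Console.WriteLine(sum);
        Console.WriteLine(sum128);
    }
}
```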
Register allocation
In .NET 9, we adopted a custom ABI for SVE registers, and for now we continue to use it. At a call boundary, only the lower half of v8~v15 is callee-saved, and today we preserve the upper half of live SIMD registers in those registers. Since SVE registers are wider, we might need more than v8~v15 to preserve the upper portions of the killed registers, so I decided to just spill them to the stack. In the future, when we fine-tune our ABI, we will update this design.
Other optimizations
On xarch, there are several other optimizations, such as ReadUtf8 or Memmove, that take advantage of a higher VL. I tried to enable them for Arm64 with higher VL, but for some of them I could not find an optimal equivalent SVE instruction, and some need SVE2 instructions. Hence, I decided not to do any optimization around this for now; we will enable them incrementally in the future.
Testing
I have introduced a DEBUG flag, DOTNET_UseSveForVectorT. When it is set, we hardcode the VL to 32B in order to exercise the Vector<T>/SVE path mentioned above. This approach works for superpmi/jitstress testing; I still need to validate the behavior during actual execution on Cobalt machines, which only have a 16B VL. I thought about introducing a flag like DOTNET_MinVectorTLengthForSve, which would specify the minimum vector length needed to trigger SVE instructions so that during testing we could set it to 16B. However, I soon realized that a lot of code paths take a dependency on TYP_SIMD16 and generate NEON instructions, so having DOTNET_UseSveForVectorT felt like the better approach.
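For instance, under this flag one would expect the Vector<T> width observed from managed code to reflect the forced 32B VL. The probe below is a hypothetical way to eyeball that; the flag is DEBUG-only and the exact behavior is still being validated.

```csharp
// Hypothetical probe (not from this PR): with the DEBUG-only flag set,
// e.g. DOTNET_UseSveForVectorT=1, Vector<T> is expected to be treated as
// 32B wide, so Vector<byte>.Count should report 32 instead of 16.
using System;
using System.Numerics;

static class VlProbe
{
    static void Main()
    {
        Console.WriteLine($"Vector<byte>.Count = {Vector<byte>.Count}");
        Console.WriteLine($"Vector.IsHardwareAccelerated = {Vector.IsHardwareAccelerated}");
    }
}
```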
TODOs
There are several TODOs that I will address before marking the PR ready for review, but others might have to be done incrementally.
Reference: #115037
Examples