
Sve: Preliminary support for agnostic VL for JIT scenarios #115948


Draft: wants to merge 87 commits into base: main

Commits (87)
d22af4f
Capture g_sve_length and compVectorTLength
kunalspathak Mar 19, 2025
41a1d05
Add InstructionSet_Vector
kunalspathak Mar 19, 2025
c7d8ede
Add CORINFO_HFA_ELEM_VECTOR_VL
kunalspathak Mar 19, 2025
926eb69
Update the type of TYP_SIMD
kunalspathak Mar 19, 2025
2b39810
Passing Vector<T> to args and returns
kunalspathak Mar 19, 2025
cf9ea60
Rename TYP_SIMD -> TYP_SIMDVL
kunalspathak Mar 19, 2025
21f364b
Fix code to save/restore upper registers of VL
kunalspathak Mar 19, 2025
7a513ed
misc changes
kunalspathak Mar 20, 2025
b1c9833
Bring TYP_SIMD32 and TYP_SIMD64 for Arm64
kunalspathak Mar 20, 2025
4f92c23
Eliminate TYP_SIMDVL
kunalspathak Mar 21, 2025
6e63a3c
basic scenario of calling args/returning args
kunalspathak Mar 21, 2025
1eb159f
returning Vectors
kunalspathak Mar 22, 2025
df7203f
fix a bug
kunalspathak Mar 22, 2025
734aba5
standalone fix to generate sve mov instead of NEON mov
kunalspathak Mar 22, 2025
a71b8de
standalone fix to generate ldr/str when emit_RR is called
kunalspathak Mar 24, 2025
2e8cfd5
Support Vector.Create
kunalspathak Mar 24, 2025
1d74f82
Do not do sve_mov for scalar variant
kunalspathak Mar 25, 2025
699d2e1
Support Vector.As
kunalspathak Mar 25, 2025
7f8ff24
Support Vector.Abs
kunalspathak Mar 25, 2025
3d19d51
Support Vector.Add
kunalspathak Mar 25, 2025
70c09f9
Introduce VariableVectorLength env variable
kunalspathak Mar 25, 2025
53df3d7
Support Vector.AndNot
kunalspathak Mar 25, 2025
b1d4ce9
Support Vector.As*
kunalspathak Mar 26, 2025
29564cb
Support Vector.BitwiseAnd/BitwiseOr
kunalspathak Mar 26, 2025
45ab7b9
Support Vector.ConvertTo*
kunalspathak Mar 26, 2025
3837693
Add CreateFalseMaskAll intrinsic
kunalspathak Mar 27, 2025
ca1675c
Temporary fix for scratch register size calculation. Need to revisit
kunalspathak Mar 28, 2025
7774e07
Fix to squash in 9542e9cd047
kunalspathak Mar 28, 2025
c170a7e
Support Vector.Equals*, GreaterThan*, LessThan*
kunalspathak Mar 28, 2025
15f0384
Support Vector.Max/MaxNative
kunalspathak Mar 28, 2025
84d7bf3
Support Vector.Min/MinNative
kunalspathak Mar 28, 2025
2dff8b8
Support Vector.MinNumber/MaxNumber
kunalspathak Mar 28, 2025
58c872c
Support Vector.IsPositive/IsNegative/IsPositiveInfinity
kunalspathak Mar 29, 2025
d6d197d
Support Vector.get_Zero/One/AllBitsSet
kunalspathak Mar 29, 2025
ad47578
Support Vector.get_Indices/Sve.Index
kunalspathak Mar 29, 2025
fafee9a
Support Vector.Multiply
kunalspathak Mar 29, 2025
b475834
Support Vector.Subtract
kunalspathak Mar 29, 2025
37a78d7
Support Vector.Divide
kunalspathak Mar 29, 2025
e9eeca6
Support Vector.op_Xor
kunalspathak Mar 29, 2025
8e90959
Support Vector.op_OnesComplement/op_UnaryNegation/op_UnaryPlus
kunalspathak Mar 31, 2025
e00d016
Support Vector.MultiplyAddEstimate
kunalspathak Mar 31, 2025
f14f792
Support Vector.IsZero/IsNaN
kunalspathak Mar 31, 2025
e976b40
Support Vector.Floor
kunalspathak Mar 31, 2025
cb68fb9
Support Vector.FusedMultiplyAdd
kunalspathak Mar 31, 2025
fe633ed
Support Vector.Ceiling
kunalspathak Mar 31, 2025
2285a07
Support Vector.Round
kunalspathak Mar 31, 2025
9bdb3b9
Support Vector.LoadVector*
kunalspathak Mar 31, 2025
5c6392c
Support Vector.Store*
kunalspathak Mar 31, 2025
bf9991c
Support Vector.WidenLower/WidenUpper
kunalspathak Mar 31, 2025
a04d52b
Support Vector.Truncate
kunalspathak Mar 31, 2025
8376fc1
Support Vector.ConditionalSelect
kunalspathak Mar 31, 2025
1cebe09
Support Vector.Create/Add Sve_DuplicateScalarToVector
kunalspathak Apr 1, 2025
c626047
Support Vector.CreateSequence/Fix Sve_Index
kunalspathak Apr 1, 2025
62a2d9f
Support Vector.LeftShift/Add Sve_ShiftLeftLogicalImm
kunalspathak Apr 2, 2025
cd17e41
Support Vector.ShiftRightLogical/RightShift Add Sve.ShiftRight*Imm
kunalspathak Apr 3, 2025
f9567fd
Support Vector.ToScalar
kunalspathak Apr 3, 2025
9145170
Support Vector.Sum
kunalspathak Apr 3, 2025
4a76f71
build errors fix
kunalspathak Apr 4, 2025
a102b6f
Make GetScalableHWIntrinsicId() to all platforms to avoid #ifdef in c…
kunalspathak Apr 4, 2025
eead7d7
For unroll strategy, continue using 16B size
kunalspathak Apr 7, 2025
6d139ee
Fix some errors for Vector_opEquality
kunalspathak Apr 7, 2025
715a2c0
Disable optimizations for unroll/memcopy, etc.
kunalspathak Apr 8, 2025
b5d4460
Add comments in runtime where correct VectorT size should be reflected
kunalspathak Apr 8, 2025
15bb8a4
Fix bug for Vector.ConvertToDouble
kunalspathak Apr 8, 2025
9e99f27
Add jit-ee GetTargetVectorLength()
kunalspathak Apr 8, 2025
a9367ad
Use MinVectorLengthForSve()
kunalspathak Apr 10, 2025
9d9b20b
Fix correct type in LSRA
kunalspathak Apr 11, 2025
8d8ba75
Introduce for now FakeVectorLength environment variable
kunalspathak Apr 12, 2025
41c7629
Convert all checks to use varTypeIsSIMDVL()
kunalspathak Apr 12, 2025
6e6cc12
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 16, 2025
9cc2794
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 16, 2025
c03bb1c
wip
kunalspathak May 20, 2025
8afd32a
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 20, 2025
df8c7ab
gen.bat update
kunalspathak May 20, 2025
8ee5339
Refactor to UseSveFor*()
kunalspathak May 21, 2025
abd6e21
build failure
kunalspathak May 21, 2025
c212d25
more build failure fix
kunalspathak May 21, 2025
7b11beb
more build failure
kunalspathak May 22, 2025
5dcd5e9
Handle vector length in methodtablebuilder
kunalspathak May 22, 2025
c6c6671
simplify the logic of UseSveForVectorT
kunalspathak May 23, 2025
a4d5a9b
minor cleanup
kunalspathak May 23, 2025
e5f308f
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 25, 2025
c2e5c23
jit format
kunalspathak May 25, 2025
decd987
Merge remote-tracking branch 'origin/main' into variable-vl-3
kunalspathak May 27, 2025
be418ae
resolve merge conflict
kunalspathak May 27, 2025
1a33102
Do some tracking of simdType
kunalspathak May 28, 2025
a5889f6
Remove constraint of vector being only 16 bytes
kunalspathak May 28, 2025
2 changes: 2 additions & 0 deletions src/coreclr/inc/clrconfigvalues.h
@@ -285,6 +285,8 @@ CONFIG_DWORD_INFO(INTERNAL_GCUseGlobalAllocationContext, W("GCUseGlobalAllocatio
///
CONFIG_DWORD_INFO(INTERNAL_JitBreakEmit, W("JitBreakEmit"), (DWORD)-1, "")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_JitDebuggable, W("JitDebuggable"), 0, "If set, suppress JIT optimizations that make debugging code difficult")
CONFIG_DWORD_INFO(INTERNAL_UseSveForVectorT, W("UseSveForVectorT"), 0, "Prefer SVE instructions for VectorT")

#if !defined(DEBUG) && !defined(_DEBUG)
#define INTERNAL_JitEnableNoWayAssert_Default 0
#else
2 changes: 2 additions & 0 deletions src/coreclr/inc/corhdr.h
@@ -1754,6 +1754,8 @@ typedef enum CorInfoHFAElemType : unsigned {
CORINFO_HFA_ELEM_DOUBLE,
CORINFO_HFA_ELEM_VECTOR64,
CORINFO_HFA_ELEM_VECTOR128,
CORINFO_HFA_ELEM_VECTOR256,
CORINFO_HFA_ELEM_VECTOR512,
} CorInfoHFAElemType;

//
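The two new enum entries extend HFA/HVA classification to the wider Vector&lt;T&gt; payloads that a 256-bit or 512-bit SVE machine produces. A minimal sketch of the size mapping these entries imply (the helper name `HfaElemSizeInBytes` is illustrative, not a runtime API):

```cpp
#include <cassert>

// Mirror of the diff's enum; the VECTOR256/VECTOR512 entries are the additions.
enum CorInfoHFAElemType : unsigned
{
    CORINFO_HFA_ELEM_NONE,
    CORINFO_HFA_ELEM_FLOAT,
    CORINFO_HFA_ELEM_DOUBLE,
    CORINFO_HFA_ELEM_VECTOR64,
    CORINFO_HFA_ELEM_VECTOR128,
    CORINFO_HFA_ELEM_VECTOR256,
    CORINFO_HFA_ELEM_VECTOR512,
};

// Hypothetical helper: element size in bytes for each HFA element kind.
unsigned HfaElemSizeInBytes(CorInfoHFAElemType t)
{
    switch (t)
    {
        case CORINFO_HFA_ELEM_FLOAT:     return 4;
        case CORINFO_HFA_ELEM_DOUBLE:    return 8;
        case CORINFO_HFA_ELEM_VECTOR64:  return 8;
        case CORINFO_HFA_ELEM_VECTOR128: return 16;
        case CORINFO_HFA_ELEM_VECTOR256: return 32;
        case CORINFO_HFA_ELEM_VECTOR512: return 64;
        default:                         return 0;
    }
}
```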
41 changes: 23 additions & 18 deletions src/coreclr/inc/corinfoinstructionset.h
@@ -25,24 +25,25 @@ enum CORINFO_InstructionSet
InstructionSet_Sha1=7,
InstructionSet_Sha256=8,
InstructionSet_Atomics=9,
InstructionSet_Vector64=10,
InstructionSet_Vector128=11,
InstructionSet_Dczva=12,
InstructionSet_Rcpc=13,
InstructionSet_VectorT128=14,
InstructionSet_Rcpc2=15,
InstructionSet_Sve=16,
InstructionSet_Sve2=17,
InstructionSet_ArmBase_Arm64=18,
InstructionSet_AdvSimd_Arm64=19,
InstructionSet_Aes_Arm64=20,
InstructionSet_Crc32_Arm64=21,
InstructionSet_Dp_Arm64=22,
InstructionSet_Rdm_Arm64=23,
InstructionSet_Sha1_Arm64=24,
InstructionSet_Sha256_Arm64=25,
InstructionSet_Sve_Arm64=26,
InstructionSet_Sve2_Arm64=27,
InstructionSet_Vector=10,
InstructionSet_Vector64=11,
InstructionSet_Vector128=12,
InstructionSet_Dczva=13,
InstructionSet_Rcpc=14,
InstructionSet_VectorT128=15,
InstructionSet_Rcpc2=16,
InstructionSet_Sve=17,
InstructionSet_Sve2=18,
InstructionSet_ArmBase_Arm64=19,
InstructionSet_AdvSimd_Arm64=20,
InstructionSet_Aes_Arm64=21,
InstructionSet_Crc32_Arm64=22,
InstructionSet_Dp_Arm64=23,
InstructionSet_Rdm_Arm64=24,
InstructionSet_Sha1_Arm64=25,
InstructionSet_Sha256_Arm64=26,
InstructionSet_Sve_Arm64=27,
InstructionSet_Sve2_Arm64=28,
#endif // TARGET_ARM64
#ifdef TARGET_RISCV64
InstructionSet_RiscV64Base=1,
@@ -459,6 +460,8 @@ inline CORINFO_InstructionSetFlags EnsureInstructionSetFlagsAreValid(CORINFO_Ins
resultflags.RemoveInstructionSet(InstructionSet_Sve);
if (resultflags.HasInstructionSet(InstructionSet_Sve2) && !resultflags.HasInstructionSet(InstructionSet_Sve))
resultflags.RemoveInstructionSet(InstructionSet_Sve2);
if (resultflags.HasInstructionSet(InstructionSet_Vector) && !resultflags.HasInstructionSet(InstructionSet_Sve))
resultflags.RemoveInstructionSet(InstructionSet_Vector);
#endif // TARGET_ARM64
#ifdef TARGET_RISCV64
if (resultflags.HasInstructionSet(InstructionSet_Zbb) && !resultflags.HasInstructionSet(InstructionSet_RiscV64Base))
@@ -883,6 +886,8 @@ inline const char *InstructionSetToString(CORINFO_InstructionSet instructionSet)
return "Sha256_Arm64";
case InstructionSet_Atomics :
return "Atomics";
case InstructionSet_Vector :
return "Vector";
case InstructionSet_Vector64 :
return "Vector64";
case InstructionSet_Vector128 :
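The hunk in `EnsureInstructionSetFlagsAreValid` encodes a dependency rule: the new `InstructionSet_Vector` bit may only survive when `InstructionSet_Sve` is also present. A simplified sketch of that validation pass, assuming a plain `uint64_t` bitset instead of `CORINFO_InstructionSetFlags` (the `ISA_*` names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t ISA_AdvSimd = 1ull << 0;
constexpr uint64_t ISA_Sve     = 1ull << 1;
constexpr uint64_t ISA_Sve2    = 1ull << 2;
constexpr uint64_t ISA_Vector  = 1ull << 3; // the new scalable-Vector ISA bit

// Clear any ISA bit whose prerequisite is missing, mirroring the chain of
// checks in the real function (Sve needs AdvSimd, Sve2 needs Sve, and the
// new Vector bit needs Sve).
uint64_t EnsureValid(uint64_t flags)
{
    if ((flags & ISA_Sve) && !(flags & ISA_AdvSimd))
        flags &= ~ISA_Sve;
    if ((flags & ISA_Sve2) && !(flags & ISA_Sve))
        flags &= ~ISA_Sve2;
    if ((flags & ISA_Vector) && !(flags & ISA_Sve))
        flags &= ~ISA_Vector;
    return flags;
}
```

Note that inserting `InstructionSet_Vector` at value 10 renumbers every later Arm64 entry, which is why the hunk rewrites the whole block.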
2 changes: 2 additions & 0 deletions src/coreclr/inc/corjit.h
@@ -438,6 +438,8 @@ class ICorJitInfo : public ICorDynamicInfo
//
virtual uint32_t getExpectedTargetArchitecture() = 0;

virtual uint32_t getTargetVectorLength() = 0;

// Fetches extended flags for a particular compilation instance. Returns
// the number of bytes written to the provided buffer.
virtual uint32_t getJitFlags(
2 changes: 2 additions & 0 deletions src/coreclr/inc/icorjitinfoimpl_generated.h
@@ -736,6 +736,8 @@ uint16_t getRelocTypeHint(

uint32_t getExpectedTargetArchitecture() override;

uint32_t getTargetVectorLength() override;

uint32_t getJitFlags(
CORJIT_FLAGS* flags,
uint32_t sizeInBytes) override;
10 changes: 5 additions & 5 deletions src/coreclr/inc/jiteeversionguid.h
@@ -37,11 +37,11 @@

#include <minipal/guid.h>

constexpr GUID JITEEVersionIdentifier = { /* bffedb4e-ed47-4df3-8156-7ad8fc8521f1 */
0xbffedb4e,
0xed47,
0x4df3,
{0x81, 0x56, 0x7a, 0xd8, 0xfc, 0x85, 0x21, 0xf1}
constexpr GUID JITEEVersionIdentifier = { /* 49287d16-74bd-42e9-9d47-132d7a5f67eb */
0x49287d16,
0x74bd,
0x42e9,
{0x9d, 0x47, 0x13, 0x2d, 0x7a, 0x5f, 0x67, 0xeb}
};

#endif // JIT_EE_VERSIONING_GUID_H
1 change: 1 addition & 0 deletions src/coreclr/jit/ICorJitInfo_names_generated.h
@@ -178,6 +178,7 @@ DEF_CLR_API(recordCallSite)
DEF_CLR_API(recordRelocation)
DEF_CLR_API(getRelocTypeHint)
DEF_CLR_API(getExpectedTargetArchitecture)
DEF_CLR_API(getTargetVectorLength)
DEF_CLR_API(getJitFlags)
DEF_CLR_API(getSpecialCopyHelper)

8 changes: 8 additions & 0 deletions src/coreclr/jit/ICorJitInfo_wrapper_generated.hpp
@@ -1723,6 +1723,14 @@ uint32_t WrapICorJitInfo::getExpectedTargetArchitecture()
return temp;
}

uint32_t WrapICorJitInfo::getTargetVectorLength()
{
API_ENTER(getTargetVectorLength);
uint32_t temp = wrapHnd->getTargetVectorLength();
API_LEAVE(getTargetVectorLength);
return temp;
}

uint32_t WrapICorJitInfo::getJitFlags(
CORJIT_FLAGS* flags,
uint32_t sizeInBytes)
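The new `getTargetVectorLength()` JIT-EE method lets the JIT ask the VM for the machine's vector length instead of hard-coding 16 bytes. The PR does not show the VM-side body; a hedged sketch of what such an implementation could look like on Linux/arm64, where the kernel exposes the SVE vector length via `prctl(PR_SVE_GET_VL)` (the function name matches the interface, everything inside is an assumption):

```cpp
#include <cstdint>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/prctl.h>
#endif

// Returns the target vector length in bytes: the current SVE VL where the
// kernel reports one, otherwise the fixed 16-byte (Vector128) baseline.
uint32_t GetTargetVectorLength()
{
#if defined(__linux__) && defined(__aarch64__) && defined(PR_SVE_GET_VL)
    int ret = prctl(PR_SVE_GET_VL);
    if (ret >= 0)
    {
        // The return value packs flags with the VL; mask out the length bits.
        return static_cast<uint32_t>(ret) & PR_SVE_VL_LEN_MASK;
    }
#endif
    return 16; // NEON-sized fallback when no SVE state is available
}
```

An SVE vector length is always a multiple of 16 bytes, so callers can rely on that invariant whichever path is taken.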
10 changes: 9 additions & 1 deletion src/coreclr/jit/abi.cpp
@@ -123,7 +123,15 @@ var_types ABIPassingSegment::GetRegisterType() const
#ifdef FEATURE_SIMD
case 16:
return TYP_SIMD16;
#endif
#ifdef TARGET_ARM64
case 32:
assert(Compiler::SizeMatchesVectorTLength(Size));
return TYP_SIMD32;
case 64:
assert(Compiler::SizeMatchesVectorTLength(Size));
return TYP_SIMD64;
#endif // TARGET_ARM64
#endif // FEATURE_SIMD
default:
assert(!"Unexpected size for floating point register");
return TYP_UNDEF;
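The `abi.cpp` hunk extends the size-to-register-type mapping so that 32- and 64-byte segments classify as `TYP_SIMD32`/`TYP_SIMD64` on arm64, but only when the size matches the target's Vector&lt;T&gt; length. A sketch of that decision (types returned as strings for illustration; the real code asserts `SizeMatchesVectorTLength` rather than returning `TYP_UNDEF`):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for ABIPassingSegment::GetRegisterType's SIMD cases.
const char* RegisterTypeForSize(unsigned sizeBytes, unsigned vectorTLengthBytes)
{
    switch (sizeBytes)
    {
        case 16:
            return "TYP_SIMD16"; // always valid: the NEON baseline
        case 32:
            // Valid only on a 256-bit-VL target.
            return (vectorTLengthBytes == 32) ? "TYP_SIMD32" : "TYP_UNDEF";
        case 64:
            // Valid only on a 512-bit-VL target.
            return (vectorTLengthBytes == 64) ? "TYP_SIMD64" : "TYP_UNDEF";
        default:
            return "TYP_UNDEF";
    }
}
```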
7 changes: 4 additions & 3 deletions src/coreclr/jit/assertionprop.cpp
@@ -296,6 +296,8 @@ bool IntegralRange::Contains(int64_t value) const
// Example: IntCns = 42 gives [0..127] with a non -precise range, [42,42] with a precise range.
return {SymbolicIntegerValue::Zero, SymbolicIntegerValue::ByteMax};
#elif defined(TARGET_ARM64)
case NI_Vector_op_Equality:
case NI_Vector_op_Inequality:
case NI_Vector64_op_Equality:
case NI_Vector64_op_Inequality:
case NI_Vector128_op_Equality:
@@ -2995,8 +2997,7 @@ GenTree* Compiler::optVNBasedFoldConstExpr(BasicBlock* block, GenTree* parent, G
conValTree = vecCon;
break;
}

#if defined(TARGET_XARCH)
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
case TYP_SIMD32:
{
simd32_t value = vnStore->ConstantValue<simd32_t>(vnCns);
@@ -3020,7 +3021,7 @@ GenTree* Compiler::optVNBasedFoldConstExpr(BasicBlock* block, GenTree* parent, G
}
break;

#endif // TARGET_XARCH
#endif // TARGET_XARCH || TARGET_ARM64
#endif // FEATURE_SIMD

#if defined(FEATURE_MASKED_HW_INTRINSICS)
138 changes: 131 additions & 7 deletions src/coreclr/jit/codegenarm64.cpp
@@ -2329,6 +2329,92 @@ void CodeGen::genSetRegToConst(regNumber targetReg, var_types targetType, GenTre
}
break;
}
case TYP_SIMD32:
{
// Use scalable registers
if (vecCon->IsAllBitsSet())
{
// Use Scalable_B because for Ones, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, -1, INS_OPTS_SCALABLE_B);
}
else if (vecCon->IsZero())
{
// Use Scalable_B because for Zero, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, 0, INS_OPTS_SCALABLE_B);
}
else
{
simd32_t val = vecCon->gtSimd32Val;
if (ElementsAreSame(val.i8, 32))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i8[0], INS_OPTS_SCALABLE_B);
}
else if (ElementsAreSame(val.i16, 16))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i16[0], INS_OPTS_SCALABLE_H);
}
else if (ElementsAreSame(val.i32, 8))
{
emit->emitIns_R_I(INS_sve_dup, EA_SCALABLE, targetReg, val.i32[0], INS_OPTS_SCALABLE_S);
}
else
{
// Get a temp integer register to compute long address.
regNumber addrReg = internalRegisters.GetSingle(tree);
CORINFO_FIELD_HANDLE hnd;
hnd = emit->emitSimdConst(&vecCon->gtSimdVal, emitTypeSize(tree->TypeGet()));
emit->emitIns_R_C(INS_sve_ldr, attr, targetReg, addrReg, hnd, 0);
// emit->emitIns_R_C(INS_adr, EA_8BYTE, addrReg, REG_NA, hnd, 0);
// emit->emitIns_R_R_R_I(INS_sve_ld1b, EA_SCALABLE, targetReg, REG_P1, addrReg, 0,
// INS_OPTS_SCALABLE_B);
}
}
break;
}
case TYP_SIMD64:
{
// Use scalable registers
if (vecCon->IsAllBitsSet())
{
// Use Scalable_B because for Ones, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, -1, INS_OPTS_SCALABLE_B);
}
else if (vecCon->IsZero())
{
// Use Scalable_B because for Zero, it doesn't matter.
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, 0, INS_OPTS_SCALABLE_B);
}
else
{
simd64_t val = vecCon->gtSimd64Val;
if (ElementsAreSame(val.i32, 16) && emitter::isValidSimm_MultipleOf<8, 256>(val.i32[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i32[0], INS_OPTS_SCALABLE_S,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else if (ElementsAreSame(val.i16, 32) && emitter::isValidSimm_MultipleOf<8, 256>(val.i16[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i16[0], INS_OPTS_SCALABLE_H,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else if (ElementsAreSame(val.i8, 64) && emitter::isValidSimm<8>(val.i8[0]))
{
emit->emitIns_R_I(INS_sve_mov, EA_SCALABLE, targetReg, val.i8[0], INS_OPTS_SCALABLE_B,
INS_SCALABLE_OPTS_IMM_BITMASK);
}
else
{
// Get a temp integer register to compute long address.
regNumber addrReg = internalRegisters.GetSingle(tree);
CORINFO_FIELD_HANDLE hnd;
simd64_t constValue;
memcpy(&constValue, &vecCon->gtSimdVal, sizeof(simd64_t));
hnd = emit->emitSimdConst(&vecCon->gtSimdVal, emitTypeSize(tree->TypeGet()));
emit->emitIns_R_C(INS_sve_ldr, attr, targetReg, addrReg, hnd, 0);
}
}
break;
}
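Both constant cases above lean on `ElementsAreSame(...)` to decide whether the constant is a splat that a single `sve_dup` (or bitmask `sve_mov`) can materialize, trying byte, then halfword, then word granularity before falling back to a constant-pool `sve_ldr`. A sketch of the assumed semantics of that check:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed semantics of ElementsAreSame: true when every lane of the constant
// holds the same value, i.e. the vector is a splat at this granularity.
template <typename T>
bool ElementsAreSame(const T* elems, size_t count)
{
    for (size_t i = 1; i < count; i++)
    {
        if (elems[i] != elems[0])
        {
            return false;
        }
    }
    return true;
}
```

Checking coarser granularities only after finer ones is deliberate: a vector that splats at word granularity also splats at byte granularity only if all four bytes match, so the byte check catches the cheapest encodings first.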

default:
{
@@ -2955,7 +3041,16 @@ void CodeGen::genSimpleReturn(GenTree* treeNode)
}
}
emitAttr attr = emitActualTypeSize(targetType);
GetEmitter()->emitIns_Mov(INS_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired);
if (attr == EA_SCALABLE)
{
// TODO-VL: Should we check the baseType or it doesn't matter because it is just reg->reg move
GetEmitter()->emitIns_Mov(INS_sve_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired,
INS_OPTS_SCALABLE_Q);
}
else
{
GetEmitter()->emitIns_Mov(INS_mov, attr, retReg, op1->GetRegNum(), /* canSkip */ !movRequired);
}
}

/***********************************************************************************************
Expand Down Expand Up @@ -5248,14 +5343,28 @@ void CodeGen::genSimdUpperSave(GenTreeIntrinsic* node)

GenTreeLclVar* lclNode = op1->AsLclVar();
LclVarDsc* varDsc = compiler->lvaGetDesc(lclNode);
assert(emitTypeSize(varDsc->GetRegisterType(lclNode)) == 16);

regNumber tgtReg = node->GetRegNum();
assert(tgtReg != REG_NA);
unsigned varSize = emitTypeSize(varDsc->GetRegisterType(lclNode));
assert((varSize == 16) || (Compiler::SizeMatchesVectorTLength(varSize)));

regNumber op1Reg = genConsumeReg(op1);
assert(op1Reg != REG_NA);

regNumber tgtReg = node->GetRegNum();
#ifdef TARGET_ARM64
// TODO-VL: Write a helper to do this check for LclVars*, GenTree*, etc.
if (Compiler::UseSveForType(op1->TypeGet()))
{
// Until we have a custom ABI for SVE, we just store the entire contents of the
// Z* registers on the stack. Otherwise we would need multiple free registers
// to save the contents of everything but the lower 8 bytes.
assert(tgtReg == REG_NA);

GetEmitter()->emitIns_S_R(INS_sve_str, EA_SCALABLE, op1Reg, lclNode->GetLclNum(), 0);
return;
}
#endif // TARGET_ARM64
assert(tgtReg != REG_NA);

GetEmitter()->emitIns_R_R_I_I(INS_mov, EA_8BYTE, tgtReg, op1Reg, 0, 1);

if ((node->gtFlags & GTF_SPILL) != 0)
@@ -5304,10 +5413,12 @@ void CodeGen::genSimdUpperRestore(GenTreeIntrinsic* node)

GenTreeLclVar* lclNode = op1->AsLclVar();
LclVarDsc* varDsc = compiler->lvaGetDesc(lclNode);
assert(emitTypeSize(varDsc->GetRegisterType(lclNode)) == 16);

unsigned varSize = emitTypeSize(varDsc->GetRegisterType(lclNode));
assert((varSize == 16) || (Compiler::SizeMatchesVectorTLength(varSize)));

regNumber srcReg = node->GetRegNum();
assert(srcReg != REG_NA);
assert((srcReg != REG_NA) || (Compiler::UseSveForType(node->TypeGet())));

regNumber lclVarReg = genConsumeReg(lclNode);
assert(lclVarReg != REG_NA);
@@ -5319,6 +5430,19 @@
// The localVar must have a stack home.
assert(varDsc->lvOnFrame);

#ifdef TARGET_ARM64
// TODO-VL: Write a helper to do this check for LclVars*, GenTree*, etc.
if (Compiler::UseSveForType(op1->TypeGet()))
{
// Until we have a custom ABI for SVE, we just store the entire contents of the
// Z* registers on the stack. Otherwise we would need multiple free registers
// to save the contents of everything but the lower 8 bytes.

GetEmitter()->emitIns_R_S(INS_sve_ldr, EA_SCALABLE, lclVarReg, varNum, 0);
return;
}
#endif // TARGET_ARM64

// We will load this from the upper 8 bytes of this localVar's home.
int offset = 8;

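The `genSimdUpperSave`/`genSimdUpperRestore` hunks above split spill strategy by register kind: for NEON vectors only the upper 8 bytes need saving around a call (the lower 64 bits of v8-v15 are callee-saved under AAPCS64), while an SVE `Z` register has no fixed-size "upper half" at compile time, so the whole scalable register is spilled with `sve_str`/`sve_ldr`. A sketch of that policy decision (enum and helper names are illustrative):

```cpp
#include <cassert>

enum class SavePolicy
{
    UpperEightBytes,       // NEON: save/restore only bits [127:64]
    FullScalableRegister,  // SVE: sve_str / sve_ldr of the whole Z register
};

// Choose how to preserve a live vector value across a call.
SavePolicy UpperSavePolicy(unsigned regSizeBytes, bool useSve)
{
    if (!useSve && regSizeBytes == 16)
    {
        // Lower 8 bytes survive in the callee-saved D register; only the
        // upper 8 bytes are volatile.
        return SavePolicy::UpperEightBytes;
    }
    return SavePolicy::FullScalableRegister;
}
```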
14 changes: 14 additions & 0 deletions src/coreclr/jit/codegencommon.cpp
@@ -3225,8 +3225,15 @@ void CodeGen::genHomeRegisterParams(regNumber initReg, bool* initRegStillZeroed)
busyRegs |= genRegMask(node->copiedReg);

instruction ins = ins_Copy(node->reg, copyType);
#ifdef TARGET_ARM64
insOpts opts = Compiler::UseSveForType(copyType) ? INS_OPTS_SCALABLE_D : INS_OPTS_NONE;
GetEmitter()->emitIns_Mov(ins, emitActualTypeSize(copyType), node->copiedReg, node->reg,
/* canSkip */ false, opts);
#else
GetEmitter()->emitIns_Mov(ins, emitActualTypeSize(copyType), node->copiedReg, node->reg,
/* canSkip */ false);
#endif

if (node->copiedReg == initReg)
{
*initRegStillZeroed = false;
@@ -3243,8 +3250,15 @@

regNumber sourceReg = edge->from->copiedReg != REG_NA ? edge->from->copiedReg : edge->from->reg;
instruction ins = ins_Copy(sourceReg, genActualType(edge->type));
#ifdef TARGET_ARM64
insOpts opts = Compiler::UseSveForType(edge->type) ? INS_OPTS_SCALABLE_D : INS_OPTS_NONE;
GetEmitter()->emitIns_Mov(ins, emitActualTypeSize(edge->type), node->reg, sourceReg,
/* canSkip */ true, opts);
#else
GetEmitter()->emitIns_Mov(ins, emitActualTypeSize(edge->type), node->reg, sourceReg,
/* canSkip */ true);
#endif

break;
}
