Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Lowering of load atomic vector of float via cast #117189

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10956,8 +10956,8 @@ If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
<ordering>` and optional ``syncscope("<target-scope>")`` argument. The
``release`` and ``acq_rel`` orderings are not valid on ``load`` instructions.
Atomic loads produce :ref:`defined <memmodel>` results when they may see
multiple atomic stores. The type of the pointee must be an integer, pointer, or
floating-point type whose bit width is a power of two greater than or equal to
multiple atomic stores. The type of the pointee must be an integer, pointer,
floating-point, or vector type whose bit width is a power of two greater than or equal to
eight and less than or equal to a target-specific size limit. ``align`` must be
explicitly specified on atomic loads. Note: if the alignment is not greater or
equal to the size of the `<value>` type, the atomic operation is likely to
Expand Down Expand Up @@ -11097,8 +11097,8 @@ If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
<ordering>` and optional ``syncscope("<target-scope>")`` argument. The
``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` instructions.
Atomic loads produce :ref:`defined <memmodel>` results when they may see
multiple atomic stores. The type of the pointee must be an integer, pointer, or
floating-point type whose bit width is a power of two greater than or equal to
multiple atomic stores. The type of the pointee must be an integer, pointer,
floating-point, or vector type whose bit width is a power of two greater than or equal to
eight and less than or equal to a target-specific size limit. ``align`` must be
explicitly specified on atomic stores. Note: if the alignment is not greater or
equal to the size of the `<value>` type, the atomic operation is likely to
Expand Down
14 changes: 8 additions & 6 deletions llvm/lib/IR/Verifier.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4255,9 +4255,10 @@ void Verifier::visitLoadInst(LoadInst &LI) {
Check(LI.getOrdering() != AtomicOrdering::Release &&
LI.getOrdering() != AtomicOrdering::AcquireRelease,
"Load cannot have Release ordering", &LI);
Check(ElTy->isIntOrPtrTy() || ElTy->isFloatingPointTy(),
"atomic load operand must have integer, pointer, or floating point "
"type!",
Check(ElTy->getScalarType()->isIntOrPtrTy() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR shouldn't touch the verifier, only handle codegen for atomic load / store

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is stacked on top of #117625. Please take a look. This PR handles vectors only -- renamed the title.

ElTy->getScalarType()->isFloatingPointTy(),
"atomic load operand must have integer, pointer, floating point, "
"or vector type!",
ElTy, &LI);
checkAtomicMemAccessSize(ElTy, &LI);
} else {
Expand All @@ -4281,9 +4282,10 @@ void Verifier::visitStoreInst(StoreInst &SI) {
Check(SI.getOrdering() != AtomicOrdering::Acquire &&
SI.getOrdering() != AtomicOrdering::AcquireRelease,
"Store cannot have Acquire ordering", &SI);
Check(ElTy->isIntOrPtrTy() || ElTy->isFloatingPointTy(),
"atomic store operand must have integer, pointer, or floating point "
"type!",
Check(ElTy->getScalarType()->isIntOrPtrTy() ||
ElTy->getScalarType()->isFloatingPointTy(),
"atomic store operand must have integer, pointer, floating point, "
"or vector type!",
ElTy, &SI);
checkAtomicMemAccessSize(ElTy, &SI);
} else {
Expand Down
8 changes: 8 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -31158,6 +31158,14 @@ bool X86TargetLowering::needsCmpXchgNb(Type *MemType) const {
return false;
}

TargetLoweringBase::AtomicExpansionKind
X86TargetLowering::shouldCastAtomicLoadInIR(LoadInst *LI) const {
if (const auto VT = dyn_cast<VectorType>(LI->getType()))
if (VT->getElementType()->isFloatingPointTy())
Comment on lines +31163 to +31164
Copy link
Contributor

@arsenm arsenm Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this into the default implementation. Targets can opt out as the hook is removed. Also this need not be limited to FP vectors. All vectors should cast to int

return AtomicExpansionKind::CastToInteger;
jofrn marked this conversation as resolved.
Show resolved Hide resolved
return TargetLowering::shouldCastAtomicLoadInIR(LI);
}
Comment on lines +31161 to +31167
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be moving in the opposite direction. The existing tests atomic FP load already works, and this is regressing it to use the IR expansion (which should be deleted)

Copy link
Contributor Author

@jofrn jofrn Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one didn't handle atomic vectors. This PR only needs atomic load of vector type to pass. Casting it here is simple for now since the hook still exists.


TargetLoweringBase::AtomicExpansionKind
X86TargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
Type *MemType = SI->getValueOperand()->getType();
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -1808,6 +1808,8 @@ namespace llvm {
const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;
ArrayRef<MCPhysReg> getRoundingControlRegisters() const override;

TargetLoweringBase::AtomicExpansionKind
shouldCastAtomicLoadInIR(LoadInst *LI) const override;
TargetLoweringBase::AtomicExpansionKind
shouldExpandAtomicLoadInIR(LoadInst *LI) const override;
TargetLoweringBase::AtomicExpansionKind
Expand Down
112 changes: 112 additions & 0 deletions llvm/test/CodeGen/X86/atomicvec-float.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s --mtriple=x86_64 | FileCheck %s
jofrn marked this conversation as resolved.
Show resolved Hide resolved

define <1 x float> @load_atomic_vector1_float(ptr %src) {
; CHECK-LABEL: load_atomic_vector1_float:
; CHECK: # %bb.0:
; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; CHECK-NEXT: retq
%ret = load atomic <1 x float>, ptr %src acquire, align 4
ret <1 x float> %ret
}

define <2 x float> @load_atomic_vector2_float(ptr %src) {
; CHECK-LABEL: load_atomic_vector2_float:
; CHECK: # %bb.0:
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: movq %rdi, %rsi
; CHECK-NEXT: movq %rsp, %rdx
; CHECK-NEXT: movl $8, %edi
; CHECK-NEXT: movl $2, %ecx
; CHECK-NEXT: callq __atomic_load@PLT
; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; CHECK-NEXT: popq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
jofrn marked this conversation as resolved.
Show resolved Hide resolved
%ret = load atomic <2 x float>, ptr %src acquire, align 4
ret <2 x float> %ret
}

define <1 x double> @load_atomic_vector1_double(ptr %src) {
; CHECK-LABEL: load_atomic_vector1_double:
; CHECK: # %bb.0:
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: movq %rdi, %rsi
; CHECK-NEXT: movq %rsp, %rdx
; CHECK-NEXT: movl $8, %edi
; CHECK-NEXT: movl $2, %ecx
; CHECK-NEXT: callq __atomic_load@PLT
; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; CHECK-NEXT: popq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
%ret = load atomic <1 x double>, ptr %src acquire, align 4
ret <1 x double> %ret
}

define <2 x double> @load_atomic_vector2_double(ptr %src) {
; CHECK-LABEL: load_atomic_vector2_double:
; CHECK: # %bb.0:
; CHECK-NEXT: subq $24, %rsp
; CHECK-NEXT: .cfi_def_cfa_offset 32
; CHECK-NEXT: movq %rdi, %rsi
; CHECK-NEXT: movq %rsp, %rdx
; CHECK-NEXT: movl $16, %edi
; CHECK-NEXT: movl $2, %ecx
; CHECK-NEXT: callq __atomic_load@PLT
; CHECK-NEXT: movaps (%rsp), %xmm0
; CHECK-NEXT: addq $24, %rsp
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
%ret = load atomic <2 x double>, ptr %src acquire, align 4
ret <2 x double> %ret
}

define <2 x half> @load_atomic_vector_half(ptr %src) {
; CHECK-LABEL: load_atomic_vector_half:
; CHECK: # %bb.0:
; CHECK-NEXT: movl (%rdi), %eax
; CHECK-NEXT: movd %eax, %xmm0
; CHECK-NEXT: retq
%ret = load atomic <2 x half>, ptr %src acquire, align 4
ret <2 x half> %ret
}

define <2 x float> @load_atomic_vector_bfloat(ptr %src) {
; CHECK-LABEL: load_atomic_vector_bfloat:
; CHECK: # %bb.0:
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: movq %rdi, %rsi
; CHECK-NEXT: movq %rsp, %rdx
; CHECK-NEXT: movl $8, %edi
; CHECK-NEXT: movl $2, %ecx
; CHECK-NEXT: callq __atomic_load@PLT
; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; CHECK-NEXT: popq %rax
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
%ret = load atomic <2 x float>, ptr %src acquire, align 4
ret <2 x float> %ret
}

define <2 x fp128> @load_atomic_vector_fp128(ptr %src) {
; CHECK-LABEL: load_atomic_vector_fp128:
; CHECK: # %bb.0:
; CHECK-NEXT: subq $40, %rsp
; CHECK-NEXT: .cfi_def_cfa_offset 48
; CHECK-NEXT: movq %rdi, %rsi
; CHECK-NEXT: movq %rsp, %rdx
; CHECK-NEXT: movl $32, %edi
; CHECK-NEXT: movl $2, %ecx
; CHECK-NEXT: callq __atomic_load@PLT
; CHECK-NEXT: movaps (%rsp), %xmm0
; CHECK-NEXT: movaps {{[0-9]+}}(%rsp), %xmm1
; CHECK-NEXT: addq $40, %rsp
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
%ret = load atomic <2 x fp128>, ptr %src acquire, align 4
ret <2 x fp128> %ret
}
91 changes: 91 additions & 0 deletions llvm/test/Transforms/AtomicExpand/atomicvec-float.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt < %s --mtriple=x86_64 --passes=atomic-expand -S -o - | FileCheck %s

define <1 x float> @load_atomic_vector1_float(ptr %src) {
; CHECK-LABEL: define <1 x float> @load_atomic_vector1_float(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[SRC]] acquire, align 4
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to <1 x float>
; CHECK-NEXT: ret <1 x float> [[TMP2]]
;
%ret = load atomic <1 x float>, ptr %src acquire, align 4
ret <1 x float> %ret
}

define <2 x float> @load_atomic_vector2_float(ptr %src) {
; CHECK-LABEL: define <2 x float> @load_atomic_vector2_float(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = alloca <2 x float>, align 8
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 8, ptr [[TMP1]])
; CHECK-NEXT: call void @__atomic_load(i64 8, ptr [[SRC]], ptr [[TMP1]], i32 2)
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[TMP1]], align 8
; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr [[TMP1]])
; CHECK-NEXT: ret <2 x float> [[TMP2]]
;
%ret = load atomic <2 x float>, ptr %src acquire, align 4
ret <2 x float> %ret
}

define <1 x double> @load_atomic_vector1_double(ptr %src) {
; CHECK-LABEL: define <1 x double> @load_atomic_vector1_double(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = alloca <1 x double>, align 8
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 8, ptr [[TMP1]])
; CHECK-NEXT: call void @__atomic_load(i64 8, ptr [[SRC]], ptr [[TMP1]], i32 2)
; CHECK-NEXT: [[TMP2:%.*]] = load <1 x double>, ptr [[TMP1]], align 8
; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr [[TMP1]])
; CHECK-NEXT: ret <1 x double> [[TMP2]]
;
%ret = load atomic <1 x double>, ptr %src acquire, align 4
ret <1 x double> %ret
}

define <2 x double> @load_atomic_vector2_double(ptr %src) {
; CHECK-LABEL: define <2 x double> @load_atomic_vector2_double(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = alloca <2 x double>, align 16
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 16, ptr [[TMP1]])
; CHECK-NEXT: call void @__atomic_load(i64 16, ptr [[SRC]], ptr [[TMP1]], i32 2)
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, ptr [[TMP1]], align 16
; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 16, ptr [[TMP1]])
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;
%ret = load atomic <2 x double>, ptr %src acquire, align 4
ret <2 x double> %ret
}

define <2 x half> @load_atomic_vector_half(ptr %src) {
; CHECK-LABEL: define <2 x half> @load_atomic_vector_half(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[SRC]] acquire, align 4
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to <2 x half>
; CHECK-NEXT: ret <2 x half> [[TMP2]]
;
%ret = load atomic <2 x half>, ptr %src acquire, align 4
ret <2 x half> %ret
}

define <2 x bfloat> @load_atomic_vector_bfloat(ptr %src) {
; CHECK-LABEL: define <2 x bfloat> @load_atomic_vector_bfloat(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[SRC]] acquire, align 4
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to <2 x bfloat>
; CHECK-NEXT: ret <2 x bfloat> [[TMP2]]
;
%ret = load atomic <2 x bfloat>, ptr %src acquire, align 4
ret <2 x bfloat> %ret
}

define <2 x fp128> @load_atomic_vector_fp128(ptr %src) {
; CHECK-LABEL: define <2 x fp128> @load_atomic_vector_fp128(
; CHECK-SAME: ptr [[SRC:%.*]]) {
; CHECK-NEXT: [[TMP1:%.*]] = alloca <2 x fp128>, align 16
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 32, ptr [[TMP1]])
; CHECK-NEXT: call void @__atomic_load(i64 32, ptr [[SRC]], ptr [[TMP1]], i32 2)
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x fp128>, ptr [[TMP1]], align 16
; CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 32, ptr [[TMP1]])
; CHECK-NEXT: ret <2 x fp128> [[TMP2]]
;
%ret = load atomic <2 x fp128>, ptr %src acquire, align 4
ret <2 x fp128> %ret
}
15 changes: 8 additions & 7 deletions llvm/test/Verifier/atomics.ll
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s
; CHECK: atomic store operand must have integer, pointer, floating point, or vector type!
; CHECK: atomic load operand must have integer, pointer, floating point, or vector type!

; CHECK: atomic store operand must have integer, pointer, or floating point type!
; CHECK: atomic load operand must have integer, pointer, or floating point type!
%ty = type { i32 };

define void @foo(ptr %P, <1 x i64> %v) {
store atomic <1 x i64> %v, ptr %P unordered, align 8
define void @foo(ptr %P, %ty %v) {
store atomic %ty %v, ptr %P unordered, align 8
ret void
}

define <1 x i64> @bar(ptr %P) {
%v = load atomic <1 x i64>, ptr %P unordered, align 8
ret <1 x i64> %v
define %ty @bar(ptr %P) {
%v = load atomic %ty, ptr %P unordered, align 8
ret %ty %v
}
Loading