[X86][SSE41] Combine insertion of zero scalars into vector blends with zero

Part 1 of 2
Where possible, this patch replaces the insertion of a zero scalar into a vector with a blend against an all-zero vector, avoiding the integer insertion instructions (PINSRB/PINSRW/PINSRD/PINSRQ), which are particularly slow on many targets. The combine is limited to SSE4.1 targets and to vectors of at most 8 elements.
(Part 2 will add support for combining multiple blends-with-zero.)

Differential Revision: http://reviews.llvm.org/D17483

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261743 91177308-0d34-0410-b5e6-96231b3b80d8
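The equivalence this patch relies on can be checked with a small scalar model. The C++17 sketch below is an illustration under stated assumptions, not LLVM code: `Vec`, `insertZero`, `shuffle`, `clearMask`, `insertionEqualsBlend` and `combineClearMasks` are hypothetical names, and the vector is modelled as four 32-bit lanes. The shuffle follows the two-input mask convention used by `getVectorShuffle` (an entry m < N reads lane m of the first operand, m >= N reads lane m - N of the second), and `combineClearMasks` only illustrates the kind of merging Part 2 describes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 4 x i32 vector (think v4i32).
// Illustrates the semantics only; this is not LLVM code.
constexpr std::size_t N = 4;
using Vec = std::array<std::int32_t, N>;
using Mask = std::array<std::size_t, N>;

// What inserting a zero scalar at lane `idx` computes.
constexpr Vec insertZero(Vec v, std::size_t idx) {
  v[idx] = 0;
  return v;
}

// Two-input shuffle: mask entry m < N reads a[m], m >= N reads b[m - N].
constexpr Vec shuffle(const Vec& a, const Vec& b, const Mask& m) {
  Vec r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = m[i] < N ? a[m[i]] : b[m[i] - N];
  return r;
}

// Identity mask, except lane `idx` reads lane `idx` of the second
// (all-zero) operand, matching the ClearMask loop in the patch.
constexpr Mask clearMask(std::size_t idx) {
  Mask m{};
  for (std::size_t i = 0; i < N; ++i)
    m[i] = (i == idx) ? i + N : i;
  return m;
}

// True if inserting zero at `idx` yields the same vector as blending
// `v` with an all-zero vector using clearMask(idx).
constexpr bool insertionEqualsBlend(const Vec& v, std::size_t idx) {
  const Vec zero{};
  const Vec viaInsert = insertZero(v, idx);
  const Vec viaBlend = shuffle(v, zero, clearMask(idx));
  for (std::size_t i = 0; i < N; ++i)
    if (viaInsert[i] != viaBlend[i])
      return false;
  return true;
}

// Hypothetical merge of two blends-with-zero applied one after the other:
// the result clears a lane if either input mask cleared it.
// Assumes both masks have the clearMask() shape (each entry is i or i + N).
constexpr Mask combineClearMasks(const Mask& m1, const Mask& m2) {
  Mask m{};
  for (std::size_t i = 0; i < N; ++i)
    m[i] = (m1[i] >= N || m2[i] >= N) ? i + N : i;
  return m;
}
```

For example, `clearMask(2)` is {0, 1, 6, 3}, and clearing lane 0 and then lane 3 is the same as a single blend with the combined mask {4, 1, 2, 7}.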
RKSimon committed Feb 24, 2016
1 parent 8d04517 commit 14d8a84
Showing 3 changed files with 189 additions and 98 deletions.
14 changes: 14 additions & 0 deletions lib/Target/X86/X86ISelLowering.cpp
@@ -12301,6 +12301,7 @@ SDValue X86TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,
                                                   SelectionDAG &DAG) const {
   MVT VT = Op.getSimpleValueType();
   MVT EltVT = VT.getVectorElementType();
+  unsigned NumElts = VT.getVectorNumElements();

   if (EltVT == MVT::i1)
     return InsertBitToMaskVector(Op, DAG);
@@ -12314,6 +12315,19 @@ SDValue X86TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,
   auto *N2C = cast<ConstantSDNode>(N2);
   unsigned IdxVal = N2C->getZExtValue();

+  // If we are clearing out a element, we do this more efficiently with a
+  // blend shuffle than a costly integer insertion.
+  // TODO: would other rematerializable values (e.g. allbits) benefit as well?
+  // TODO: pre-SSE41 targets will tend to use bit masking - this could still
+  // be beneficial if we are inserting several zeros and can combine the masks.
+  if (X86::isZeroNode(N1) && Subtarget.hasSSE41() && NumElts <= 8) {
+    SmallVector<int, 8> ClearMask;
+    for (unsigned i = 0; i != NumElts; ++i)
+      ClearMask.push_back(i == IdxVal ? i + NumElts : i);
+    SDValue ZeroVector = getZeroVector(VT, Subtarget, DAG, dl);
+    return DAG.getVectorShuffle(VT, dl, N0, ZeroVector, ClearMask);
+  }
+
   // If the vector is wider than 128 bits, extract the 128-bit subvector, insert
   // into that, and then insert the subvector back into the result.
   if (VT.is256BitVector() || VT.is512BitVector()) {
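The `ClearMask` built in this hunk keeps every lane in place and only switches lane `IdxVal` to the second operand, which is exactly the shape of a blend. The C++17 sketch below (not LLVM code; `buildClearMask`, `blendImmediate` and `scaleToWordImmediate` are hypothetical names) shows how a mask of that shape could be turned into an SSE4.1 blend immediate, using the BLENDPS/PBLENDW convention that bit i set selects lane i from the second source. Which instruction LLVM actually emits for a given type is decided in its shuffle lowering, which this diff does not show.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper mirroring the patch's ClearMask loop for N lanes.
template <std::size_t N>
constexpr std::array<int, N> buildClearMask(std::size_t idxVal) {
  std::array<int, N> mask{};
  for (std::size_t i = 0; i != N; ++i)
    mask[i] = static_cast<int>(i == idxVal ? i + N : i);
  return mask;
}

// If `mask` is a blend (lane i reads lane i of either operand), return the
// immediate in which bit i set means "take lane i from the second operand";
// otherwise return std::nullopt.
template <std::size_t N>
constexpr std::optional<std::uint8_t> blendImmediate(const std::array<int, N>& mask) {
  static_assert(N <= 8, "an 8-bit immediate covers at most 8 lanes");
  std::uint8_t imm = 0;
  for (std::size_t i = 0; i != N; ++i) {
    const int lane = static_cast<int>(i);
    if (mask[i] == lane)
      continue;  // lane stays from the first operand
    if (mask[i] == lane + static_cast<int>(N)) {
      imm = static_cast<std::uint8_t>(imm | (1u << i));  // lane from second operand
      continue;
    }
    return std::nullopt;  // an element changes position: not a blend
  }
  return imm;
}

// Widen a per-element immediate to PBLENDW's eight 16-bit lanes, e.g. each
// i32 element of a v4i32 covers two words. Assumes N is 2, 4 or 8.
template <std::size_t N>
constexpr std::uint8_t scaleToWordImmediate(std::uint8_t imm) {
  static_assert(N == 2 || N == 4 || N == 8, "expects 2, 4 or 8 elements");
  constexpr std::size_t wordsPerElt = 8 / N;
  unsigned out = 0;
  for (std::size_t i = 0; i != N; ++i)
    if (imm & (1u << i))
      out |= ((1u << wordsPerElt) - 1u) << (i * wordsPerElt);
  return static_cast<std::uint8_t>(out);
}
```

For a zero inserted at index 2 of a v4i32, the mask is {0, 1, 6, 3}, which gives the immediate 0x04, or 0x30 once widened to PBLENDW's word lanes.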