[LoadStoreVectorizer] Postprocess and merge equivalence classes #114501

v-klochkov · 2024-11-01T02:48:40Z

This patch introduces a new method:
void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const

The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr).

Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases.

The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes form ladder formed of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors.

github-actions · 2024-11-01T02:48:59Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-11-01T06:02:31Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Vyacheslav Klochkov (v-klochkov)

Changes

This patch introduces a new method:
void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const

The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UB1 and UB2), where UB1 is 1-level-indirection underlying base for UB2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr).

Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases.

The new approach merges equivalence classes if they have adjucent bases (1-level indirection). If a series of equivalence classes form ladder formed of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors.

Full diff: https://github.com/llvm/llvm-project/pull/114501.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp (+128)
(added) llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll (+63)

diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 02ec1d5c259cd6..59c7f2239d972a 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -324,6 +324,11 @@ class Vectorizer {
       Instruction *ChainElem, Instruction *ChainBegin,
       const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets);
 
+  /// Merges the equivalence classes if they have uderlying objects that differ
+  /// by one level of indirection (i.e., one is a getelementptr and the other is
+  /// the base pointer in that getelementptr).
+  void mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const;
+
   /// Collects loads and stores grouped by "equivalence class", where:
   ///   - all elements in an eq class are a load or all are a store,
   ///   - they all load/store the same element size (it's OK to have e.g. i8 and
@@ -1305,6 +1310,128 @@ std::optional<APInt> Vectorizer::getConstantOffsetSelects(
   return std::nullopt;
 }
 
+void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const {
+  if (EQClasses.size() < 2) // There is nothing to merge.
+    return;
+
+  // The reduced key has all elements of the ECClassKey except the underlying
+  // object. Check that EqClassKey has 4 elements and define the reduced key.
+  static_assert(std::tuple_size_v<EqClassKey> == 4,
+                "EqClassKey has changed - EqClassReducedKey needs changes too");
+  using EqClassReducedKey =
+      std::tuple<std::tuple_element_t<1, EqClassKey> /* AddrSpace */,
+                 std::tuple_element_t<2, EqClassKey> /* Element size */,
+                 std::tuple_element_t<3, EqClassKey> /* IsLoad; */>;
+  using ECReducedKeyToUnderlyingObjectMap =
+      MapVector<EqClassReducedKey,
+                SmallPtrSet<std::tuple_element_t<0, EqClassKey>, 4>>;
+
+  // Form a map from the reduced key (without the underlying object) to the
+  // underlying objects: 1 reduced key to many underlying objects, to form
+  // groups of potentially merge-able equivalence classes.
+  ECReducedKeyToUnderlyingObjectMap RedKeyToUOMap;
+  bool FoundPotentiallyOptimizableEC = false;
+  for (const auto &EC : EQClasses) {
+    const auto &Key = EC.first;
+    EqClassReducedKey RedKey{std::get<1>(Key), std::get<2>(Key),
+                             std::get<3>(Key)};
+    RedKeyToUOMap[RedKey].insert(std::get<0>(Key));
+    if (RedKeyToUOMap[RedKey].size() > 1)
+      FoundPotentiallyOptimizableEC = true;
+  }
+  if (!FoundPotentiallyOptimizableEC)
+    return;
+
+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: before merging:\n";
+    for (const auto &EC : EQClasses) {
+      dbgs() << "    Key: ([" << std::get<0>(EC.first)
+             << "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)
+             << ", " << std::get<2>(EC.first) << ", "
+             << static_cast<int>(std::get<3>(EC.first)) << ")\n";
+      for (const auto &Inst : EC.second)
+        dbgs() << "\tInst:\t" << *Inst << "\n";
+    }
+  });
+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: RedKeyToUOMap:\n";
+    for (const auto &RedKeyToUO : RedKeyToUOMap) {
+      dbgs() << "    Reduced key: (" << std::get<0>(RedKeyToUO.first) << ", "
+             << std::get<1>(RedKeyToUO.first) << ", "
+             << static_cast<int>(std::get<2>(RedKeyToUO.first)) << ") --> "
+             << RedKeyToUO.second.size() << " underlying objects:\n";
+      for (auto UObject : RedKeyToUO.second)
+        dbgs() << "    [" << UObject << "]: " << *UObject << "\n";
+    }
+  });
+
+  using UObjectToUObjectMap = DenseMap<const Value *, const Value *>;
+
+  // Compute the ultimate targets for a set of underlying objects.
+  auto GetUltimateTargets =
+      [](SmallPtrSetImpl<const Value *> &UObjects) -> UObjectToUObjectMap {
+    UObjectToUObjectMap IndirectionMap;
+    for (const auto *UObject : UObjects) {
+      const unsigned MaxLookupDepth = 1; // look for 1-level indirections only
+      const auto *UltimateTarget =
+          llvm::getUnderlyingObject(UObject, MaxLookupDepth);
+      if (UltimateTarget != UObject)
+        IndirectionMap[UObject] = UltimateTarget;
+    }
+    UObjectToUObjectMap UltimateTargetsMap;
+    for (const auto *UObject : UObjects) {
+      auto Target = UObject;
+      auto It = IndirectionMap.find(Target);
+      for (; It != IndirectionMap.end(); It = IndirectionMap.find(Target))
+        Target = It->second;
+      UltimateTargetsMap[UObject] = Target;
+    }
+    return UltimateTargetsMap;
+  };
+
+  // For each item in RedKeyToUOMap, if it has more than one underlying object,
+  // try to merge the equivalence classes.
+  for (auto &RedKeyToUO : RedKeyToUOMap) {
+    auto UObjects = RedKeyToUO.second;
+    if (UObjects.size() < 2)
+      continue;
+    const auto RedKey = RedKeyToUO.first;
+    auto UTMap = GetUltimateTargets(UObjects);
+    for (const auto &UT : UTMap) {
+      const Value *UObject = UT.first;
+      const Value *UltimateTarget = UT.second;
+      if (UObject == UltimateTarget)
+        continue;
+
+      EqClassKey KeyFrom{UObject, std::get<0>(RedKey), std::get<1>(RedKey),
+                         std::get<2>(RedKey)};
+      EqClassKey KeyTo{UltimateTarget, std::get<0>(RedKey), std::get<1>(RedKey),
+                       std::get<2>(RedKey)};
+      auto VecFrom = EQClasses[KeyFrom];
+      auto VecTo = EQClasses[KeyTo];
+      SmallVector<Instruction *, 8> MergedVec;
+      std::merge(VecFrom.begin(), VecFrom.end(), VecTo.begin(), VecTo.end(),
+                 std::back_inserter(MergedVec),
+                 [](Instruction *A, Instruction *B) {
+                   return A && B && A->comesBefore(B);
+                 });
+      EQClasses[KeyTo] = std::move(MergedVec);
+      EQClasses.erase(KeyFrom);
+    }
+  }
+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: after merging:\n";
+    for (const auto &EC : EQClasses) {
+      dbgs() << "    Key: ([" << std::get<0>(EC.first)
+             << "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)
+             << ", " << std::get<2>(EC.first) << ", "
+             << static_cast<int>(std::get<3>(EC.first)) << ")\n";
+      for (const auto &Inst : EC.second)
+        dbgs() << "\tInst:\t" << *Inst << "\n";
+    }
+  });
+}
+
 EquivalenceClassMap
 Vectorizer::collectEquivalenceClasses(BasicBlock::iterator Begin,
                                       BasicBlock::iterator End) {
@@ -1377,6 +1504,7 @@ Vectorizer::collectEquivalenceClasses(BasicBlock::iterator Begin,
         .emplace_back(&I);
   }
 
+  mergeEquivalenceClasses(Ret);
   return Ret;
 }
 
diff --git a/llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll b/llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll
new file mode 100644
index 00000000000000..ab320f02ed937d
--- /dev/null
+++ b/llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll
@@ -0,0 +1,63 @@
+; RUN: opt %s -mtriple=x86_64-unknown-linux-gnu -passes=load-store-vectorizer -mcpu=skx -S -o %t.out.ll
+; RUN: FileCheck -input-file=%t.out.ll %s
+
+; This test verifies that the vectorizer can handle an extended sequence of
+; getelementptr instructions and generate longer vectors. With special handling,
+; some elements can still be vectorized even if they require looking up the
+; common underlying object deeper than 6 levels from the original pointer.
+
+; The test below is the simplified version of actual performance oriented
+; workload; the offsets in getelementptr instructins are similar or same for
+; the test simplicity.
+
+define void @v1_v2_v4_v1_to_v8_levels_6_7_8_8(i32 %arg0, ptr align 16 %arg1) {
+; CHECK-LABEL: @v1_v2_v4_v1_to_v8_levels_6_7_8_8
+; CHECK: store <8 x half>
+
+  %level1 = getelementptr inbounds i8, ptr %arg1, i32 917504
+  %level2 = getelementptr i8, ptr %level1, i32 %arg0
+  %level3 = getelementptr i8, ptr %level2, i32 32768
+  %level4 = getelementptr inbounds i8, ptr %level3, i32 %arg0
+  %level5 = getelementptr i8, ptr %level4, i32 %arg0
+
+  %a6 = getelementptr i8, ptr %level5, i32 %arg0
+  %b7 = getelementptr i8, ptr %a6, i32 2
+  %c8 = getelementptr i8, ptr %b7, i32 8
+  %d8 = getelementptr inbounds i8, ptr %b7, i32 12
+
+  store half 0xH0000, ptr %a6, align 16
+  store <4 x half> zeroinitializer, ptr %b7, align 2
+  store <2 x half> zeroinitializer, ptr %c8, align 2
+  store half 0xH0000, ptr %d8, align 2
+  ret void
+}
+
+define void @v1x8_levels_6_7_8_9_10_11_12_13(i32 %arg0, ptr align 16 %arg1) {
+; CHECK-LABEL: @v1x8_levels_6_7_8_9_10_11_12_13
+; CHECK: store <8 x half>
+
+  %level1 = getelementptr inbounds i8, ptr %arg1, i32 917504
+  %level2 = getelementptr i8, ptr %level1, i32 %arg0
+  %level3 = getelementptr i8, ptr %level2, i32 32768
+  %level4 = getelementptr inbounds i8, ptr %level3, i32 %arg0
+  %level5 = getelementptr i8, ptr %level4, i32 %arg0
+
+  %a6 = getelementptr i8, ptr %level5, i32 %arg0
+  %b7 = getelementptr i8, ptr %a6, i32 2
+  %c8 = getelementptr i8, ptr %b7, i32 2
+  %d9 = getelementptr inbounds i8, ptr %c8, i32 2
+  %e10 = getelementptr inbounds i8, ptr %d9, i32 2
+  %f11 = getelementptr inbounds i8, ptr %e10, i32 2
+  %g12 = getelementptr inbounds i8, ptr %f11, i32 2
+  %h13 = getelementptr inbounds i8, ptr %g12, i32 2
+
+  store half 0xH0000, ptr %a6, align 16
+  store half 0xH0000, ptr %b7, align 2
+  store half 0xH0000, ptr %c8, align 2
+  store half 0xH0000, ptr %d9, align 2
+  store half 0xH0000, ptr %e10, align 8
+  store half 0xH0000, ptr %f11, align 2
+  store half 0xH0000, ptr %g12, align 2
+  store half 0xH0000, ptr %h13, align 2
+  ret void
+}

arsenm · 2024-11-01T06:15:06Z

llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll

+; common underlying object deeper than 6 levels from the original pointer.
+
+; The test below is the simplified version of actual performance oriented
+; workload; the offsets in getelementptr instructins are similar or same for


Suggested change

; workload; the offsets in getelementptr instructins are similar or same for

; workload; the offsets in getelementptr instructions are similar or same for

the misprint is fixed: 91a74c8

arsenm · 2024-11-01T06:15:42Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+  for (auto &RedKeyToUO : RedKeyToUOMap) {
+    auto UObjects = RedKeyToUO.second;
+    if (UObjects.size() < 2)
+      continue;
+    const auto RedKey = RedKeyToUO.first;


use c++17 structure binding

Thank you for the comment! I made an incremental fix for it: 91a74c8

arsenm · 2024-11-01T06:15:55Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+    for (const auto &UT : UTMap) {
+      const Value *UObject = UT.first;
+      const Value *UltimateTarget = UT.second;


use c++17 structure binding

Fixed in 91a74c8 . Thank you.

v-klochkov · 2024-11-08T18:57:24Z

Dear reviewers and colleagues (@arsenm , @jlebar , @valerydmit @michalpaszkowski ) this note is a friendly ping.
Please review the patch, it is an isolated incremental addition (1 func is added) that improves the LSV in general.

arsenm · 2024-11-08T19:41:13Z

Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases.

The test IR doesn't look canonical / optimal. Other passes are expected to have rewritten the addressing expressions to be a simpler form. If I run your examples through -O3, it mostly vectorizes with some quirks:

have to hackily avoid a memset that forms
Ends up with a scalar store + a 7x vector store

arsenm · 2024-11-08T19:19:27Z

llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll

+; RUN: opt %s -mtriple=x86_64-unknown-linux-gnu -passes=load-store-vectorizer -mcpu=skx -S -o %t.out.ll
+; RUN: FileCheck -input-file=%t.out.ll %s


Avoid the temporary file, nearly all tests should just pipe directly to FileCheck

Fixed as suggested: 427983a

arsenm · 2024-11-08T19:20:07Z

llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll

+  %level1 = getelementptr inbounds i8, ptr %arg1, i32 917504
+  %level2 = getelementptr i8, ptr %level1, i32 %arg0
+  %level3 = getelementptr i8, ptr %level2, i32 32768
+  %level4 = getelementptr inbounds i8, ptr %level3, i32 %arg0
+  %level5 = getelementptr i8, ptr %level4, i32 %arg0


Why do some of the GEPs have inbounds and some not? If they are not significant, they should probably be removed

There were some in the original workload. This test is the result of big simplification.
Those inbound are not important for the test, so I removed them as suggested: 427983a

arsenm · 2024-11-08T19:20:37Z

llvm/test/Transforms/LoadStoreVectorizer/X86/massive_indirection.ll

+
+define void @v1_v2_v4_v1_to_v8_levels_6_7_8_8(i32 %arg0, ptr align 16 %arg1) {
+; CHECK-LABEL: @v1_v2_v4_v1_to_v8_levels_6_7_8_8
+; CHECK: store <8 x half>


Use update_test_checks so we can see full context

Ok, added more context using update_test_checks.py. Thank you. See 427983a

arsenm · 2024-11-08T19:20:58Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: after merging:\n";
+    for (const auto &EC : EQClasses) {
+      dbgs() << "    Key: ([" << std::get<0>(EC.first)
+             << "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)
+             << ", " << std::get<2>(EC.first) << ", "
+             << static_cast<int>(std::get<3>(EC.first)) << ")\n";
+      for (const auto &Inst : EC.second)
+        dbgs() << "\tInst:\t" << *Inst << "\n";
+    }
+  });


Suggested change

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: after merging:\n";

for (const auto &EC : EQClasses) {

dbgs() << " Key: ([" << std::get<0>(EC.first)

<< "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)

<< ", " << std::get<2>(EC.first) << ", "

<< static_cast<int>(std::get<3>(EC.first)) << ")\n";

for (const auto &Inst : EC.second)

dbgs() << "\tInst:\t" << *Inst << "\n";

}

});

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: after merging:\n";

for (const auto &EC : EQClasses) {

dbgs() << " Key: ([" << std::get<0>(EC.first)

<< "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)

<< ", " << std::get<2>(EC.first) << ", "

<< static_cast<int>(std::get<3>(EC.first)) << ")\n";

for (const auto &Inst : EC.second)

dbgs() << "\tInst: " << *Inst << '\n';

}

});

arsenm · 2024-11-08T19:21:26Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+      auto VecFrom = EQClasses[KeyFrom];
+      auto VecTo = EQClasses[KeyTo];


No auto. Should these be references?

const auto & would be good here, IMO. Added it here: 427983a

arsenm · 2024-11-08T19:22:06Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: before merging:\n";
+    for (const auto &EC : EQClasses) {
+      dbgs() << "    Key: ([" << std::get<0>(EC.first)
+             << "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)
+             << ", " << std::get<2>(EC.first) << ", "
+             << static_cast<int>(std::get<3>(EC.first)) << ")\n";
+      for (const auto &Inst : EC.second)
+        dbgs() << "\tInst:\t" << *Inst << "\n";
+    }
+  });
+  LLVM_DEBUG({
+    dbgs() << "LSV: mergeEquivalenceClasses: RedKeyToUOMap:\n";
+    for (const auto &RedKeyToUO : RedKeyToUOMap) {
+      dbgs() << "    Reduced key: (" << std::get<0>(RedKeyToUO.first) << ", "
+             << std::get<1>(RedKeyToUO.first) << ", "
+             << static_cast<int>(std::get<2>(RedKeyToUO.first)) << ") --> "
+             << RedKeyToUO.second.size() << " underlying objects:\n";
+      for (auto UObject : RedKeyToUO.second)
+        dbgs() << "    [" << UObject << "]: " << *UObject << "\n";
+    }
+  });


Suggested change

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: before merging:\n";

for (const auto &EC : EQClasses) {

dbgs() << " Key: ([" << std::get<0>(EC.first)

<< "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)

<< ", " << std::get<2>(EC.first) << ", "

<< static_cast<int>(std::get<3>(EC.first)) << ")\n";

for (const auto &Inst : EC.second)

dbgs() << "\tInst:\t" << *Inst << "\n";

}

});

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: RedKeyToUOMap:\n";

for (const auto &RedKeyToUO : RedKeyToUOMap) {

dbgs() << " Reduced key: (" << std::get<0>(RedKeyToUO.first) << ", "

<< std::get<1>(RedKeyToUO.first) << ", "

<< static_cast<int>(std::get<2>(RedKeyToUO.first)) << ") --> "

<< RedKeyToUO.second.size() << " underlying objects:\n";

for (auto UObject : RedKeyToUO.second)

dbgs() << " [" << UObject << "]: " << *UObject << "\n";

}

});

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: before merging:\n";

for (const auto &EC : EQClasses) {

dbgs() << " Key: ([" << std::get<0>(EC.first)

<< "]: " << *std::get<0>(EC.first) << ", " << std::get<1>(EC.first)

<< ", " << std::get<2>(EC.first) << ", "

<< static_cast<int>(std::get<3>(EC.first)) << ")\n";

for (const auto &Inst : EC.second)

dbgs() << "\tInst: " << *Inst << '\n';

}

});

LLVM_DEBUG({

dbgs() << "LSV: mergeEquivalenceClasses: RedKeyToUOMap:\n";

for (const auto &RedKeyToUO : RedKeyToUOMap) {

dbgs() << " Reduced key: (" << std::get<0>(RedKeyToUO.first) << ", "

<< std::get<1>(RedKeyToUO.first) << ", "

<< static_cast<int>(std::get<2>(RedKeyToUO.first)) << ") --> "

<< RedKeyToUO.second.size() << " underlying objects:\n";

for (auto UObject : RedKeyToUO.second)

dbgs() << " [" << UObject << "]: " << *UObject << '\n';

}

});

arsenm · 2024-11-08T19:24:54Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

@@ -324,6 +324,11 @@ class Vectorizer {
      Instruction *ChainElem, Instruction *ChainBegin,
      const DenseMap<Instruction *, APInt /*OffsetFromLeader*/> &ChainOffsets);

+  /// Merges the equivalence classes if they have uderlying objects that differ


Suggested change

/// Merges the equivalence classes if they have uderlying objects that differ

/// Merges the equivalence classes if they have underlying objects that differ

arsenm · 2024-11-08T19:25:36Z

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

+    for (const auto *UObject : UObjects) {
+      const unsigned MaxLookupDepth = 1; // look for 1-level indirections only
+      const auto *UltimateTarget =
+          llvm::getUnderlyingObject(UObject, MaxLookupDepth);


Suggested change

llvm::getUnderlyingObject(UObject, MaxLookupDepth);

getUnderlyingObject(UObject, MaxLookupDepth);

v-klochkov · 2024-11-08T22:17:07Z

Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases.

The test IR doesn't look canonical / optimal. Other passes are expected to have rewritten the addressing expressions to be a simpler form. If I run your examples through -O3, it mostly vectorizes with some quirks:

have to hackily avoid a memset that forms

Ends up with a scalar store + a 7x vector store

@arsenm : Matt - thank you for the review.
I added an incremental commit 427983a to address all of your comments.

Regarding the case itself. The original workload was way more complex, the LIT only shows the idea of it, it is a huge simplification.
For the 1st case/pattern I saw 7x + 1x vectors (for loads and stores), which ruined performance - 7x vectors had to be lowered/legalized to 4x+2x+1x producing 4 mem-ops + extra reg-to-reg moves and swizzles instead of a single 8x mem-operation.

The 2nd pattern in LIT shows potentially worse situation, giving 8 equivalence classes having 1 scalar mem-operation in each.

michalpaszkowski

Thank you for the review @arsenm, please post any other comments/suggestions or let me know if anything wasn't addressed.

The PR looks good to me. @v-klochkov does not have write access, I can merge the PR early next week (waiting in case there are any other comments).

arsenm · 2024-11-13T23:12:16Z

Regarding the case itself. The original workload was way more complex, the LIT only shows the idea of it, it is a huge simplification. For the 1st case/pattern I saw 7x + 1x vectors (for loads and stores), which ruined performance - 7x vectors had to be lowered/legalized to 4x+2x+1x producing 4 mem-ops + extra reg-to-reg moves and swizzles instead of a single 8x mem-operation.

Can you add a case that's more representative? You can try llvm-reduce to get a sample

v-klochkov · 2024-11-14T00:02:42Z

Regarding the case itself. The original workload was way more complex, the LIT only shows the idea of it, it is a huge simplification. For the 1st case/pattern I saw 7x + 1x vectors (for loads and stores), which ruined performance - 7x vectors had to be lowered/legalized to 4x+2x+1x producing 4 mem-ops + extra reg-to-reg moves and swizzles instead of a single 8x mem-operation.

Can you add a case that's more representative? You can try llvm-reduce to get a sample

It may be challenging due to couple of reasons. There is a legal consideration: using llvm-reduce on non-disclosed code might not make it shareable. and second - the original input code is for a target that is not published yet, manually removing those target specifics details may result into pretty synthetic test again.

The existing LIT test is designed to be easily understandable. Adding more variations into the address-operation (use different offsets in getelementptr, or using slightly different operations - not only getelementptr), keeps the idea of the test the same.

Perhaps, we can both agree that the 6-steps-depth-limited search of the underlying object can easily lead to situations where equivalence classes have underlying objects that are only one step or level apart.

@arsenm - I am open to adding more variations in address operations if you believe it would improve the LIT test. Please let me know your thoughts first.

v-klochkov · 2024-11-23T01:47:48Z

Can you add a case that's more representative? You can try llvm-reduce to get a sample

@arsenm - Matt - Please take a look at 1 extra LIT case here: 7b07f8c

llvm-reduce is a great tool, I played with it; In this case though, it turned out to be easier to copy the interesting STOREs operations and minimally re-build missing IRs around them to show the idea of the context the STOREs in the original workload were placed into.
It is easy to get underlying objects that 1 indirection level apart from each other when the 1st store in a sequence is a store indexed by index and consequent stores indexed by index + some_const, which gives extra depth/level created by getelementptr index + some_const.

v-klochkov · 2024-12-04T04:02:34Z

Can you add a case that's more representative? You can try llvm-reduce to get a sample

@arsenm - Matt - Please take a look at 1 extra LIT case here: 7b07f8c

llvm-reduce is a great tool, I played with it; In this case though, it turned out to be easier to copy the interesting STOREs operations and minimally re-build missing IRs around them to show the idea of the context the STOREs in the original workload were placed into. It is easy to get underlying objects that 1 indirection level apart from each other when the 1st store in a sequence is a store indexed by index and consequent stores indexed by index + some_const, which gives extra depth/level created by getelementptr index + some_const.

@arsenm - Greetings, this is just a reminder/ping message. Thank you!

github-actions · 2024-12-11T21:48:23Z

✅ With the latest revision this PR passed the undef deprecator.

michalpaszkowski

@v-klochkov Thank you for adding the test and addressing the comments! I just ran the code formatter. One more thing, could you please name the unnamed/numbered vregs in the test -- this should help with any changes in the future? Please rebase on top of main branch -- once the builbots pass, I will merge the change.

This patch introduces a new method: void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UB1 and UB2), where UB1 is 1-level-indirection underlying base for UB2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr). Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases. The new approach merges equivalence classes if they have adjucent bases (1-level indirection). If a series of equivalence classes form ladder formed of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. Signed-off-by: Klochkov, Vyacheslav N <[email protected]>

It is built from the original workload by copying the STORE instructions to be optimized, and manually bulding missing IRs around them minimally reproducing the context of the original STOREs in the full test.

v-klochkov · 2024-12-12T01:48:14Z

I just ran the code formatter. One more thing, could you please name the unnamed/numbered vregs in the test -- this should help with any changes in the future? Please rebase on top of main branch -- once the builbots pass, I will merge the change.

@michalpaszkowski, thank you! I re-based the patch, removed unnamed/numbered vregs, removed undef: 4bff749

michalpaszkowski

Thank you @v-klochkov! LGTM!

github-actions · 2024-12-12T03:01:56Z

@v-klochkov Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

michalpaszkowski · 2024-12-12T04:35:21Z

@v-klochkov Reverting the change due to a failure. This may require some investigation. Here is the reverting pull request.

…es" (#119657) Reverts #114501, due to the following failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4171

…#114501) This patch introduces a new method: void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr). Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases. The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes form ladder formed of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>

v-klochkov · 2025-01-07T02:47:57Z

@v-klochkov Reverting the change due to a failure. This may require some investigation. Here is the reverting pull request.

@michalpaszkowski - Thank you for the quick revert of that patch.
I cherry-picked this patch and added extra fix for the memory error: #121861
Please see: #121861 (comment)

The error is reproducible only with memory sanitizer.
Is there a way to run one of post-commit CI bots before the commit there?

michalpaszkowski added enhancement Improving things as opposed to bug fixing, e.g. new or missing feature llvm:transforms vectorization labels Nov 1, 2024

michalpaszkowski marked this pull request as ready for review November 1, 2024 06:02

llvmbot added the vectorizers label Nov 1, 2024

michalpaszkowski requested review from jlebar and valerydmit November 1, 2024 06:03

arsenm reviewed Nov 1, 2024

View reviewed changes

v-klochkov force-pushed the load_store_vectorizer_merge_eq_classes branch from 29397d9 to 91a74c8 Compare November 1, 2024 16:34

v-klochkov requested a review from arsenm November 5, 2024 20:06

arsenm reviewed Nov 8, 2024

View reviewed changes

michalpaszkowski approved these changes Nov 13, 2024

View reviewed changes

michalpaszkowski requested changes Dec 11, 2024

View reviewed changes

v-klochkov added 5 commits December 11, 2024 17:23

[NFC] Address review comments: fix misprint + use C++17 struct binding

791a265

[NFC] Minor fixes to address the 2nd round of review comments

2325b52

Add one more LIT test case

3cc04a2

It is built from the original workload by copying the STORE instructions to be optimized, and manually bulding missing IRs around them minimally reproducing the context of the original STOREs in the full test.

Fix LIT: Rename unnamed/numbered vregs in LIT; remove usage of undef

4bff749

v-klochkov force-pushed the load_store_vectorizer_merge_eq_classes branch from 7b07f8c to 4bff749 Compare December 12, 2024 01:46

michalpaszkowski approved these changes Dec 12, 2024

View reviewed changes

michalpaszkowski merged commit fd2f8d4 into llvm:main Dec 12, 2024
8 checks passed

michalpaszkowski mentioned this pull request Dec 12, 2024

Revert "[LoadStoreVectorizer] Postprocess and merge equivalence classes" #119657

Merged

michalpaszkowski added a commit that referenced this pull request Dec 12, 2024

Revert "[LoadStoreVectorizer] Postprocess and merge equivalence class…

04313b8

…es" (#119657) Reverts #114501, due to the following failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4171

v-klochkov deleted the load_store_vectorizer_merge_eq_classes branch January 6, 2025 21:59

v-klochkov mentioned this pull request Jan 7, 2025

[LoadStoreVectorizer] Postprocess and merge equivalence classes #121861

Merged

	; workload; the offsets in getelementptr instructins are similar or same for
	; workload; the offsets in getelementptr instructions are similar or same for

		; RUN: opt %s -mtriple=x86_64-unknown-linux-gnu -passes=load-store-vectorizer -mcpu=skx -S -o %t.out.ll
		; RUN: FileCheck -input-file=%t.out.ll %s

		auto VecFrom = EQClasses[KeyFrom];
		auto VecTo = EQClasses[KeyTo];

	/// Merges the equivalence classes if they have uderlying objects that differ
	/// Merges the equivalence classes if they have underlying objects that differ

	llvm::getUnderlyingObject(UObject, MaxLookupDepth);
	getUnderlyingObject(UObject, MaxLookupDepth);

[LoadStoreVectorizer] Postprocess and merge equivalence classes #114501

[LoadStoreVectorizer] Postprocess and merge equivalence classes #114501

Uh oh!

Conversation

v-klochkov commented Nov 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 1, 2024

Uh oh!

llvmbot commented Nov 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

v-klochkov commented Nov 8, 2024

Uh oh!

arsenm commented Nov 8, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

v-klochkov Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

v-klochkov commented Nov 8, 2024

Uh oh!

michalpaszkowski left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm commented Nov 13, 2024

Uh oh!

v-klochkov commented Nov 14, 2024

Uh oh!

v-klochkov commented Nov 23, 2024

Uh oh!

v-klochkov commented Dec 4, 2024

Uh oh!

github-actions bot commented Dec 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michalpaszkowski left a comment

Choose a reason for hiding this comment

Uh oh!

v-klochkov commented Dec 12, 2024

Uh oh!

michalpaszkowski left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Dec 12, 2024

v-klochkov commented Nov 1, 2024 •

edited

Loading

llvmbot commented Nov 1, 2024 •

edited

Loading

v-klochkov Nov 8, 2024 •

edited

Loading

github-actions bot commented Dec 11, 2024 •

edited

Loading