[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (llvm#66825)

(The profile format change is split out into a standalone change, llvm#81691.)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table addresses.
  * This is controlled by `-enable-vtable-value-profiling` and is off by default.
  * When the option is on, raw profiles carry serialized `VTableProfData` structs and compressed vtable names as payloads.
* Implement profile reader and writer support.
  * The raw profile reader is used by `llvm-profdata` but not by the compiler. It constructs an InstrProfSymtab with symbol names and maps profiled runtime addresses to vtable symbols.
  * The indexed profile reader is used by both `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of the in-memory compressed vtable names and the length of that string. When used in `llvm-profdata`, the reader decompresses the string to show the symbols of a profiled site. When used in the compiler, the string is not decompressed, since the IR is used to construct InstrProfSymtab.
  * The indexed profile writer collects the list of vtable names and stores it in indexed profiles.
  * Text profile reader and writer support is added; it mostly follows the implementation for the indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

RFC:
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
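The raw profile reader described above maps a profiled runtime address to the vtable whose address range contains it. A minimal sketch of that kind of range lookup follows. `VTableRangeMap`, `add`, and `lookup` are hypothetical names chosen for illustration and are not part of InstrProfSymtab; the sketch also assumes vtable ranges do not overlap, and it returns an empty string for addresses outside every range.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a table that maps vtable address ranges to
// symbol names. This is an illustration, not the InstrProfSymtab API.
class VTableRangeMap {
public:
  // Registers the half-open range [Start, End) for the vtable named Name.
  void add(uint64_t Start, uint64_t End, std::string Name) {
    assert(Start < End && "empty or inverted range");
    Ranges[Start] = std::make_pair(End, std::move(Name));
  }

  // Returns the name of the vtable whose range contains Addr, or an empty
  // string when Addr lies outside every registered range.
  std::string lookup(uint64_t Addr) const {
    auto It = Ranges.upper_bound(Addr); // first range starting after Addr
    if (It == Ranges.begin())
      return "";
    --It; // last range starting at or before Addr
    return Addr < It->second.first ? It->second.second : "";
  }

private:
  // Start address -> (end address, name). Ranges are assumed not to overlap.
  std::map<uint64_t, std::pair<uint64_t, std::string>> Ranges;
};
```

With a range registered as `add(0x1000, 0x1020, "_ZTV8Derived1")`, a call such as `lookup(0x1008)` returns that name, while `lookup(0x1020)` falls outside the half-open range and returns an empty string.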
mingmingl-llvm authored Apr 1, 2024
1 parent 971b852 commit 1351d17
Showing 17 changed files with 1,419 additions and 192 deletions.
142 changes: 142 additions & 0 deletions compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
@@ -0,0 +1,142 @@
// REQUIRES: lld-available

// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o %t-test
// RUN: env LLVM_PROFILE_FILE=%t-test.profraw %t-test

// Show vtable profiles from raw profile.
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profraw | FileCheck %s --check-prefixes=COMMON,RAW

// Generate indexed profile from raw profile and show the data.
// RUN: llvm-profdata merge %t-test.profraw -o %t-test.profdata
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED

// Generate text profile from raw and indexed profiles respectively and show the data.
// RUN: llvm-profdata merge --text %t-test.profraw -o %t-raw.proftext
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-raw.proftext | FileCheck %s --check-prefix=ICTEXT
// RUN: llvm-profdata merge --text %t-test.profdata -o %t-indexed.proftext
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-indexed.proftext | FileCheck %s --check-prefix=ICTEXT

// Generate indexed profile from text profiles and show the data
// RUN: llvm-profdata merge --binary %t-raw.proftext -o %t-text.profraw
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
// RUN: llvm-profdata merge --binary %t-indexed.proftext -o %t-text.profdata
// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED

// COMMON: Counters:
// COMMON-NEXT: main:
// COMMON-NEXT: Hash: 0x0f9a16fe6d398548
// COMMON-NEXT: Counters: 2
// COMMON-NEXT: Indirect Call Site Count: 2
// COMMON-NEXT: Number of instrumented vtables: 2
// RAW: Indirect Target Results:
// RAW-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
// RAW-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
// RAW-NEXT: VTable Results:
// RAW-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// RAW-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// INDEXED: Indirect Target Results:
// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
// INDEXED-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
// INDEXED-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
// INDEXED-NEXT: VTable Results:
// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// INDEXED-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
// INDEXED-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
// COMMON: Instrumentation level: IR entry_first = 0
// COMMON-NEXT: Functions shown: 1
// COMMON-NEXT: Total functions: 6
// COMMON-NEXT: Maximum function count: 1000
// COMMON-NEXT: Maximum internal block count: 250
// COMMON-NEXT: Statistics for indirect call sites profile:
// COMMON-NEXT: Total number of sites: 2
// COMMON-NEXT: Total number of sites with values: 2
// COMMON-NEXT: Total number of profiled values: 4
// COMMON-NEXT: Value sites histogram:
// COMMON-NEXT: NumTargets, SiteCount
// COMMON-NEXT: 2, 2
// COMMON-NEXT: Statistics for vtable profile:
// COMMON-NEXT: Total number of sites: 2
// COMMON-NEXT: Total number of sites with values: 2
// COMMON-NEXT: Total number of profiled values: 4
// COMMON-NEXT: Value sites histogram:
// COMMON-NEXT: NumTargets, SiteCount
// COMMON-NEXT: 2, 2

// ICTEXT: :ir
// ICTEXT: main
// ICTEXT: # Func Hash:
// ICTEXT: 1124236338992350536
// ICTEXT: # Num Counters:
// ICTEXT: 2
// ICTEXT: # Counter Values:
// ICTEXT: 1000
// ICTEXT: 1
// ICTEXT: # Num Value Kinds:
// ICTEXT: 2
// ICTEXT: # ValueKind = IPVK_IndirectCallTarget:
// ICTEXT: 0
// ICTEXT: # NumValueSites:
// ICTEXT: 2
// ICTEXT: 2
// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
// ICTEXT: _ZN8Derived15func1Eii:250
// ICTEXT: 2
// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
// ICTEXT: _ZN8Derived15func2Eii:250
// ICTEXT: # ValueKind = IPVK_VTableTarget:
// ICTEXT: 2
// ICTEXT: # NumValueSites:
// ICTEXT: 2
// ICTEXT: 2
// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
// ICTEXT: _ZTV8Derived1:250
// ICTEXT: 2
// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
// ICTEXT: _ZTV8Derived1:250

#include <cstdio>
#include <cstdlib>
class Base {
public:
  virtual int func1(int a, int b) = 0;
  virtual int func2(int a, int b) = 0;
};
class Derived1 : public Base {
public:
  int func1(int a, int b) override { return a + b; }

  int func2(int a, int b) override { return a * b; }
};
namespace {
class Derived2 : public Base {
public:
  int func1(int a, int b) override { return a - b; }

  int func2(int a, int b) override { return a * (a - b); }
};
} // namespace
__attribute__((noinline)) Base *createType(int a) {
  Base *base = nullptr;
  if (a % 4 == 0)
    base = new Derived1();
  else
    base = new Derived2();
  return base;
}
int main(int argc, char **argv) {
  int sum = 0;
  for (int i = 0; i < 1000; i++) {
    int a = rand();
    int b = rand();
    Base *ptr = createType(i);
    sum += ptr->func1(a, b) + ptr->func2(b, a);
  }
  printf("sum is %d\n", sum);
  return 0;
}
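The 250/750 split in the CHECK lines follows directly from the `a % 4 == 0` branch in `createType` over 1000 iterations. The sketch below reproduces only that arithmetic; `countCreatedTypes` is a hypothetical helper written for illustration and is not part of the test.

```cpp
#include <cassert>
#include <utility>

// Counts how often each branch of createType is taken for i in
// [0, Iterations). Returns {Derived1 count, Derived2 count}.
// Hypothetical helper for illustration only.
std::pair<int, int> countCreatedTypes(int Iterations) {
  assert(Iterations >= 0 && "iteration count must be non-negative");
  int Derived1Count = 0;
  int Derived2Count = 0;
  for (int i = 0; i < Iterations; i++) {
    if (i % 4 == 0)
      ++Derived1Count; // mirrors `base = new Derived1();`
    else
      ++Derived2Count; // mirrors `base = new Derived2();`
  }
  return {Derived1Count, Derived2Count};
}
```

For 1000 iterations this gives 250 `Derived1` objects and 750 `Derived2` objects, which is 25% and 75% of the profiled values at each of the two call sites.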
62 changes: 57 additions & 5 deletions llvm/include/llvm/Analysis/IndirectCallVisitor.h
@@ -16,23 +16,75 @@
#include <vector>

namespace llvm {
// Visitor class that finds indirect calls or instructions that give a vtable
// value, depending on Type.
struct PGOIndirectCallVisitor : public InstVisitor<PGOIndirectCallVisitor> {
  enum class InstructionType {
    kIndirectCall = 0,
    kVTableVal = 1,
  };
  std::vector<CallBase *> IndirectCalls;
  std::vector<Instruction *> ProfiledAddresses;
  PGOIndirectCallVisitor(InstructionType Type) : Type(Type) {}

  void visitCallBase(CallBase &Call) {
    if (!Call.isIndirectCall())
      return;

    if (Type == InstructionType::kIndirectCall) {
      IndirectCalls.push_back(&Call);
      return;
    }

    assert(Type == InstructionType::kVTableVal && "Control flow guaranteed");

    LoadInst *LI = dyn_cast<LoadInst>(Call.getCalledOperand());
    // The code pattern to look for
    //
    // %vtable = load ptr, ptr %b
    // %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
    // %2 = load ptr, ptr %vfn
    // %call = tail call i32 %2(ptr %b)
    //
    // %vtable is the vtable address value to profile, and
    // %2 is the indirect call target address to profile.
    if (LI != nullptr) {
      Value *Ptr = LI->getPointerOperand();
      Value *VTablePtr = Ptr->stripInBoundsConstantOffsets();
      // This is a heuristic to find address feeding instructions.
      // FIXME: Add support in the frontend so LLVM type intrinsics are
      // emitted without LTO. This way, added intrinsics could filter
      // non-vtable instructions and reduce instrumentation overhead.
      // Since a non-vtable profiled address is not within the address
      // range of vtable objects, it's stored as zero in indexed profiles.
      // A pass that looks up a symbol with a zero hash will (almost) always
      // find nullptr and skip the actual transformation (e.g., comparison
      // of symbols). So the performance overhead from non-vtable profiled
      // addresses is negligible, if it exists at all. Comparing the loaded
      // address with the symbol address guarantees correctness.
      if (VTablePtr != nullptr && isa<Instruction>(VTablePtr))
        ProfiledAddresses.push_back(cast<Instruction>(VTablePtr));
    }
  }

private:
  InstructionType Type;
};

// Helper function that finds all indirect call sites.
inline std::vector<CallBase *> findIndirectCalls(Function &F) {
  PGOIndirectCallVisitor ICV(
      PGOIndirectCallVisitor::InstructionType::kIndirectCall);
  ICV.visit(F);
  return ICV.IndirectCalls;
}

// Helper function that finds the instructions that may produce vtable
// addresses (see the heuristic in visitCallBase).
inline std::vector<Instruction *> findVTableAddrs(Function &F) {
  PGOIndirectCallVisitor ICV(
      PGOIndirectCallVisitor::InstructionType::kVTableVal);
  ICV.visit(F);
  return ICV.ProfiledAddresses;
}

} // namespace llvm

#endif
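The heuristic in `visitCallBase` walks from the callee of an indirect call back to the instruction that produced the vtable address. The sketch below models that walk on a toy value graph so the control flow can be followed without LLVM headers. `Kind`, `Node`, `stripConstGEPs`, and `findVTableCandidate` are hypothetical names, not LLVM classes or functions, and the toy assumes every `Call` node is already known to be an indirect call.

```cpp
#include <cassert>

// Toy stand-ins for IR values; these are NOT LLVM classes.
enum class Kind { Argument, Load, ConstGEP, Call };

struct Node {
  Kind K;
  const Node *Operand; // pointer operand (Load, ConstGEP) or callee (Call)
};

// Skips constant-offset GEP nodes, loosely modeling what
// stripInBoundsConstantOffsets does in the real visitor.
const Node *stripConstGEPs(const Node *N) {
  while (N != nullptr && N->K == Kind::ConstGEP)
    N = N->Operand;
  return N;
}

// For a call whose callee is a load, returns the value that the load's
// address is derived from, provided that value is an instruction (in this
// toy model: anything except an Argument). Returns nullptr otherwise.
const Node *findVTableCandidate(const Node &Call) {
  assert(Call.K == Kind::Call && "expected a call node");
  const Node *Callee = Call.Operand;
  if (Callee == nullptr || Callee->K != Kind::Load)
    return nullptr;
  const Node *Base = stripConstGEPs(Callee->Operand);
  if (Base == nullptr || Base->K == Kind::Argument)
    return nullptr;
  return Base;
}
```

Building the chain from the code pattern in the comment (`%b` as an `Argument`, `%vtable` as a `Load` of `%b`, `%vfn` as a `ConstGEP` of `%vtable`, `%2` as a `Load` of `%vfn`, and the call through `%2`) makes `findVTableCandidate` return the `%vtable` node. A function pointer loaded directly from an argument yields `nullptr`, because the stripped base is not an instruction.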