Commit 1351d17

[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (llvm#66825)
(The profile format change is split out into a standalone change, llvm#81691.)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table addresses.
  * This is controlled by `-enable-vtable-value-profiling` and is off by default.
  * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support.
  * The raw profile reader is used by `llvm-profdata` but not by the compiler. It constructs an InstrProfSymtab with symbol names and maps each profiled runtime address to a vtable symbol.
  * The indexed profile reader is used by both `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of the in-memory compressed vtable names and the length of that string. When used in `llvm-profdata`, the reader decompresses the string to show the symbols of a profiled site. When used in the compiler, string decompression doesn't happen, since IR is used to construct the InstrProfSymtab.
  * The indexed profile writer collects the list of vtable names and stores it in indexed profiles.
  * Text profile reader and writer support is added, but mostly follows the implementation for the indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

RFC: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
1 parent 971b852 commit 1351d17
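
The commit message says the raw profile reader maps each profiled runtime address to a vtable symbol. The following is a minimal sketch of one way such a lookup could work, assuming each vtable occupies a half-open address range `[Start, End)`. `VTableRange` and `VTableAddrMap` are hypothetical names for illustration only; this is not the InstrProfSymtab implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical record for one vtable object; not an LLVM type.
struct VTableRange {
  uint64_t End;     // One past the last byte of the vtable object.
  std::string Name; // Symbol name of the vtable.
};

// Hypothetical address-to-symbol map keyed by range start address.
class VTableAddrMap {
public:
  void add(uint64_t Start, uint64_t End, std::string Name) {
    assert(Start < End && "a vtable range must be non-empty");
    Ranges[Start] = VTableRange{End, std::move(Name)};
  }

  // Returns the name of the vtable whose range contains Addr, or an empty
  // string when Addr does not fall inside any known vtable.
  std::string lookup(uint64_t Addr) const {
    auto It = Ranges.upper_bound(Addr); // First range starting after Addr.
    if (It == Ranges.begin())
      return "";
    --It; // Last range starting at or before Addr.
    return Addr < It->second.End ? It->second.Name : "";
  }

private:
  std::map<uint64_t, VTableRange> Ranges;
};
```

An address that lands in no range yields an empty result here. That is the sketch's stand-in for the case the header comment in this commit describes, where a non-vtable profiled address is stored as zero in indexed profiles.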

17 files changed, +1419 -192 lines changed
@@ -0,0 +1,142 @@
+// REQUIRES: lld-available
+
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o %t-test
+// RUN: env LLVM_PROFILE_FILE=%t-test.profraw %t-test
+
+// Show vtable profiles from raw profile.
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
+
+// Generate indexed profile from raw profile and show the data.
+// RUN: llvm-profdata merge %t-test.profraw -o %t-test.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+
+// Generate text profile from raw and indexed profiles respectively and show the data.
+// RUN: llvm-profdata merge --text %t-test.profraw -o %t-raw.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-raw.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text %t-test.profdata -o %t-indexed.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-indexed.proftext | FileCheck %s --check-prefix=ICTEXT
+
+// Generate indexed profile from text profiles and show the data
+// RUN: llvm-profdata merge --binary %t-raw.proftext -o %t-text.profraw
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary %t-indexed.proftext -o %t-text.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+
+// COMMON: Counters:
+// COMMON-NEXT: main:
+// COMMON-NEXT: Hash: 0x0f9a16fe6d398548
+// COMMON-NEXT: Counters: 2
+// COMMON-NEXT: Indirect Call Site Count: 2
+// COMMON-NEXT: Number of instrumented vtables: 2
+// RAW: Indirect Target Results:
+// RAW-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
+// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
+// RAW-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
+// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
+// RAW-NEXT: VTable Results:
+// RAW-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
+// RAW-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
+// RAW-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
+// RAW-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
+// INDEXED: Indirect Target Results:
+// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii, 750 ] (75.00%)
+// INDEXED-NEXT: [ 0, _ZN8Derived15func1Eii, 250 ] (25.00%)
+// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii, 750 ] (75.00%)
+// INDEXED-NEXT: [ 1, _ZN8Derived15func2Eii, 250 ] (25.00%)
+// INDEXED-NEXT: VTable Results:
+// INDEXED-NEXT: [ 0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
+// INDEXED-NEXT: [ 0, _ZTV8Derived1, 250 ] (25.00%)
+// INDEXED-NEXT: [ 1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E, 750 ] (75.00%)
+// INDEXED-NEXT: [ 1, _ZTV8Derived1, 250 ] (25.00%)
+// COMMON: Instrumentation level: IR entry_first = 0
+// COMMON-NEXT: Functions shown: 1
+// COMMON-NEXT: Total functions: 6
+// COMMON-NEXT: Maximum function count: 1000
+// COMMON-NEXT: Maximum internal block count: 250
+// COMMON-NEXT: Statistics for indirect call sites profile:
+// COMMON-NEXT: Total number of sites: 2
+// COMMON-NEXT: Total number of sites with values: 2
+// COMMON-NEXT: Total number of profiled values: 4
+// COMMON-NEXT: Value sites histogram:
+// COMMON-NEXT: NumTargets, SiteCount
+// COMMON-NEXT: 2, 2
+// COMMON-NEXT: Statistics for vtable profile:
+// COMMON-NEXT: Total number of sites: 2
+// COMMON-NEXT: Total number of sites with values: 2
+// COMMON-NEXT: Total number of profiled values: 4
+// COMMON-NEXT: Value sites histogram:
+// COMMON-NEXT: NumTargets, SiteCount
+// COMMON-NEXT: 2, 2
+
+// ICTEXT: :ir
+// ICTEXT: main
+// ICTEXT: # Func Hash:
+// ICTEXT: 1124236338992350536
+// ICTEXT: # Num Counters:
+// ICTEXT: 2
+// ICTEXT: # Counter Values:
+// ICTEXT: 1000
+// ICTEXT: 1
+// ICTEXT: # Num Value Kinds:
+// ICTEXT: 2
+// ICTEXT: # ValueKind = IPVK_IndirectCallTarget:
+// ICTEXT: 0
+// ICTEXT: # NumValueSites:
+// ICTEXT: 2
+// ICTEXT: 2
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
+// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: 2
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
+// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: # ValueKind = IPVK_VTableTarget:
+// ICTEXT: 2
+// ICTEXT: # NumValueSites:
+// ICTEXT: 2
+// ICTEXT: 2
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
+// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: 2
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
+// ICTEXT: _ZTV8Derived1:250
+
+#include <cstdio>
+#include <cstdlib>
+class Base {
+public:
+  virtual int func1(int a, int b) = 0;
+  virtual int func2(int a, int b) = 0;
+};
+class Derived1 : public Base {
+public:
+  int func1(int a, int b) override { return a + b; }
+
+  int func2(int a, int b) override { return a * b; }
+};
+namespace {
+class Derived2 : public Base {
+public:
+  int func1(int a, int b) override { return a - b; }
+
+  int func2(int a, int b) override { return a * (a - b); }
+};
+} // namespace
+__attribute__((noinline)) Base *createType(int a) {
+  Base *base = nullptr;
+  if (a % 4 == 0)
+    base = new Derived1();
+  else
+    base = new Derived2();
+  return base;
+}
+int main(int argc, char **argv) {
+  int sum = 0;
+  for (int i = 0; i < 1000; i++) {
+    int a = rand();
+    int b = rand();
+    Base *ptr = createType(i);
+    sum += ptr->func1(a, b) + ptr->func2(b, a);
+  }
+  printf("sum is %d\n", sum);
+  return 0;
+}
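
The 250/750 split in the expected output follows from `createType`: the loop runs `i` from 0 to 999 and picks `Derived1` only when `i % 4 == 0`. The short sketch below reproduces that arithmetic outside the profiler. `TypeCounts`, `countCreatedTypes`, and `percentOf` are hypothetical helper names, not part of the test.

```cpp
#include <cassert>

// Hypothetical tally of how often each derived type is created.
struct TypeCounts {
  int Derived1 = 0;
  int Derived2 = 0;
};

// Mirrors the branch in createType(i) for i in [0, Iterations).
TypeCounts countCreatedTypes(int Iterations) {
  TypeCounts C;
  for (int I = 0; I < Iterations; ++I) {
    if (I % 4 == 0)
      ++C.Derived1; // The test constructs Derived1 on this path.
    else
      ++C.Derived2; // The test constructs Derived2 on this path.
  }
  return C;
}

// Share of Count in Total, as a percentage.
double percentOf(int Count, int Total) {
  assert(Total > 0 && "Total must be positive");
  return 100.0 * Count / Total;
}
```

With 1000 iterations this gives 250 `Derived1` objects (25%) and 750 `Derived2` objects (75%). Each of the two call sites in `main` runs once per iteration, which is consistent with every site reporting the same pair of counts for both the indirect call targets and the vtables.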

llvm/include/llvm/Analysis/IndirectCallVisitor.h

+57-5
@@ -16,23 +16,75 @@
 #include <vector>
 
 namespace llvm {
-// Visitor class that finds all indirect call.
+// Visitor class that finds indirect calls or instructions that give vtable
+// value, depending on Type.
 struct PGOIndirectCallVisitor : public InstVisitor<PGOIndirectCallVisitor> {
+  enum class InstructionType {
+    kIndirectCall = 0,
+    kVTableVal = 1,
+  };
   std::vector<CallBase *> IndirectCalls;
-  PGOIndirectCallVisitor() = default;
+  std::vector<Instruction *> ProfiledAddresses;
+  PGOIndirectCallVisitor(InstructionType Type) : Type(Type) {}
 
   void visitCallBase(CallBase &Call) {
-    if (Call.isIndirectCall())
+    if (!Call.isIndirectCall())
+      return;
+
+    if (Type == InstructionType::kIndirectCall) {
       IndirectCalls.push_back(&Call);
+      return;
+    }
+
+    assert(Type == InstructionType::kVTableVal && "Control flow guaranteed");
+
+    LoadInst *LI = dyn_cast<LoadInst>(Call.getCalledOperand());
+    // The code pattern to look for
+    //
+    //   %vtable = load ptr, ptr %b
+    //   %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
+    //   %2 = load ptr, ptr %vfn
+    //   %call = tail call i32 %2(ptr %b)
+    //
+    // %vtable is the vtable address value to profile, and
+    // %2 is the indirect call target address to profile.
+    if (LI != nullptr) {
+      Value *Ptr = LI->getPointerOperand();
+      Value *VTablePtr = Ptr->stripInBoundsConstantOffsets();
+      // This is a heuristic to find address feeding instructions.
+      // FIXME: Add support in the frontend so LLVM type intrinsics are
+      // emitted without LTO. This way, added intrinsics could filter
+      // non-vtable instructions and reduce instrumentation overhead.
+      // Since a non-vtable profiled address is not within the address
+      // range of vtable objects, it's stored as zero in indexed profiles.
+      // A pass that looks up symbol with a zero hash will (almost) always
+      // find nullptr and skip the actual transformation (e.g., comparison
+      // of symbols). So the performance overhead from non-vtable profiled
+      // address is negligible if it exists at all. Comparing loaded address
+      // with symbol address guarantees correctness.
+      if (VTablePtr != nullptr && isa<Instruction>(VTablePtr))
+        ProfiledAddresses.push_back(cast<Instruction>(VTablePtr));
+    }
   }
+
+private:
+  InstructionType Type;
 };
 
-// Helper function that finds all indirect call sites.
 inline std::vector<CallBase *> findIndirectCalls(Function &F) {
-  PGOIndirectCallVisitor ICV;
+  PGOIndirectCallVisitor ICV(
+      PGOIndirectCallVisitor::InstructionType::kIndirectCall);
   ICV.visit(F);
   return ICV.IndirectCalls;
 }
+
+inline std::vector<Instruction *> findVTableAddrs(Function &F) {
+  PGOIndirectCallVisitor ICV(
+      PGOIndirectCallVisitor::InstructionType::kVTableVal);
+  ICV.visit(F);
+  return ICV.ProfiledAddresses;
+}
+
 } // namespace llvm
 
 #endif
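
The vtable heuristic in `visitCallBase` walks backwards from the called operand: the callee must be a load, constant in-bounds offsets are stripped from that load's pointer operand, and the result is recorded only if it is an instruction. The sketch below models that walk on a toy value graph so it can be read and run without LLVM. `Kind`, `Node`, `stripConstOffsets`, and `findVTableCandidate` are hypothetical stand-ins, not LLVM types or APIs, and the toy ignores everything the real `Value::stripInBoundsConstantOffsets` handles beyond a chain of constant GEPs.

```cpp
#include <cassert>

// Toy value kinds. Argument stands in for any non-instruction value.
enum class Kind { Load, ConstInBoundsGEP, Argument };

// Toy value with at most one pointer operand.
struct Node {
  Kind K;
  const Node *Operand; // Pointer operand of a Load or GEP, else nullptr.
};

// Toy analogue of stripping constant in-bounds offsets: skip over GEPs.
const Node *stripConstOffsets(const Node *V) {
  while (V != nullptr && V->K == Kind::ConstInBoundsGEP)
    V = V->Operand;
  return V;
}

// Given the callee of an indirect call, return the value the heuristic
// would record as the profiled address, or nullptr if the pattern fails.
const Node *findVTableCandidate(const Node *Callee) {
  // The callee must itself be a load (the load of the function pointer).
  if (Callee == nullptr || Callee->K != Kind::Load)
    return nullptr;
  const Node *Base = stripConstOffsets(Callee->Operand);
  // Only instruction-like values are recorded; arguments are skipped.
  if (Base == nullptr || Base->K == Kind::Argument)
    return nullptr;
  return Base;
}
```

For the IR pattern quoted in the header comment, the callee `%2` is a load from `%vfn`, the GEP `%vfn` is stripped, and the candidate is the `%vtable` load. As the FIXME in the diff notes, this is only a heuristic: the candidate need not be a real vtable address, and in that case the profiled value is stored as zero in indexed profiles.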
