-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[BPF] Handle certain mem intrinsic functions with addr-space arguments #160025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
0e4f47c
to
3afb1d3
Compare
@@ -478,9 +479,57 @@ static void aspaceWrapOperand(DenseMap<Value *, Value *> &Cache, Instruction *I, | |||
} | |||
} | |||
|
|||
static Instruction *aspaceMemIntrinsic(DenseMap<Value *, Value *> &Cache, | |||
Instruction *I, bool IsSet, bool IsCpy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set|cpy|mov expressed oddly with these bool args. 3 separate helpers will look cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will do.
3afb1d3
to
669cd1d
Compare
Intrinsic::ID ID = Callee->getIntrinsicID(); | ||
bool IsSet = ID == Intrinsic::memset; | ||
bool IsCpy = ID == Intrinsic::memcpy; | ||
bool IsMove = ID == Intrinsic::memmove; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked if there are some other intrinsics we need to care about and found these:
Intrinsic::memcpy_inline
, available as a builtin function (link)Intrinsic::memset_inline
, sameIntrinsic::memcpy_element_unordered_atomic
,Intrinsic::memmove_element_unordered_atomic
,Intrinsic::memset_element_unordered_atomic
-- see the code to handle these, but don't see any code that introduces them.Intrinsic::experimental_memset_pattern
--LoopIdiomRecognize::processLoopStridedStore
can introduce these.- there are also a some vector related intrinsics, but I assume these are irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for list the intrinsic's in the above. I missed __builtin_memcpy_inline and __builtin_memset_inline which is very similar to __builtin_mem{cpy,set} but the __inline version requires the 'size' argument to be constant. In current bpf progs, we all use __builtin_mem{set,cpy}() with constant size, so it essentially equivalent to __builtin_mem{set,cpy}_line(). It will be trivial to add both to the pull request.
I think we can ignore mem{cpy,move,set}_element_unordered_atomic. I am aware of this set of intrinsics. The operand of these memory operations need to be atomic and so for our addr-space arguments, we can ignore them.
For Intrinsic:experimental_memset_pattern, it tries to convert a loop like
for (unsigned i = 0; i < 2 * n; i += 2) {
f[i] = 2;
f[i+1] = 2;
}
to the following intrinsic
// Memset variant that writes a given pattern.
def int_experimental_memset_pattern
: Intrinsic<[],
[llvm_anyptr_ty, // Destination.
llvm_any_ty, // Pattern value.
llvm_anyint_ty, // Count (number of times to fill value).
llvm_i1_ty], // IsVolatile.
[IntrWriteMem, IntrArgMemOnly, IntrWillReturn, IntrNoFree, IntrNoCallback,
NoCapture<ArgIndex<0>>, WriteOnly<ArgIndex<0>>,
ImmArg<ArgIndex<3>>]>;
This should be rare. But for completeness, I think I can add this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked ChatGPT for the following question?
In llvm, what kind of C code can generate the following intrinsic: Intrinsic::experimental_memset_pattern ?
The following is the answer:
====================
Short version: call Apple’s memset_pattern{4,8,16} from C on a Darwin target (macOS/iOS). Clang recognizes these and lowers them to the LLVM IR intrinsic llvm.experimental.memset.pattern (which can then be expanded efficiently).
Minimal example (macOS / iOS targets)
// clang -O2 -target x86_64-apple-macos14 -S -emit-llvm ex.c -o ex.ll
#include <string.h>
void fill16(void *dst, size_t n) {
unsigned char pat[16] =
{0,1,2,3,4,5,6,7, 8,9,10,11,12,13,14,15};
memset_pattern16(dst, pat, n);
}
void fill8(void *dst, size_t n) {
unsigned char pat[8] = {1,2,3,4,5,6,7,8};
memset_pattern8(dst, pat, n);
}
void fill4(void *dst, size_t n) {
unsigned char pat[4] = {0xAA,0xBB,0xCC,0xDD};
memset_pattern4(dst, pat, n);
}
In the emitted IR you’ll see calls like:
call void @llvm.experimental.memset.pattern.p0.i64(
ptr %dst, ptr %pat, i64 %n, i1 false)
Notes
These memset_pattern{4,8,16} functions are Apple libc extensions. On non-Darwin targets, Clang won’t lower them to the intrinsic—either you’ll get a normal library call or a loop.
There isn’t a portable C standard function that maps to llvm.experimental.memset.pattern.
The intrinsic allows repeating multi-byte patterns (4/8/16). Plain llvm.memset only repeats a single byte.
If you’re not on macOS/iOS but still want the intrinsic (for experimentation), compile with a Darwin target triple as shown above.
====================
I actually tried to compile with the above example. It compiled successfully with the following compiler:
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
But it will fail to compile with linux and x86 target.
$ clang -O2 -S -emit-llvm ex.c -o ex.ll
ex.c:7:5: error: call to undeclared function 'memset_pattern16'; ISO C99 and later do not support
implicit function declarations [-Wimplicit-function-declaration]
7 | memset_pattern16(dst, pat, n);
| ^
ex.c:12:5: error: call to undeclared function 'memset_pattern8'; ISO C99 and later do not support
implicit function declarations [-Wimplicit-function-declaration]
12 | memset_pattern8(dst, pat, n);
| ^
ex.c:17:5: error: call to undeclared function 'memset_pattern4'; ISO C99 and later do not support
implicit function declarations [-Wimplicit-function-declaration]
17 | memset_pattern4(dst, pat, n);
| ^
3 errors generated.
Unfortunately, the compiler of Apple on my Mac is too old to generate llvm.experimental.memset.pattern. I suspect the latest clang (with Apple target) should generate llvm.experimental.memset.pattern. The following is the related code in LoopIDiomRecognize.cpp:
if (SplatValue) {
NewCall = Builder.CreateMemSet(BasePtr, SplatValue, MemsetArg,
MaybeAlign(StoreAlignment),
/*isVolatile=*/false, AATags);
} else if (ForceMemsetPatternIntrinsic ||
isLibFuncEmittable(M, TLI, LibFunc_memset_pattern16)) {
assert(isa<SCEVConstant>(StoreSizeSCEV) && "Expected constant store size");
NewCall = Builder.CreateIntrinsic(
Intrinsic::experimental_memset_pattern,
{DestInt8PtrTy, PatternValue->getType(), IntIdxTy},
{BasePtr, PatternValue, MemsetArg,
ConstantInt::getFalse(M->getContext())});
if (StoreAlignment)
cast<MemSetPatternInst>(NewCall)->setDestAlignment(*StoreAlignment);
NewCall->setAAMetadata(AATags);
} else {
// Neither a memset, nor memset_pattern16
return Changed;
}
ForceMemsetPatternIntrinsic is an internal flag.
static cl::opt<bool> ForceMemsetPatternIntrinsic(
"loop-idiom-force-memset-pattern-intrinsic",
cl::desc("Use memset.pattern intrinsic whenever possible"), cl::init(false),
cl::Hidden);
So memset_pattern16 function is needed to generate Intrinsic::experimental_memset_pattern() and memset_pattern16 is only available for Apple target.
So I will skip experimental_memset_pattern for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need to create a new instance of the intrinsic because of the address space change is inconvenient indeed. Otherwise something as simple as this would have sufficed:
switch (Callee->getIntrinsicID()) {
case Intrinsic::memset:
aspaceWrapOperand(&Cache: CastsCache, I: &I, OpNum: 0);
...
}
Aside from the nit regarding address space check position, the change looks good to me.
669cd1d
to
db38076
Compare
In linux kernel commit [1], we have a bpf selftest failure caused by llvm. In this particular case, the BPFCheckAndAdjustIR pass has a function insertASpaceCasts() which inserts proper addrspacecast insn at proper IR places. It does not handle __builtin_memset() and hance caused selftest failure. Add support in insertASpaceCasts() to handle __builtin_(memset,memcpy,memmove,memset_inline,memcpy_inline}() properly and this can fix the issue in [1] as well. [1] https://lore.kernel.org/all/[email protected]/
db38076
to
359b645
Compare
Latest change (359b645) looks good to me. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/134/builds/27660 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/22468 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/94/builds/11554 Here is the relevant piece of the build log for the reference
|
In linux kernel commit [1], we have a bpf selftest failure caused by llvm. In this particular case, the BPFCheckAndAdjustIR pass has a function
insertASpaceCasts()
which inserts proper addrspacecast insn at proper IR places. It does not handle__builtin_memset()
and hance caused selftest failure.Add support in
insertASpaceCasts()
to handle__builtin_(memset,memcpy,memmove,memset_inline,memcpy_inline}()
properly and this can fix the issue in [1] as well.[1] https://lore.kernel.org/all/[email protected]/