[BPF] Handle certain mem intrinsic functions with addr-space arguments #160025
Merged
+241
−6
@@ -0,0 +1,60 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt --bpf-check-and-opt-ir -S -mtriple=bpf-pc-linux < %s | FileCheck %s

@page1 = dso_local local_unnamed_addr addrspace(1) global [10 x ptr] zeroinitializer, align 8
@page2 = dso_local local_unnamed_addr addrspace(1) global [10 x ptr] zeroinitializer, align 8

define dso_local void @test_memset() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memset() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 16) to ptr), i8 0, i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memset.p1.i64(ptr addrspace(1) noundef nonnull align 8 dereferenceable(16) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 16), i8 0, i64 16, i1 false)
  ret void
}

declare void @llvm.memset.p1.i64(ptr addrspace(1) writeonly captures(none), i8, i64, i1 immarg)

define dso_local void @test_memcpy() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memcpy() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8) to ptr), ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 8) to ptr), i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memcpy.p1.p1.i64(ptr addrspace(1) noundef nonnull align 8 dereferenceable(16) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8), ptr addrspace(1) noundef nonnull align 8 dereferenceable(16) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 8), i64 16, i1 false)
  ret void
}

declare void @llvm.memcpy.p1.p1.i64(ptr addrspace(1) noalias writeonly captures(none), ptr addrspace(1) noalias readonly captures(none), i64, i1 immarg)

define dso_local void @test_memmove() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memmove() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memmove.p0.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 16) to ptr), ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8) to ptr), i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memmove.p1.p1.i64(ptr addrspace(1) noundef nonnull align 8 dereferenceable(16) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 16), ptr addrspace(1) noundef nonnull align 8 dereferenceable(16) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8), i64 16, i1 false)
  ret void
}

declare void @llvm.memmove.p1.p1.i64(ptr addrspace(1) writeonly captures(none), ptr addrspace(1) readonly captures(none), i64, i1 immarg)

define dso_local void @test_memset_inline() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memset_inline() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memset.inline.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 16) to ptr), i8 0, i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memset.inline.p1.i64(ptr addrspace(1) nonnull align 8 getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 16), i8 0, i64 16, i1 false)
  ret void
}

declare void @llvm.memset.inline.p1.i64(ptr addrspace(1) writeonly captures(none), i8, i64, i1 immarg)

define dso_local void @test_memcpy_inline() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memcpy_inline() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memcpy.inline.p0.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8) to ptr), ptr align 8 addrspacecast (ptr addrspace(1) getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 8) to ptr), i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memcpy.inline.p1.p1.i64(ptr addrspace(1) nonnull align 8 getelementptr inbounds nuw (i8, ptr addrspace(1) @page2, i64 8), ptr addrspace(1) nonnull align 8 getelementptr inbounds nuw (i8, ptr addrspace(1) @page1, i64 8), i64 16, i1 false)
  ret void
}

declare void @llvm.memcpy.inline.p1.p1.i64(ptr addrspace(1) noalias writeonly captures(none), ptr addrspace(1) noalias readonly captures(none), i64, i1 immarg)
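
One observation on test_memmove above: its destination (@page2 + 16) and source (@page2 + 8) are both 16 bytes long, so the two ranges overlap by 8 bytes, which is the situation memmove permits and memcpy does not. The following host-side sketch in plain C reproduces only those byte offsets. The function name and the ordinary byte buffer are illustrative assumptions and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: performs the same move as test_memmove, but on an
 * ordinary host buffer instead of an addrspace(1) global.
 * Source range is [8, 24), destination range is [16, 32).
 */
void shift_like_test_memmove(unsigned char *base)
{
    assert(base != NULL);
    /* memmove is required here because the two ranges share bytes 16..23. */
    memmove(base + 16, base + 8, 16);
}
```

Calling it on an 80-byte buffer (the size of `[10 x ptr]` with 8-byte pointers) leaves bytes 8..15 untouched and places a copy of the original bytes 8..23 at offsets 16..31.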
@@ -0,0 +1,49 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt --bpf-check-and-opt-ir -S -mtriple=bpf-pc-linux < %s | FileCheck %s

@page1 = dso_local local_unnamed_addr addrspace(1) global [10 x ptr] zeroinitializer, align 8
@page2 = dso_local local_unnamed_addr addrspace(1) global [10 x ptr] zeroinitializer, align 8

define dso_local void @test_memset() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memset() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) @page1 to ptr), i8 0, i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memset.p1.i64(ptr addrspace(1) noundef align 8 dereferenceable(16) @page1, i8 0, i64 16, i1 false)
  ret void
}

declare void @llvm.memset.p1.i64(ptr addrspace(1) writeonly captures(none), i8, i64, i1 immarg)

define dso_local void @test_memcpy() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memcpy() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) @page2 to ptr), ptr align 8 addrspacecast (ptr addrspace(1) @page1 to ptr), i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memcpy.p1.p1.i64(ptr addrspace(1) noundef align 8 dereferenceable(16) @page2, ptr addrspace(1) noundef align 8 dereferenceable(16) @page1, i64 16, i1 false)
  ret void
}

declare void @llvm.memcpy.p1.p1.i64(ptr addrspace(1) noalias writeonly captures(none), ptr addrspace(1) noalias readonly captures(none), i64, i1 immarg)

define dso_local void @test_memset_inline() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memset_inline() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memset.inline.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) @page1 to ptr), i8 0, i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memset.inline.p1.i64(ptr addrspace(1) align 8 @page1, i8 0, i64 16, i1 false)
  ret void
}

declare void @llvm.memset.inline.p1.i64(ptr addrspace(1) writeonly captures(none), i8, i64, i1 immarg)

define dso_local void @test_memcpy_inline() local_unnamed_addr {
; CHECK-LABEL: define dso_local void @test_memcpy_inline() local_unnamed_addr {
; CHECK-NEXT: call void @llvm.memcpy.inline.p0.p0.i64(ptr align 8 addrspacecast (ptr addrspace(1) @page2 to ptr), ptr align 8 addrspacecast (ptr addrspace(1) @page1 to ptr), i64 16, i1 false)
; CHECK-NEXT: ret void
;
  tail call void @llvm.memcpy.inline.p1.p1.i64(ptr addrspace(1) align 8 @page2, ptr addrspace(1) align 8 @page1, i64 16, i1 false)
  ret void
}

declare void @llvm.memcpy.inline.p1.p1.i64(ptr addrspace(1) noalias writeonly captures(none), ptr addrspace(1) noalias readonly captures(none), i64, i1 immarg)
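
For orientation, here is a rough C sketch of the kind of source that could plausibly lead to calls such as `llvm.memset.p1.i64` and `llvm.memcpy.p1.p1.i64` on whole globals, as in the test above. It is not the source used to generate the test. The `ARENA` macro, the `char` arrays standing in for `[10 x ptr]`, the function names, and the reliance on the `__BPF_FEATURE_ADDR_SPACE_CAST` feature macro are all assumptions made for illustration. On a host compiler the macro expands to nothing, so the file builds as ordinary C.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: address space 1 is only requested when the compiler
 * advertises BPF address-space-cast support; otherwise plain memory. */
#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
#define ARENA __attribute__((address_space(1)))
#else
#define ARENA
#endif

/* Stand-ins for @page1 and @page2: 80 bytes each, 8-byte aligned. */
char ARENA page1[80] __attribute__((aligned(8)));
char ARENA page2[80] __attribute__((aligned(8)));

/* Constant-size clear of the first 16 bytes of page1. */
void clear_first_16(void)
{
    __builtin_memset(page1, 0, 16);
}

/* Constant-size copy of the first 16 bytes of page1 into page2. */
void copy_first_16(void)
{
    __builtin_memcpy(page2, page1, 16);
}
```

The CHECK lines in the tests show the pass rewriting such `p1` calls into their `p0` forms, with the pointer operands wrapped in `addrspacecast`.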
I've checked whether there are other intrinsics we need to care about and found these:
- Intrinsic::memcpy_inline, available as a builtin function (link)
- Intrinsic::memset_inline, same
- Intrinsic::memcpy_element_unordered_atomic, Intrinsic::memmove_element_unordered_atomic, Intrinsic::memset_element_unordered_atomic -- see the code to handle these, but don't see any code that introduces them.
- Intrinsic::experimental_memset_pattern -- LoopIdiomRecognize::processLoopStridedStore can introduce these.
Thanks for listing the intrinsics above. I missed __builtin_memcpy_inline and __builtin_memset_inline, which are very similar to __builtin_mem{cpy,set}, except that the _inline versions require the 'size' argument to be constant. In current bpf progs, we always use __builtin_mem{set,cpy}() with a constant size, so they are essentially equivalent to __builtin_mem{set,cpy}_inline(). It will be trivial to add both to the pull request.
I think we can ignore mem{cpy,move,set}_element_unordered_atomic. I am aware of this set of intrinsics. The operands of these memory operations need to be atomic, so for our addr-space arguments we can ignore them.
For Intrinsic::experimental_memset_pattern, it tries to convert a loop like
to the following intrinsic
This should be rare. But for completeness, I think I can add this as well.
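
To illustrate the point about constant sizes, here is a small sketch in C. It uses the `_inline` builtins when the compiler reports them through `__has_builtin` and otherwise falls back to the regular builtins with the same constant size. The `struct event` type and the function names are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_builtin; treat every query as "no". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical 16-byte record, similar in size to the tests' 16-byte ops. */
struct event {
    unsigned long long ts;
    unsigned int pid;
    unsigned int cpu;
};

void copy_event(struct event *dst, const struct event *src)
{
    assert(dst != NULL && src != NULL);
#if __has_builtin(__builtin_memcpy_inline)
    /* The size must be a compile-time constant for the _inline form. */
    __builtin_memcpy_inline(dst, src, sizeof(*dst));
#else
    __builtin_memcpy(dst, src, sizeof(*dst));
#endif
}

void clear_event(struct event *e)
{
    assert(e != NULL);
#if __has_builtin(__builtin_memset_inline)
    __builtin_memset_inline(e, 0, sizeof(*e));
#else
    __builtin_memset(e, 0, sizeof(*e));
#endif
}
```

Both branches produce the same observable result here because the size is a constant in either case. The difference is only in which intrinsic the front end emits.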
I asked ChatGPT the following question:
The following is the answer:
====================
Short version: call Apple’s memset_pattern{4,8,16} from C on a Darwin target (macOS/iOS). Clang recognizes these and lowers them to the LLVM IR intrinsic llvm.experimental.memset.pattern (which can then be expanded efficiently).
Minimal example (macOS / iOS targets)
// clang -O2 -target x86_64-apple-macos14 -S -emit-llvm ex.c -o ex.ll
#include <string.h>

void fill16(void *dst, size_t n) {
    unsigned char pat[16] =
        {0,1,2,3,4,5,6,7, 8,9,10,11,12,13,14,15};
    memset_pattern16(dst, pat, n);
}

void fill8(void *dst, size_t n) {
    unsigned char pat[8] = {1,2,3,4,5,6,7,8};
    memset_pattern8(dst, pat, n);
}

void fill4(void *dst, size_t n) {
    unsigned char pat[4] = {0xAA,0xBB,0xCC,0xDD};
    memset_pattern4(dst, pat, n);
}
In the emitted IR you’ll see calls like:
call void @llvm.experimental.memset.pattern.p0.i64(
ptr %dst, ptr %pat, i64 %n, i1 false)
Notes
These memset_pattern{4,8,16} functions are Apple libc extensions. On non-Darwin targets, Clang won’t lower them to the intrinsic—either you’ll get a normal library call or a loop.
There isn’t a portable C standard function that maps to llvm.experimental.memset.pattern.
The intrinsic allows repeating multi-byte patterns (4/8/16). Plain llvm.memset only repeats a single byte.
If you’re not on macOS/iOS but still want the intrinsic (for experimentation), compile with a Darwin target triple as shown above.
====================
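
Since memset_pattern{4,8,16} are not available outside Apple targets, the following portable C sketch shows what a repeating multi-byte fill means, in contrast to memset's single repeated byte. `fill_pattern` is a hypothetical helper written for this note, not an Apple or LLVM API. Writing a truncated copy of the pattern at the tail when the length is not a multiple of the pattern size is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill `len` bytes at `dst` with repeated copies of
 * the `pattern_len`-byte pattern. A final partial copy covers any tail.
 */
void fill_pattern(void *dst, const void *pattern, size_t pattern_len, size_t len)
{
    unsigned char *out = dst;

    assert(dst != NULL && pattern != NULL && pattern_len > 0);

    /* Whole copies of the pattern. */
    while (len >= pattern_len) {
        memcpy(out, pattern, pattern_len);
        out += pattern_len;
        len -= pattern_len;
    }
    /* Leftover bytes get the leading part of the pattern. */
    if (len > 0)
        memcpy(out, pattern, len);
}
```

With a 4-byte pattern and a 10-byte buffer, this writes two full copies followed by the first two pattern bytes.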
I actually tried to compile the above example. It compiled successfully with the following compiler:
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
But it fails to compile for Linux and x86 targets.
Unfortunately, the Apple compiler on my Mac is too old to generate llvm.experimental.memset.pattern. I suspect the latest clang (with an Apple target) would generate llvm.experimental.memset.pattern. The following is the related code in LoopIdiomRecognize.cpp:
ForceMemsetPatternIntrinsic is an internal flag.
So the memset_pattern16 function is needed to generate Intrinsic::experimental_memset_pattern, and memset_pattern16 is only available for Apple targets.
So I will skip experimental_memset_pattern for now.