-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR][CIRGen][TBAA] Add support for struct types #1338
base: main
Are you sure you want to change the base?
Conversation
lowerMod->getContext().getCodeGenOpts(), | ||
lowerMod->getContext().getLangOpts()); | ||
auto baseType = | ||
structLower.lowerStructType(ast.getRawDecl()->getTypeForDecl()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually don't look at the raw decl, either (a) provide the AST interface methods to solve the queries you need or (b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here. Before you make any changes I'd like to hear why you decided not to do (b) (given perhaps you didn't know about (a))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here.
For (b), my first implementation involves encoding enough information into the TBAA attribute within CIRGen. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.
The overall idea is to infer as much as possible during lowering, while only encoding hard-to-determine information in CIRGen.
A general question for this PR: why do we need to re-encode information that CIR already possesses? The goal is to lower to identical LLVM TBAA information without replicating it all in CIR. How can we better reuse existing data?
Refer to the original comments here:
- [CIR][CIRGen][TBAA] Add support for TBAA #1076 (comment)
- [CIR][CIRGen][TBAA] Add support for TBAA #1076 (comment)
In this patch, I lower TBAAStructTypeAttr
based on the Clang type in Lowering.
(a) provide the AST interface methods to solve the queries you need
Regarding (a), how do we get the type name of each field in a struct from CIR_StructType
? To keep the final TBAA info consistent with the original, we need the Clang type name. If using raw declarations isn't possible, we should at least encode the type name.
I have some thoughts to share; please feel free to express your insights.
-
What do we need in lowering
cir::TBAAStructTypeAttr
?
We need the type name, size, and offset of each field in the struct. This information helps us constructllvm::TBAATypeDescriptorAttr
withllvm::TBAAMemberAttr
. Thellvm::TBAAMemberAttr
carries field details, including type name, size, and offset. -
How do we obtain type size and field offset?
Sample code for obtaining this information:// cir::LowerModule *lowerMod for (int i = 0; i < structType.getNumElements(); i++) { auto fieldType = structType.getMembers()[i]; auto size = lowerMod->getDataLayout().layout.getTypeSize(fieldType); auto offset = lowerMod->getDataLayout() .getStructLayout(structType) ->getMemberOffsets()[i]; llvm::errs() << "idx: " << i << ", size: " << size << ", offset: " << offset << "\n"; }
-
How do we obtain the type name?
It seems there's no direct way to get the type name of echo field fromCIR_StructType
. (the same issue when we handle scalar type). Perhaps we should encode the type name for each field in the struct. Following this idea, we can constructcir::TBAAMemberAttr
with only an ID. The size and offset can be inferred fromCIR_StructType
.
A potential risk is that the IDs generated by CIRGen do not match the size and offset inferred from CIR_StructType.
For example, we have a struct A
struct A {
int a;
long b;
long long c;
}
and the CIRTBAAStructAttr is #tbaa_struct{#tbaa_member{id = int}, #tbaa_member{id = long}, #tbaa_member{id = long long}}
. when lowering, we get size and offset use the method in How do we obtain type size and field offset? . I am not sure if the IDs and size/offset here can match precisely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed comment!
After the code review, several key suggestions emerged. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.
My apologies - I somehow didn't realized it would need so much of Clang's in order to solve these. From a layering perspective, LLVM lowering shouldn't be looking at clang::Type
(probably more acceptable if through interfaces), it's too late.
how do we get the type name of each field in a struct from CIR_StructType?
You could a new method to ASTRecordDeclInterface
that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface
for examples. What limitations have you found with this approach?
Alternatively, you could add it directly to the attribute, I'm more inclined for the interface approach because not sure the name is useful for anything else if part of the attribute, but I guess that depends on how feasible it is to implement the interface above.
How do we obtain type size and field offset
Doing it as part of the LowerModule seems reasonable (so that we we don't have to recompute it again during callconv lowering), but given callconv isn't ON by default, I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?
Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could a new method to ASTRecordDeclInterface that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface for examples. What limitations have you found with this approach?
I have tried this approach. Here are some observations.
If I add a method such as llvm::ArrayRef<TBAAStructField> getTBAAStructFieldArray()
to ASTRecordDeclInterface
, the structure of TBAAStructField
would be:
struct TBAAStructField {
uint64_t offset;
uint64_t size;
mlir::LLVM::TBAATypeDescriptorAttr type
}
This means doing lowering within ASTRecordDeclInterface
. We then need to manage nested struct fields in getTBAAStructFieldArray()
. I believe that handling lowering here is not a good practice.
Alternatively, if we define TBAAStructField
as:
struct TBAAStructField {
uint64_t offset;
uint64_t size;
clang::QualType type; // or const clang::Type *ty
}
We can obtain the Clang type, offset, and size. If the field type is scalar, we can use CIRGenTBAA::getScalarTypeInfo
to get the TBAA type name. please note we need duplicate the logic in LLVM lowering.
However, for struct types, recursion is needed. There’s a gap before the recursion starts. Also, we must remember that the field type here is clang::Type
not CIR_StructType
.
If we lower with clang::Type
, ASTRecordDeclInterface::getTBAAStructFieldArray()
becomes redundant. If we use converted CIR_StructType
, we face the challenge of converting clang::QualType
to CIR_StructType
. In CIRGen
, CIRGenTypes::convertType
handles this. But how do we do it in LLVM lowering?
Conclusion
After various attempts, I prefer encoding struct fields in CIRGen even if it may result in some duplication in CIR.
However, if we handle cir::TBAAStructAttr
by adding a new method to ASTRecordDeclInterface, it will also introduce duplication between CIRGen and LLVM Lowering.
I've submitted a new implementation. see #1365
Note that I've only included changes in CIRGen to minimize the review burden, but I will add the LLVM lowering component once we reach consensus on this implementation.
I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?
I think that encoding the offset in CIRGen and retrieving the size from the converted type could represent the simplest approach. please see the new implementation.
Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).
I've only included changes in CIRGen in the new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extra effort trying out different things, I feel like CIRGen is probably the way to go (even with some dups), let me check #1365
This patch introduces support for TBAA with struct types. The converted
CIR_StructType
is stored inTBAAStructAttr
, which is then lowered usingCIRToLLVMTBAAStructAttrLowering
.