[CIR][CIRGen][TBAA] Add support for struct types #1338

PikachuHyA · 2025-02-12T03:42:03Z

This patch introduces support for TBAA with struct types. The converted CIR_StructType is stored in TBAAStructAttr, which is then lowered using CIRToLLVMTBAAStructAttrLowering.

bcardosolopes · 2025-02-12T13:54:04Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerTBAAToLLVM.cpp

+          lowerMod->getContext().getCodeGenOpts(),
+          lowerMod->getContext().getLangOpts());
+      auto baseType =
+          structLower.lowerStructType(ast.getRawDecl()->getTypeForDecl());


We usually don't look at the raw decl, either (a) provide the AST interface methods to solve the queries you need or (b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here. Before you make any changes I'd like to hear why you decided not to do (b) (given perhaps you didn't know about (a))

(b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here.

For (b), my first implementation involves encoding enough information into the TBAA attribute within CIRGen. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.

The overall idea is to infer as much as possible during lowering, while only encoding hard-to-determine information in CIRGen.

A general question for this PR: why do we need to re-encode information that CIR already possesses? The goal is to lower to identical LLVM TBAA information without replicating it all in CIR. How can we better reuse existing data?

Refer to the original comments here:

[CIR][CIRGen][TBAA] Add support for TBAA #1076 (comment)

[CIR][CIRGen][TBAA] Add support for TBAA #1076 (comment)

In this patch, I lower TBAAStructTypeAttr based on the Clang type in Lowering.

(a) provide the AST interface methods to solve the queries you need

Regarding (a), how do we get the type name of each field in a struct from CIR_StructType? To keep the final TBAA info consistent with the original, we need the Clang type name. If using raw declarations isn't possible, we should at least encode the type name.

I have some thoughts to share; please feel free to express your insights.

What do we need in lowering cir::TBAAStructTypeAttr?
We need the type name, size, and offset of each field in the struct. This information helps us construct llvm::TBAATypeDescriptorAttr with llvm::TBAAMemberAttr. The llvm::TBAAMemberAttr carries field details, including type name, size, and offset.

How do we obtain type size and field offset?
Sample code for obtaining this information:

// cir::LowerModule *lowerMod for (int i = 0; i < structType.getNumElements(); i++) { auto fieldType = structType.getMembers()[i]; auto size = lowerMod->getDataLayout().layout.getTypeSize(fieldType); auto offset = lowerMod->getDataLayout() .getStructLayout(structType) ->getMemberOffsets()[i]; llvm::errs() << "idx: " << i << ", size: " << size << ", offset: " << offset << "\n"; }

How do we obtain the type name?
It seems there's no direct way to get the type name of echo field from CIR_StructType. (the same issue when we handle scalar type). Perhaps we should encode the type name for each field in the struct. Following this idea, we can construct cir::TBAAMemberAttr with only an ID. The size and offset can be inferred from CIR_StructType.
A potential risk is that the IDs generated by CIRGen do not match the size and offset inferred from CIR_StructType.
For example, we have a struct A

struct A { int a; long b; long long c; }

and the CIRTBAAStructAttr is #tbaa_struct{#tbaa_member{id = int}, #tbaa_member{id = long}, #tbaa_member{id = long long}}. when lowering, we get size and offset use the method in How do we obtain type size and field offset? . I am not sure if the IDs and size/offset here can match precisely.

Thanks for the detailed comment!

After the code review, several key suggestions emerged. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.

My apologies - I somehow didn't realized it would need so much of Clang's in order to solve these. From a layering perspective, LLVM lowering shouldn't be looking at clang::Type (probably more acceptable if through interfaces), it's too late.

how do we get the type name of each field in a struct from CIR_StructType?

You could a new method to ASTRecordDeclInterface that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface for examples. What limitations have you found with this approach?

Alternatively, you could add it directly to the attribute, I'm more inclined for the interface approach because not sure the name is useful for anything else if part of the attribute, but I guess that depends on how feasible it is to implement the interface above.

How do we obtain type size and field offset

Doing it as part of the LowerModule seems reasonable (so that we we don't have to recompute it again during callconv lowering), but given callconv isn't ON by default, I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?

Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).

You could a new method to ASTRecordDeclInterface that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface for examples. What limitations have you found with this approach?

I have tried this approach. Here are some observations.

If I add a method such as llvm::ArrayRef<TBAAStructField> getTBAAStructFieldArray() to ASTRecordDeclInterface, the structure of TBAAStructField would be:

struct TBAAStructField { uint64_t offset; uint64_t size; mlir::LLVM::TBAATypeDescriptorAttr type }

This means doing lowering within ASTRecordDeclInterface. We then need to manage nested struct fields in getTBAAStructFieldArray(). I believe that handling lowering here is not a good practice.

Alternatively, if we define TBAAStructField as:

struct TBAAStructField { uint64_t offset; uint64_t size; clang::QualType type; // or const clang::Type *ty }

We can obtain the Clang type, offset, and size. If the field type is scalar, we can use CIRGenTBAA::getScalarTypeInfo to get the TBAA type name. please note we need duplicate the logic in LLVM lowering.
However, for struct types, recursion is needed. There’s a gap before the recursion starts. Also, we must remember that the field type here is clang::Type not CIR_StructType.

If we lower with clang::Type, ASTRecordDeclInterface::getTBAAStructFieldArray() becomes redundant. If we use converted CIR_StructType, we face the challenge of converting clang::QualType to CIR_StructType. In CIRGen, CIRGenTypes::convertType handles this. But how do we do it in LLVM lowering?

Conclusion
After various attempts, I prefer encoding struct fields in CIRGen even if it may result in some duplication in CIR.
However, if we handle cir::TBAAStructAttr by adding a new method to ASTRecordDeclInterface, it will also introduce duplication between CIRGen and LLVM Lowering.
I've submitted a new implementation. see #1365
Note that I've only included changes in CIRGen to minimize the review burden, but I will add the LLVM lowering component once we reach consensus on this implementation.

I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?

I think that encoding the offset in CIRGen and retrieving the size from the converted type could represent the simplest approach. please see the new implementation.

Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).

I've only included changes in CIRGen in the new implementation.

Thanks for the extra effort trying out different things, I feel like CIRGen is probably the way to go (even with some dups), let me check #1365

[CIR][CIRGen][TBAA] Add support for struct types

d1765c3

PikachuHyA requested review from lanza and bcardosolopes as code owners February 12, 2025 03:42

bcardosolopes requested changes Feb 12, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CIR][CIRGen][TBAA] Add support for struct types #1338

[CIR][CIRGen][TBAA] Add support for struct types #1338

PikachuHyA commented Feb 12, 2025

bcardosolopes Feb 12, 2025

PikachuHyA Feb 13, 2025

bcardosolopes Feb 14, 2025

PikachuHyA Feb 17, 2025

bcardosolopes Feb 19, 2025

[CIR][CIRGen][TBAA] Add support for struct types #1338

Are you sure you want to change the base?

[CIR][CIRGen][TBAA] Add support for struct types #1338

Conversation

PikachuHyA commented Feb 12, 2025

bcardosolopes Feb 12, 2025

Choose a reason for hiding this comment

PikachuHyA Feb 13, 2025

Choose a reason for hiding this comment

bcardosolopes Feb 14, 2025

Choose a reason for hiding this comment

PikachuHyA Feb 17, 2025

Choose a reason for hiding this comment

bcardosolopes Feb 19, 2025

Choose a reason for hiding this comment