Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CIRGen][TBAA] Add support for struct types #1338

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PikachuHyA
Copy link
Collaborator

This patch introduces support for TBAA with struct types. The converted CIR_StructType is stored in TBAAStructAttr, which is then lowered using CIRToLLVMTBAAStructAttrLowering.

lowerMod->getContext().getCodeGenOpts(),
lowerMod->getContext().getLangOpts());
auto baseType =
structLower.lowerStructType(ast.getRawDecl()->getTypeForDecl());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually don't look at the raw decl, either (a) provide the AST interface methods to solve the queries you need or (b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here. Before you make any changes I'd like to hear why you decided not to do (b) (given perhaps you didn't know about (a))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(b) lower that information in CIRGen and attach more data to the TBAA attribute so you don't need to do the work here.

For (b), my first implementation involves encoding enough information into the TBAA attribute within CIRGen. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.

The overall idea is to infer as much as possible during lowering, while only encoding hard-to-determine information in CIRGen.

A general question for this PR: why do we need to re-encode information that CIR already possesses? The goal is to lower to identical LLVM TBAA information without replicating it all in CIR. How can we better reuse existing data?

Refer to the original comments here:

In this patch, I lower TBAAStructTypeAttr based on the Clang type in Lowering.

(a) provide the AST interface methods to solve the queries you need

Regarding (a), how do we get the type name of each field in a struct from CIR_StructType? To keep the final TBAA info consistent with the original, we need the Clang type name. If using raw declarations isn't possible, we should at least encode the type name.


I have some thoughts to share; please feel free to express your insights.

  1. What do we need in lowering cir::TBAAStructTypeAttr?
    We need the type name, size, and offset of each field in the struct. This information helps us construct llvm::TBAATypeDescriptorAttr with llvm::TBAAMemberAttr. The llvm::TBAAMemberAttr carries field details, including type name, size, and offset.

  2. How do we obtain type size and field offset?
    Sample code for obtaining this information:

    // cir::LowerModule *lowerMod
    for (int i = 0; i < structType.getNumElements(); i++) {
      auto fieldType = structType.getMembers()[i];
      auto size = lowerMod->getDataLayout().layout.getTypeSize(fieldType);
      auto offset = lowerMod->getDataLayout()
                        .getStructLayout(structType)
                        ->getMemberOffsets()[i];
      llvm::errs() << "idx: " << i << ", size: " << size
                   << ", offset: " << offset << "\n";
    }
  3. How do we obtain the type name?
    It seems there's no direct way to get the type name of echo field from CIR_StructType. (the same issue when we handle scalar type). Perhaps we should encode the type name for each field in the struct. Following this idea, we can construct cir::TBAAMemberAttr with only an ID. The size and offset can be inferred from CIR_StructType.
    A potential risk is that the IDs generated by CIRGen do not match the size and offset inferred from CIR_StructType.
    For example, we have a struct A

struct A {
int a;
long b;
long long c;
}

and the CIRTBAAStructAttr is #tbaa_struct{#tbaa_member{id = int}, #tbaa_member{id = long}, #tbaa_member{id = long long}}. when lowering, we get size and offset use the method in How do we obtain type size and field offset? . I am not sure if the IDs and size/offset here can match precisely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comment!

After the code review, several key suggestions emerged. You can find the initial implementation here: #1076. After the code review, several key suggestions emerged.

My apologies - I somehow didn't realized it would need so much of Clang's in order to solve these. From a layering perspective, LLVM lowering shouldn't be looking at clang::Type (probably more acceptable if through interfaces), it's too late.

how do we get the type name of each field in a struct from CIR_StructType?

You could a new method to ASTRecordDeclInterface that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface for examples. What limitations have you found with this approach?

Alternatively, you could add it directly to the attribute, I'm more inclined for the interface approach because not sure the name is useful for anything else if part of the attribute, but I guess that depends on how feasible it is to implement the interface above.

How do we obtain type size and field offset

Doing it as part of the LowerModule seems reasonable (so that we we don't have to recompute it again during callconv lowering), but given callconv isn't ON by default, I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?

Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could a new method to ASTRecordDeclInterface that will compute whatever information you need from the original RecordDecl, see the other methods in ASTRecordDeclInterface for examples. What limitations have you found with this approach?

I have tried this approach. Here are some observations.

If I add a method such as llvm::ArrayRef<TBAAStructField> getTBAAStructFieldArray() to ASTRecordDeclInterface, the structure of TBAAStructField would be:

struct TBAAStructField {
  uint64_t offset;
  uint64_t size;
  mlir::LLVM::TBAATypeDescriptorAttr type
}

This means doing lowering within ASTRecordDeclInterface. We then need to manage nested struct fields in getTBAAStructFieldArray(). I believe that handling lowering here is not a good practice.

Alternatively, if we define TBAAStructField as:

struct TBAAStructField {
  uint64_t offset;
  uint64_t size;
  clang::QualType type; // or const clang::Type *ty
}

We can obtain the Clang type, offset, and size. If the field type is scalar, we can use CIRGenTBAA::getScalarTypeInfo to get the TBAA type name. please note we need duplicate the logic in LLVM lowering.
However, for struct types, recursion is needed. There’s a gap before the recursion starts. Also, we must remember that the field type here is clang::Type not CIR_StructType.

If we lower with clang::Type, ASTRecordDeclInterface::getTBAAStructFieldArray() becomes redundant. If we use converted CIR_StructType, we face the challenge of converting clang::QualType to CIR_StructType. In CIRGen, CIRGenTypes::convertType handles this. But how do we do it in LLVM lowering?

Conclusion
After various attempts, I prefer encoding struct fields in CIRGen even if it may result in some duplication in CIR.
However, if we handle cir::TBAAStructAttr by adding a new method to ASTRecordDeclInterface, it will also introduce duplication between CIRGen and LLVM Lowering.
I've submitted a new implementation. see #1365
Note that I've only included changes in CIRGen to minimize the review burden, but I will add the LLVM lowering component once we reach consensus on this implementation.

I'm afraid we will probably need to do it during CIRGen and fix the info when structs are changed during LowerModule. Thoughts?

I think that encoding the offset in CIRGen and retrieving the size from the converted type could represent the simplest approach. please see the new implementation.

Is there a way to split this patch a bit? I'm finding it hard to have the full picture. It's fine if this leads to incremental work (or some of the testing has to come a bit later).

I've only included changes in CIRGen in the new implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra effort trying out different things, I feel like CIRGen is probably the way to go (even with some dups), let me check #1365

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants