Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler-rt][RISCV] Implement __init_riscv_feature_bits #85790

Merged
merged 33 commits into from
Jul 21, 2024

Conversation

BeMg
Copy link
Contributor

@BeMg BeMg commented Mar 19, 2024

Base on riscv-non-isa/riscv-c-api-doc#74, this patch defines the __riscv_feature_bits and __riscv_vendor_feature_bits structures to store the enabled feature bits at runtime.

It also introduces the __init_riscv_features_bit function to update these structures based on the platform query mechanism.

Additionally, the groupid/bitmask definitions from riscv-non-isa/riscv-c-api-doc#74 are declared and used to update the __riscv_feature_bits and __riscv_vendor_feature_bits structures.

@BeMg
Copy link
Contributor Author

BeMg commented Mar 19, 2024

This patch make #85786 could run some real test.

@BeMg
Copy link
Contributor Author

BeMg commented Mar 30, 2024

  1. Align with latest sys_riscv_hwprobe
  2. Update __riscv_ifunc_select, from __riscv_ifunc_select(char *) to __riscv_ifunc_select(unsigned long long, unsigned long long).
  3. Remove the cpuinfo relate code and string process relate code
  4. Use the bitset method to determine whether a set of extension is available for current environment.

@BeMg BeMg marked this pull request as ready for review March 30, 2024 12:13
@BeMg BeMg requested a review from kito-cheng March 30, 2024 12:14
@BeMg BeMg requested review from topperc, lukel97 and preames March 30, 2024 12:14
Copy link

github-actions bot commented Apr 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@BeMg BeMg requested review from lukel97 and wangpc-pp April 1, 2024 02:35
@wangpc-pp
Copy link
Contributor

Are there any processes for GNU/GCC implementation? If we want to port glibc, I think it should be required.

@BeMg
Copy link
Contributor Author

BeMg commented Apr 8, 2024

  1. Let the caller to manage and construct the necessary key/value pairs for hwprobe, eliminating the need for the runtime site to sync with the hwprobe key table.
  2. Modify __riscv_ifunc_select to accept a pointer to riscv_hwprobe and its length, so the prototype does not need to be updated when the hwprobe keys increase or change.

@BeMg
Copy link
Contributor Author

BeMg commented Apr 22, 2024

Since this resolver function is expected to be available and interchangeable for both libgcc and compiler-rt, a formal specification for the resolver function interface is necessary.

I've create one for this PR riscv-non-isa/riscv-c-api-doc#74 and provide the three different candidate approach to achieve the same purpose.

…ature_bits/__init_riscv_features_bit

Base on riscv-non-isa/riscv-c-api-doc#74, this patch defines the __riscv_feature_bits and __riscv_vendor_feature_bits structures to store the enabled feature bits at runtime.

It also introduces the __init_riscv_features_bit function to update these structures based on the platform query mechanism.

Additionally, the groupid/bitmask definitions from riscv-non-isa/riscv-c-api-doc#74 are declared and used to update the __riscv_feature_bits and __riscv_vendor_feature_bits structures.
@BeMg BeMg force-pushed the IFUNC/riscv_ifunc_select-impl branch from 9b06f1b to 628f3e8 Compare June 11, 2024 04:38
@BeMg BeMg marked this pull request as ready for review June 11, 2024 14:31
Copy link
Member

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my end, also I has implemented libgcc version, and posted into mailing list: https://patchwork.sourceware.org/project/gcc/patch/[email protected]/

Copy link
Member

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comment to improving multi-threading issue.

@dtcxzyw dtcxzyw requested a review from MaskRay July 17, 2024 11:48
@preames
Copy link
Collaborator

preames commented Jul 19, 2024

Following up to conversation from the RISCV sync up call yesterday.

LGTM to the approach. I'm deferring to Kito on the implementation details of compiler-rt. This LGTM is subject to the requirement that this patch is reverted from the release branch if for any reason the dependent compiler default ifunc resolver change doesn't make it into the release. (Edit: After going and taking a detail look at the dependent compiler changes - yeah, those are likely not getting in. As such, this LGTM will not mean much.)

As broader context (as much for my future self as anything else). We have three major options on the default resolver approach.

  • We could just use hwcaps. This is pretty universally rejected as the bits are ambiguous in several known cases, and only cover a handful of extensions.
  • We could use the libc entry point to hwprobe provided to the resolver in the second argument register. This is only available in glibc 2.40 and later, before that a nullptr is passed (args are nullptr terminated.) 2.40 is unreleased, and we don't want to dependent on an unpublished ABI. As such, this option would require we delay this feature until 20.x.
  • This approach. The downsides of this approach are that a) most users use libgcc not compiler-rt, and b) we have an extra dependency layer which may slow pickup of future extensions. The benefit is that a clang toolchain using compiler-rt picks up this feature at least 6 months sooner. There's also some discussion of backporting the corresponding libgcc change.

The later two both have pros and cons. I personally would mildly prefer the second option, but am deferring to the folks who've worked on this as the third choice (this one) is at least reasonable. Worth noting is that even if we land this, if we later decide the versioning upgrade thing is a major problem, there's nothing preventing a future compiler version from switching to the glibc entry if available.


// Init vendor extension
__riscv_vendor_feature_bits.length = 0;
__riscv_vendor_feature_bits.vendorID = Hwprobes[2].value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a note in the code...

On first glance it looks like there's missing error handling here. The code is actually okay, but that's slightly non-obvious.

You may be on a kernel version which supports hwprobe, but doesn't recognize a given key. In that situation, the documentation says that the syscall will return success, but the key field will be set to -1. This code is relying on the fact that the value field will also be 0 in this case. This happens to work out to having all the bits unset.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 19, 2024

Following up to conversation from the RISCV sync up call yesterday.

LGTM to the approach. I'm deferring to Kito on the implementation details of compiler-rt. This LGTM is subject to the requirement that this patch is reverted from the release branch if for any reason the dependent compiler default ifunc resolver change doesn't make it into the release.

As broader context (as much for my future self as anything else). We have three major options on the default resolver approach.

  • We could just use hwcaps. This is pretty universally rejected as the bits are ambiguous in several known cases, and only cover a handful of extensions.
  • We could use the libc entry point to hwprobe provided to the resolver in the second argument register. This is only available in glibc 2.40 and later, before that a nullptr is passed (args are nullptr terminated.) 2.40 is unreleased, and we don't want to dependent on an unpublished ABI. As such, this option would require we delay this feature until 20.x.
  • This approach. The downsides of this approach are that a) most users use libgcc not compiler-rt, and b) we have an extra dependency layer which may slow pickup of future extensions. The benefit is that a clang toolchain using compiler-rt picks up this feature at least 6 months sooner. There's also some discussion of backporting the corresponding libgcc change.

The later two both have pros and cons. I personally would mildly prefer the second option, but am deferring to the folks who've worked on this as the third choice (this one) is at least reasonable. Worth noting is that even if we land this, if we later decide the versioning upgrade thing is a major problem, there's nothing preventing a future compiler version from switching to the glibc entry if available.

One of the points of having an abstraction, whether like this or otherwise, is that you don't need to have per-OS code in the compiler to handle multi-versioning. This format is simple enough that it's not tied to one OS, and the extensions specified by a body other than a specific OS, unlike hwprobe which is defined by Linux and has an interface tied to it (e.g. the use of its notion of CPU sets). This is something FreeBSD can realistically implement.

@preames
Copy link
Collaborator

preames commented Jul 19, 2024

One of the points of having an abstraction, whether like this or otherwise, is that you don't need to have per-OS code in the compiler to handle multi-versioning. This format is simple enough that it's not tied to one OS, and the extensions specified by a body other than a specific OS, unlike hwprobe which is defined by Linux and has an interface tied to it (e.g. the use of its notion of CPU sets). This is something FreeBSD can realistically implement.

@jrtc27 Is there an interface provided by e.g. FreeBSD that we should be looking at here? If not, this seems like somewhat of a moot argument.

As a second point, asking from ignorance here as I honestly don't know, don't we generally know the target OS from the triple? Generating code which has to work on any OS versus some specific OS seems like a generally harder problem. The dependent patches already have e.g.:

  if (getContext().getTargetInfo().getTriple().getOS() !=
      llvm::Triple::OSType::Linux) {
    CGM.getDiags().Report(diag::err_os_unsupport_riscv_target_clones);
    return;
  }

Is your argument that while we can generate OS specific code, we should prefer not to? If so, that seems like a reasonable code quality point, but I don't see how it's in anyway blocking. We can ship a version of the compiler with the OS specific enable, and then generalize once we have a second example, and sink common APIs into compiler runtimes if useful. It also seems like a concern which deserves to be balanced with e.g. the timeline to expose a new extension as opposed to a hard and fast rule.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 19, 2024

One of the points of having an abstraction, whether like this or otherwise, is that you don't need to have per-OS code in the compiler to handle multi-versioning. This format is simple enough that it's not tied to one OS, and the extensions specified by a body other than a specific OS, unlike hwprobe which is defined by Linux and has an interface tied to it (e.g. the use of its notion of CPU sets). This is something FreeBSD can realistically implement.

@jrtc27 Is there an interface provided by e.g. FreeBSD that we should be looking at here? If not, this seems like somewhat of a moot argument.

Not yet, because I was waiting to see what happened with function multiversioning.

As a second point, asking from ignorance here as I honestly don't know, don't we generally know the target OS from the triple?

We do, but the less conditionality the better; easier to maintain, and less to test.

Generating code which has to work on any OS versus some specific OS seems like a generally harder problem. The dependent patches already have e.g.:

  if (getContext().getTargetInfo().getTriple().getOS() !=
      llvm::Triple::OSType::Linux) {
    CGM.getDiags().Report(diag::err_os_unsupport_riscv_target_clones);
    return;
  }

Is your argument that while we can generate OS specific code, we should prefer not to? If so, that seems like a reasonable code quality point

Yeah, exactly.

but I don't see how it's in anyway blocking. We can ship a version of the compiler with the OS specific enable, and then generalize once we have a second example,

Eh, you can go either way on the compiler. It's an interface that FreeBSD will implement at some point, so you could argue that it's better to get it in the compiler sooner rather than later so you can use an older compiler on a newer system that provides it (especially given it has to have run-time detection of the interface's availability anyway). But you could also argue that it's known to be useless so making it look like it works is unhelpful.

and sink common APIs into compiler runtimes if useful. It also seems like a concern which deserves to be balanced with e.g. the timeline to expose a new extension as opposed to a hard and fast rule.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 19, 2024

And to be clear, I'm not saying that anything should be blocked on getting FreeBSD supported. I'm just saying that one of the benefits of this interface is that it can be reused on other OSes in future PRs, all that's needed is implementing it for that OS in compiler-rt, which is completely doable for any OS.

preames added a commit to preames/llvm-project that referenced this pull request Jul 19, 2024
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in llvm#85790.

This is inspired by llvm#85786.
Major changes are a) a restriction in scope to only the builtins
(which have a much narrower user interface), and the avoidance of
false generality.  This change deliberately only handles group 0
extensions (which happen to be all defined ones today), and avoids
the tblgen changes from that review.

This is still a WIP.  It is posted for initial feedback on whether
this makes sense to try to get into 19.x release. Major items left
undone:

* Updating clang tests to exercise this logic.
* Actually running it at all.  I did not build compiler-rt, and thus
  all my checking was of generated asm/IR.
* Investigate claims from gcc docs that __builtin_cpu_init is called
  early in process lifetime with high priority constructor.  I did
  not find this with some quick searching.
@BeMg BeMg merged commit a41a4ac into llvm:main Jul 21, 2024
6 checks passed

static int FeaturesBitCached = 0;

void __init_riscv_feature_bits() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a missing piece here. The corresponding bit of X86 code (in compiler-rt/lib/builtins/cpu_model/x86.c), uses CONSTRUCTOR_ATTRIBUTE to ensure that the initialization is called early in process lifetime even if an ifunc which explicitly depends invokes the initialization isn't called. I believe we need to do the same thing here. The slightly confusing bit is that aarch64 appears not to do this.

preames added a commit that referenced this pull request Jul 23, 2024
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
#85790.

This is inspired by #85786.
Major changes are a) a restriction in scope to only the builtins (which
have a much narrower user interface), and the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.

I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.

Before this can make it into a release, we need a change such as
#99958. The gcc docs claim that
cpu_support can be called by "normal" code without calling the cpu_init
routine because the init routine will have been called by a high
priority constructor. Our current compiler-rt mechanism does not do
this.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Base on riscv-non-isa/riscv-c-api-doc#74, this
patch defines the `__riscv_feature_bits` and
`__riscv_vendor_feature_bits` structures to store the enabled feature
bits at runtime.

It also introduces the `__init_riscv_feature_bits` function to update
these structures based on the platform query mechanism.

Additionally, the groupid/bitmask definitions from
riscv-non-isa/riscv-c-api-doc#74 are declared
and used to update the `__riscv_feature_bits` and
`__riscv_vendor_feature_bits` structures.

---------

Co-authored-by: Kito Cheng <[email protected]>
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
#85790.

This is inspired by #85786.
Major changes are a) a restriction in scope to only the builtins (which
have a much narrower user interface), and the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.

I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.

Before this can make it into a release, we need a change such as
#99958. The gcc docs claim that
cpu_support can be called by "normal" code without calling the cpu_init
routine because the init routine will have been called by a high
priority constructor. Our current compiler-rt mechanism does not do
this.
@asb
Copy link
Contributor

asb commented Jul 30, 2024

For what it's worth, I left a comment on the C API PR querying whether we should better define the interface for failure (e.g. if __init_riscv_features doesn't do anything useful for the target platform). See here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants