Proposal: @loopHint(hint: LoopHint) void builtin function #22151

Open
dweiller opened this issue Dec 5, 2024 · 0 comments · May be fixed by #22187
dweiller commented Dec 5, 2024

Currently there is no mechanism to give the compiler information on how a loop should be optimized. For example, to prevent LLVM from unrolling this loop:

for (0..n) |_| {
    doStuff();
}

you can do something like call std.mem.doNotOptimizeAway:

for (0..n) |i| {
    std.mem.doNotOptimizeAway(i);
    doStuff();
}

but it's not clear precisely what the effect of std.mem.doNotOptimizeAway is here and it can't be used to provide fine-grained control, such as specifying an unroll count.

Proposal

Similar to @branchHint, a new builtin should be used to provide loop optimization hints. Its signature would be:

fn @loopHint(hint: std.builtin.LoopHint) void

If we look at the metadata that LLVM has for controlling loop optimizations (see the loop metadata entries in the LLVM documentation, which run down to irr_loop), there is potential for the LoopHint type to be quite complicated. We probably shouldn't provide direct access to all these options; I think it makes sense to provide some high-level options in LoopHint to control unrolling, vectorisation and interleaving.

The clang loop directives, available through #pragma clang loop ... are:

distribute(enable|disable)

vectorize(enable|disable)
vectorize_width(value[, fixed|scalable])
vectorize_width(fixed|scalable)
vectorize_predicate(enable|disable)

interleave(enable|disable)
interleave_count(value)

unroll(enable|disable|full)  // 'full' means unroll if comptime-known trip count
unroll_count(value)
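For readers who have not used these directives, the following is a minimal C sketch of how they attach to a loop under clang. The function names `sum_no_unroll` and `scale_interleaved` are hypothetical and chosen only for illustration. Each pragma applies to the loop statement that immediately follows it, and like the hints proposed here it is a request to the optimizer, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: sums an array while asking clang not to unroll the loop. */
long sum_no_unroll(const int *xs, size_t n) {
    long total = 0;
#pragma clang loop unroll(disable)
    for (size_t i = 0; i < n; i++) {
        total += xs[i];
    }
    return total;
}

/* Hypothetical helper: scales an array in place while requesting
 * vectorization with an interleave count of 2. */
void scale_interleaved(float *xs, size_t n, float k) {
#pragma clang loop vectorize(enable) interleave_count(2)
    for (size_t i = 0; i < n; i++) {
        xs[i] *= k;
    }
}
```

The pragmas change only how the loop is compiled, not what it computes, so the functions return the same results with or without them. Compilers other than clang generally ignore the unknown pragma, possibly with a warning.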

Clang doesn't seem to provide directives for controlling LLVM's unroll-and-jam pass, which is a kind of unrolling that can be done on nested loops.

If we wanted expressive capabilities similar to clang's, we could define std.builtin.LoopHint something like:

pub const LoopHint = struct {
    unroll: union(enum) {
        auto,
        disable,
        enable,
        count: usize,
    } = .auto,
    vectorize: union(enum) {
        auto,
        enable: struct {
            width: ?usize = null,
            kind: enum { auto, fixed, scalable } = .auto,
            predicate: enum { auto, enable, disable } = .auto,
        },
        disable,
    } = .auto,
    interleave: union(enum) {
        auto,
        disable,
        enable,
        count: usize,
    } = .auto,
    distribute: enum { auto, enable, disable } = .auto,
};

where each .auto option behaves as if @loopHint had not been called for that setting. There would be some redundancy here: setting a count/width to 1 would be like disable, and it's not clear what a count/width of 0 should mean. By using default values, we can always start with a more restricted LoopHint and expand it over time.

Note that the effect of hints provided by @loopHint will depend on the backend used and they shouldn't be considered to be specifying hard requirements of the generated code.

Similar to @branchHint, @loopHint would have to be at the start of a loop body, but I would suggest that when both @branchHint and @loopHint are used, they can come in either order:

// these two loops are equivalent
for (0..n) |i| {
    @branchHint(.unpredictable);
    @loopHint(.{ .unroll = .disable });
    doStuff(i);
}

for (0..n) |i| {
    @loopHint(.{ .unroll = .disable });
    @branchHint(.unpredictable);
    doStuff(i);
}

Motivation

The initial motivation for wanting a @loopHint was to control LLVM's unrolling when implementing functions like memcpy/memset/memmove. While it is possible to force a certain amount of unrolling with inline loops, LLVM will still try to unroll these loops further on some targets, and can be too aggressive when doing so. An @unrollHint would be sufficient for these use cases. However, even if we initially support only unroll hints, @loopHint would be forwards compatible with any future loop-optimisation hints we want to support, requiring no language change (other than changing std.builtin.LoopHint, if that counts as a language change).

@dweiller dweiller linked a pull request Dec 8, 2024 that will close this issue
@mlugg mlugg added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Dec 10, 2024
@mlugg mlugg added this to the 0.15.0 milestone Dec 10, 2024
@dweiller dweiller mentioned this issue Dec 15, 2024
4 tasks