add support for leading_zeros and trailing_zeros #213

Draft: wants to merge 1 commit into main

Firestar99 (Member) commented Jan 29, 2025

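On main, these intrinsics require an obscure Intel extension. This PR switches them to the GLSL.std.450 instructions FindILsb and FindUMsb, which give equivalent results for non-zero inputs: FindILsb returns the index of the lowest set bit (which is trailing_zeros), and FindUMsb returns the index of the highest set bit (so leading_zeros is 31 minus that for a 32-bit value). They differ for an input of 0: Rust expects the bit width of the type, but these instructions return !0 (-1). Handling that requires a branch, which makes this a lot more complicated to implement, as I don't know how to split the function down the middle into two different blocks.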
They are also limited to 32 bits, which I'm currently ignoring.
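
To make the intended semantics concrete, here is a minimal plain-Rust sketch (not the codegen in this PR). `find_ilsb` and `find_umsb` are hypothetical software stand-ins for the GLSL.std.450 instructions, and the `*_via_*` and `*_u64` helpers are illustrative names. It shows the zero special case and one possible way to build the 64-bit versions from two 32-bit halves, assuming that splitting into halves is acceptable.

```rust
/// Hypothetical stand-in for GLSL.std.450 FindILsb:
/// index of the lowest set bit, or -1 when no bit is set.
fn find_ilsb(v: u32) -> i32 {
    for i in 0..32 {
        if ((v >> i) & 1) == 1 {
            return i;
        }
    }
    -1
}

/// Hypothetical stand-in for GLSL.std.450 FindUMsb:
/// index of the highest set bit, or -1 when no bit is set.
fn find_umsb(v: u32) -> i32 {
    for i in (0..32).rev() {
        if ((v >> i) & 1) == 1 {
            return i;
        }
    }
    -1
}

/// Rust semantics: zero input yields the bit width (32).
fn trailing_zeros_via_lsb(v: u32) -> u32 {
    if v == 0 { 32 } else { find_ilsb(v) as u32 }
}

/// The MSB index counts from bit 0, so leading zeros are 31 minus it.
fn leading_zeros_via_msb(v: u32) -> u32 {
    if v == 0 { 32 } else { 31 - (find_umsb(v) as u32) }
}

/// 64-bit trailing zeros from two 32-bit halves (low half first).
fn trailing_zeros_u64(v: u64) -> u32 {
    let lo = v as u32;
    let hi = (v >> 32) as u32;
    if lo != 0 { trailing_zeros_via_lsb(lo) } else { 32 + trailing_zeros_via_lsb(hi) }
}

/// 64-bit leading zeros from two 32-bit halves (high half first).
fn leading_zeros_u64(v: u64) -> u32 {
    let lo = v as u32;
    let hi = (v >> 32) as u32;
    if hi != 0 { leading_zeros_via_msb(hi) } else { 32 + leading_zeros_via_msb(lo) }
}
```

These helpers are meant to agree with `u32::trailing_zeros`, `u32::leading_zeros`, and their `u64` counterparts for every input, including 0.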

It currently panics during linking, probably because I screwed up the branching somewhere:

ConsumerError(DetachedInstruction(Some(Instruction { class: Instruction { opname: "Store", opcode: Store, capabilities: [], extensions: [], operands: [LogicalOperand { kind: IdRef, quantifier: One }, LogicalOperand { kind: IdRef, quantifier: One }, LogicalOperand { kind: MemoryAccess, quantifier: ZeroOrOne }] }, result_type: None, result_id: None, operands: [IdRef(114), IdRef(120)] })))

It also has some unused warnings for the capability querying code, which I don't really want to remove yet, as we may still need it.

Closes #210

Firestar99 (Member, Author) commented Jan 29, 2025

I basically want this GLSL:

#version 450

layout(location=0) flat in uint inbla;
layout(location=0) out uint outbla;

uint trailing_zeros(uint v) {
    uint ret;
    if (v == 0) {
        ret = 32;
    } else {
        ret = findLSB(v);
    }
    return ret;
}

void main() {
    outbla = trailing_zeros(inbla);
}

Which disassembles to:

    %bla_u1_ = OpFunction %uint None %8
          %v = OpFunctionParameter %_ptr_Function_uint
         %11 = OpLabel
        %ret = OpVariable %_ptr_Function_uint Function
         %12 = OpLoad %uint %v
         %15 = OpIEqual %bool %12 %uint_0
               OpSelectionMerge %17 None
               OpBranchConditional %15 %16 %19
         %16 = OpLabel
               OpStore %ret %uint_0
               OpBranch %17
         %19 = OpLabel
         %20 = OpLoad %uint %v
         %22 = OpExtInst %int %1 FindILsb %20
         %23 = OpBitcast %uint %22
               OpStore %ret %23
               OpBranch %17
         %17 = OpLabel
         %24 = OpLoad %uint %ret
               OpReturnValue %24
               OpFunctionEnd
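
As a possible alternative to splitting the function into blocks, the zero case can be handled without control flow by computing the bit index unconditionally and then selecting between it and the bit width. The plain-Rust sketch below only illustrates that idea and is not what this PR does: `select` is a hypothetical helper mirroring SPIR-V's OpSelect, and `find_ilsb` is a software stand-in for FindILsb.

```rust
/// Hypothetical scalar stand-in for SPIR-V OpSelect:
/// yields `a` when `cond` is true, otherwise `b`.
fn select(cond: bool, a: u32, b: u32) -> u32 {
    if cond { a } else { b }
}

/// Software stand-in for GLSL.std.450 FindILsb: -1 for a zero input.
fn find_ilsb(v: u32) -> i32 {
    if v == 0 { -1 } else { v.trailing_zeros() as i32 }
}

/// Branch-free shape: always evaluate FindILsb, then pick the result.
fn trailing_zeros_select(v: u32) -> u32 {
    // For v == 0 this is -1 reinterpreted as u32::MAX; select discards it.
    let raw = find_ilsb(v) as u32;
    select(v == 0, 32, raw)
}
```

Because both operands are evaluated before the selection, this shape needs no OpSelectionMerge or extra labels. Whether it is preferable inside the codegen backend is an open question.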

LegNeato (Collaborator) commented Jan 29, 2025

Sweet! I actually looked at this too, just didn't put my WIP up. Differences between what I did:

I pushed a branch. I think the code is largely correct, but it is only lightly tested:

https://github.com/LegNeato/rust-gpu/tree/clz

Successfully merging this pull request may close these issues.

u32::leading_zeros intrinic requires weird extension to work