[AMDGPU] select v_sat_pk from two i16 or v2i16 #121124

Open · wants to merge 12 commits into base: main
49 changes: 49 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -315,6 +315,55 @@ def srl_16 : PatFrag<
(ops node:$src0), (srl_oneuse node:$src0, (i32 16))
>;

def clamp_s16_u8 : PatFrag<
(ops node:$src),
(i16 (AMDGPUsmed3 $src, (i16 0), (i16 255)))
>;

def conc_lo_u8_i16 : PatFrags<
Contributor: This is just one Pat; you can use PatFrag instead of PatFrags and avoid the list.

Contributor (author): I'll add another pattern of addition... sorry for the trouble, it's my first PR.

(ops node:$src0, node:$src1),
[
(or
(i16 $src0),
(shl (i16 $src1), (i16 8))
),
(or
(and (i16 $src0), (i16 255)),
(shl (i16 $src1), (i16 8))
)
]
>;
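
For reference on the PatFrag-vs-PatFrags exchange above: the single-Pat form the reviewer describes would look roughly like the sketch below, assuming only the unmasked or/shl variant needed to be matched (the PR keeps PatFrags since it matches more than one alternative). The name here is hypothetical and not part of the PR:

// Sketch only (not from the PR): single-alternative PatFrag form that
// matches just (or $src0, (shl $src1, 8)), without the masked variant.
def conc_lo_u8_i16_single : PatFrag<
  (ops node:$src0, node:$src1),
  (or (i16 $src0), (shl (i16 $src1), (i16 8)))
>;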

def clamp_v2i16_u8 : PatFrags<
(ops node:$src),
[
(v2i16 (smax (smin $src, (build_vector (i16 255), (i16 255))), (build_vector (i16 0), (i16 0)))),
(v2i16 (smin (smax $src, (build_vector (i16 0), (i16 0))), (build_vector (i16 255), (i16 255))))
]
>;

def conc_lo_v2i16_i16 : PatFrags<
Contributor: @arsenm Isn't matching bitwise ops fragile?

Wouldn't it be better to make v2i8 legal, address codegen regressions (maybe by handling it in TargetLowering/CC stuff as well), and then come back to this?

I'm afraid the patterns would constantly break whenever a new combine is added for bitwise ops like shift/and/or/etc. if we try to match v2i8 ops that are lowered as bitwise ops.

Contributor: Depends what you mean by "fragile", but for an optimization it doesn't require robustness. -100 for making v2i8 legal; that's a huge amount of effort for one operation.

(ops node:$src),
Contributor: These cases are stretching what should be done in patterns, and there are too many of them in one patch. Can you keep this to one pattern per patch? It's much harder to review the test coverage otherwise.

These are all implementing the same thing, so we should be canonicalizing to this form so you don't have as many variants to deal with. This is also implementing the same patterns as are matched for the truncating stores, which we should be trying to reuse.

[
(or
(i16 (trunc (i32 (bitconvert $src)))),
(shl
(i16 (trunc (srl (i32 (bitconvert $src)), (i32 16)))),
(i16 8)
)
),
(or
(and (i16 (trunc (i32 (bitconvert $src)))), (i16 255)),
(shl
(and
(i16 (trunc (srl (i32 (bitconvert $src)), (i32 16)))),
(i16 255)
),
(i16 8)
)
)
]
>;

def hi_i16_elt : PatFrag<
(ops node:$src0), (i16 (trunc (i32 (srl_16 node:$src0))))
33 changes: 32 additions & 1 deletion llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -10,7 +10,10 @@
// that are not yet supported remain commented out.
//===----------------------------------------------------------------------===//

class GCNPat<dag pattern, dag result> : Pat<pattern, result>, PredicateControl;
class GCNPat<dag pattern, dag result> : Pat<pattern, result>, PredicateControl, GISelFlags;

let GIIgnoreCopies = 1 in
class GCNPatIgnoreCopies<dag pattern, dag result> : GCNPat<pattern, result>;

class UniformSextInreg<ValueType VT> : PatFrag<
(ops node:$src),
@@ -3298,6 +3301,34 @@ def : GCNPat <
(v2i16 (V_LSHL_OR_B32_e64 $src1, (i32 16), (i32 (V_AND_B32_e64 (i32 (V_MOV_B32_e32 (i32 0xffff))), $src0))))
>;

multiclass V_SAT_PK_Pat<Instruction inst> {
def: GCNPatIgnoreCopies<
Contributor: Don't know why you specifically need to ignore copies here.

Contributor (author): Hi, for GlobalISel there would be some COPY MIR instructions within the pattern; this ignores the COPYs so that the pattern can still be matched.

Contributor: There are more hazards here and I'd rather leave that for a separate patch.

Contributor (author): Hi @arsenm, so should I do the following:

  1. not ignore copies here
  2. change it back for the GlobalISel cases

Or should I:

  1. not ignore copies here
  2. add the COPY into the pattern

Thanks a lot, and sorry for the late reply.

Contributor: Do not use GCNPatIgnoreCopies. Anything related to the GlobalISel handling should be done separately.

(i16 (conc_lo_u8_i16 (clamp_s16_u8 i16:$lo), (clamp_s16_u8 i16:$hi))),
(inst
(V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (S_MOV_B32 (i32 16)),
Contributor: You don't need to materialize this constant; you can just use the inline immediate directly.

VRegSrc_32:$lo))
>;

def: GCNPatIgnoreCopies<
(i16 (conc_lo_u8_i16 (clamp_s16_u8 i16:$lo), (smax i16:$hi, (i16 0)))),
Contributor: I think there are missing hasOneUse checks throughout this.

Contributor (author): Hi @arsenm, may I ask why? If there are other uses of the (clamp_s16_u8 i16:$lo), the DAG is just not going to fold...

(inst
(V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (S_MOV_B32 (i32 16)),
Contributor: Same here, you can directly use the inline immediate in the output.

Contributor: Suggested change:

-        (V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (S_MOV_B32 (i32 16)),
+        (V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (i32 16),

Same here, you can directly use the inline immediate in the output (and maybe drop the type annotation too).

VRegSrc_32:$lo))
>;

def: GCNPatIgnoreCopies<
(i16 (conc_lo_v2i16_i16 (clamp_v2i16_u8 v2i16:$src))),
(inst VRegSrc_32:$src)
>;
}
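
On the missing hasOneUse checks called out above, one possible way to express the restriction is a predicated fragment, modeled on the one-use fragments already in AMDGPUInstructions.td (for example the srl_oneuse used by srl_16 at the top of this diff). The sketch below is not part of the PR, the name is hypothetical, and where exactly the check belongs is still an open question in the review (GlobalISel would likely need its own predicate hook):

// Sketch only (not from the PR): a one-use variant of the clamp fragment,
// so the pattern only fires when the clamped value has no other users and
// folding it away cannot duplicate work.
def clamp_s16_u8_oneuse : PatFrag<
  (ops node:$src),
  (i16 (AMDGPUsmed3 $src, (i16 0), (i16 255))),
  [{ return N->hasOneUse(); }]
>;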

let OtherPredicates = [NotHasTrue16BitInsts] in {
defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_e64>;
} // End OtherPredicates = [NotHasTrue16BitInsts]
Contributor: Add a blank line between the two for readability.

Also, the // End comment is not really needed if the whole thing is just 3 lines IMO, so I'd remove it, but that's really a small nit.

let True16Predicate = UseFakeTrue16Insts in {
defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_fake16_e64>;
}

// With multiple uses of the shift, this will duplicate the shift and
// increase register pressure.
def : GCNPat <