-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] select v_sat_pk from two i16 or v2i16 #121124
base: main
Are you sure you want to change the base?
Changes from all commits
ab0d614
8ca654f
e8d88ba
889198d
582c56a
5319c9f
fe7c1c7
5249824
0e4c4ed
f2dee69
010322a
aa46174
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -315,6 +315,55 @@ def srl_16 : PatFrag< | |
(ops node:$src0), (srl_oneuse node:$src0, (i32 16)) | ||
>; | ||
|
||
def clamp_s16_u8 : PatFrag< | ||
(ops node:$src), | ||
(i16 (AMDGPUsmed3 $src, (i16 0), (i16 255))) | ||
>; | ||
|
||
def conc_lo_u8_i16 : PatFrags< | ||
(ops node:$src0, node:$src1), | ||
[ | ||
(or | ||
(i16 $src0), | ||
(shl (i16 $src1), (i16 8)) | ||
), | ||
(or | ||
(and (i16 $src0), (i16 255)), | ||
(shl (i16 $src1), (i16 8)) | ||
) | ||
] | ||
>; | ||
|
||
def clamp_v2i16_u8 : PatFrags< | ||
(ops node:$src), | ||
[ | ||
(v2i16 (smax (smin $src, (build_vector (i16 255), (i16 255))), (build_vector (i16 0), (i16 0)))), | ||
(v2i16 (smin (smax $src, (build_vector (i16 0), (i16 0))), (build_vector (i16 255), (i16 255)))) | ||
] | ||
>; | ||
|
||
def conc_lo_v2i16_i16 : PatFrags< | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @arsenm Isn't matching bitwise ops fragile? Wouldn't it be better to make v2i8 legal, address codegen regressions (maybe by handling it in TargetLowering/CC stuff as well), then come back to this? I'm afraid the patterns would constantly break whenever a new combine is added for the bitwise ops like shift/and/or/etc. if we try to match v2i8 ops that are lowered as bitwise ops There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Depends what you mean by "fragile" but for an optimization it doesn't require robustness. -100 for making v2i8 legal, that's a huge amount of effort for one operation. |
||
(ops node:$src), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These cases are stretching what should be done in patterns, and there are too many of them in one patch. Can you keep this to one pattern per patch, it's much harder to review the test coverage. These are all implementing the same thing, so we should be canonicalizing to this form so you don't have as many variants to deal with. This is also implementing the same patterns as is matched for the truncating stores, which we should be trying to reuse. |
||
[ | ||
(or | ||
(i16 (trunc (i32 (bitconvert $src)))), | ||
(shl | ||
(i16 (trunc(srl (i32 (bitconvert $src)), (i32 16)))), | ||
(i16 8) | ||
) | ||
), | ||
(or | ||
(and (i16 (trunc (i32 (bitconvert $src)))), (i16 255)), | ||
(shl | ||
(and | ||
(i16 (trunc (srl (i32 (bitconvert $src)), (i32 16)))), | ||
(i16 255) | ||
), | ||
(i16 8) | ||
) | ||
) | ||
] | ||
>; | ||
|
||
def hi_i16_elt : PatFrag< | ||
(ops node:$src0), (i16 (trunc (i32 (srl_16 node:$src0)))) | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -10,7 +10,10 @@ | |||||
// that are not yet supported remain commented out. | ||||||
//===----------------------------------------------------------------------===// | ||||||
|
||||||
class GCNPat<dag pattern, dag result> : Pat<pattern, result>, PredicateControl; | ||||||
class GCNPat<dag pattern, dag result> : Pat<pattern, result>, PredicateControl, GISelFlags; | ||||||
|
||||||
let GIIgnoreCopies = 1 in | ||||||
class GCNPatIgnoreCopies<dag pattern, dag result> : GCNPat<pattern, result>; | ||||||
|
||||||
class UniformSextInreg<ValueType VT> : PatFrag< | ||||||
(ops node:$src), | ||||||
|
@@ -3298,6 +3301,34 @@ def : GCNPat < | |||||
(v2i16 (V_LSHL_OR_B32_e64 $src1, (i32 16), (i32 (V_AND_B32_e64 (i32 (V_MOV_B32_e32 (i32 0xffff))), $src0)))) | ||||||
>; | ||||||
|
||||||
multiclass V_SAT_PK_Pat<Instruction inst> { | ||||||
def: GCNPatIgnoreCopies< | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't know why you specifically need to ignore copies here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi for global isel, there would be some COPY MIR within the pattern, this ignored the COPY MIR so that it can be matched There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are more hazards here and I'd rather leave that for a separate patch There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @arsenm , so should I do the following:
Or should I do:
Thanks a lot, and sorry for the late reply There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do not use GCNPatIgnoreCopies. Anything related to the globalisel handling should be done separately |
||||||
(i16 (conc_lo_u8_i16 (clamp_s16_u8 i16:$lo), (clamp_s16_u8 i16:$hi))), | ||||||
(inst | ||||||
(V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (S_MOV_B32 (i32 16)), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't need to materialize this constant, you can just directly use the inline immediate |
||||||
VRegSrc_32:$lo)) | ||||||
>; | ||||||
|
||||||
def: GCNPatIgnoreCopies< | ||||||
(i16 (conc_lo_u8_i16 (clamp_s16_u8 i16:$lo), (smax i16:$hi, (i16 0)))), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tihnk there are missing hasOneUse checks throughout this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @arsenm , may I ask why since if there are other uses of the |
||||||
(inst | ||||||
(V_LSHL_OR_B32_e64 VRegSrc_32:$hi, (S_MOV_B32 (i32 16)), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here, you can directly use the inline immediate in the output There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Same here, you can directly use the inline immediate in the output (maybe can drop the type annotation too) |
||||||
VRegSrc_32:$lo)) | ||||||
>; | ||||||
|
||||||
def: GCNPatIgnoreCopies< | ||||||
(i16 (conc_lo_v2i16_i16 (clamp_v2i16_u8 v2i16:$src))), | ||||||
(inst VRegSrc_32:$src) | ||||||
>; | ||||||
} | ||||||
|
||||||
let OtherPredicates = [NotHasTrue16BitInsts] in { | ||||||
defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_e64>; | ||||||
} // End OtherPredicates = [NotHasTrue16BitInsts] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a blank line between the two for readability Also the |
||||||
let True16Predicate = UseFakeTrue16Insts in { | ||||||
defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_fake16_e64>; | ||||||
} | ||||||
|
||||||
// With multiple uses of the shift, this will duplicate the shift and | ||||||
// increase register pressure. | ||||||
def : GCNPat < | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just one Pat, Can use PatFrag not PatFrags and avoid the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add another pattern of addition... sorry for the trouble its my first PR....