add knobs control inner dim unroll and outer dim unroll in pointwise scheduler #3275
(1) diffs in
The additional domain
(2) diffs in (3) diffs in (4) diffs in |
Yes, we have identified that it is a serde issue. @naoyam confirmed a fix in #3283, |
@@ -640,7 +640,8 @@ void defineHeuristicParamBindings(py::module& nvfuser) {
  .PARAM(PointwiseParams, split_grid_y_dim)
  .PARAM(PointwiseParams, flip_grid_binding)
  .PARAM(PointwiseParams, vectorization_factor)
- .PARAM(PointwiseParams, unroll_factor);
+ .PARAM(PointwiseParams, unroll_factor_inner)
👍🏼
LGTM.
You may want to change `unroll_factor` to `unroll_factor_outer` in https://github.com/NVIDIA/Fuser/blob/main/doc/dev/python_scheduling/autotune_pointwise.py#L92, so the script runs as-is?
- reference_tv->split(0, pparams->unroll_factor);
  // [o-remainder, Unroll| i-remainder, TIDx, Vect]
+ if (pparams->unroll_factor_inner > 1) {
+   reference_tv->split(1, pparams->unroll_factor_inner);
We are splitting on dimension 1, which is the TIDx here, right?
This is for the 2D scheduler, which starts with `[outer dim, inner dim]`, so here dimension 1 is `i-remainder` in `[0-outer | 1-i-remainder, 2-TIDx, 3-Vect]`. `i-remainder` means what is left after splitting out other dims, e.g. `Vect`, `TIDx`.
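To make the index reasoning concrete, here is a minimal sketch that models a loop domain as a list of labeled extents. `Axis`, `split`, and `schedule2D` are hypothetical stand-ins written for this illustration, not the nvFuser classes; the only behavior assumed is that splitting axis `idx` by a factor leaves the remainder at `idx` and places the new axis at `idx + 1`. The sizes used in the note below are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for one iteration domain: a label plus an extent.
// Hypothetical model for illustration, not nvFuser's IterDomain.
struct Axis {
  std::string name;
  int64_t extent;
};

// Split axis `idx` by `factor`: the axis becomes
// [ceilDiv(extent, factor), factor]. The remainder stays at `idx`
// and the new axis (named `inner_name`) is inserted at `idx + 1`.
void split(std::vector<Axis>& dom, std::size_t idx, int64_t factor,
           const std::string& inner_name) {
  assert(idx < dom.size() && factor > 0);
  dom[idx].extent = (dom[idx].extent + factor - 1) / factor;
  dom.insert(dom.begin() + static_cast<std::ptrdiff_t>(idx) + 1,
             Axis{inner_name, factor});
}

// Build the 2D layout from the comment above, starting at [outer, inner].
std::vector<Axis> schedule2D(int64_t outer, int64_t inner, int64_t vect,
                             int64_t tidx, int64_t unroll_inner) {
  std::vector<Axis> dom{{"outer", outer}, {"i-remainder", inner}};
  split(dom, 1, vect, "Vect");  // [outer | i-remainder, Vect]
  split(dom, 1, tidx, "TIDx");  // [outer | i-remainder, TIDx, Vect]
  if (unroll_inner > 1) {
    // Index 1 is still i-remainder, so this split carves the inner
    // unroll out of the remainder, not out of TIDx.
    split(dom, 1, unroll_inner, "i-Unroll");
  }
  return dom;  // [outer | i-remainder, (i-Unroll,) TIDx, Vect]
}
```

With illustrative sizes `schedule2D(64, 4096, 4, 128, 2)`, index 1 holds `i-remainder` with extent 4, followed by `i-Unroll` (2), `TIDx` (128), and `Vect` (4).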
So this is a behavior change then.
If we look at the above commented code change, we are doing
- reference_tv->split(0, pparams->unroll_factor);
- // [o-remainder, Unroll| i-remainder, TIDx, Vect]
+ if (pparams->unroll_factor_inner > 1) {
+ reference_tv->split(1, pparams->unroll_factor_inner);
Which means the old behavior (outer unroll) is being updated to a default inner unroll instead?
Good point. We should assign the unroll to the inner dim only when the scheduler is 1D; for 2D it should go to the outer dim.
// for 1D scheduler, unroll the inner dimension
// since there is no outer dimension.
if (break_point == 0) {
params->unroll_factor_inner = total_unroll;
params->unroll_factor_outer = 1L;
} else {
// for 2D scheduler, unroll the outer dimension
// to prioritize reuse across different rows, will
// be revised in heuristics tuning, e.g. unroll different
// dims based on the broadcast dimension.
params->unroll_factor_inner = 1L;
params->unroll_factor_outer = total_unroll;
}
csrc/scheduler/pointwise.cpp
Outdated
max_vect_unroll_factor, params->vectorization_factor);
params->unroll_factor_inner = total_unroll;
IIUC, this PR shouldn't impose any functional changes. So I would expect all old use of `params->unroll_factor` to be replaced with `params->unroll_factor_inner`.
Yes. Here all the unroll factors go to `unroll_factor_inner` through `params->unroll_factor_inner = total_unroll;`
csrc/scheduler/pointwise.cpp
Outdated
if (pparams->unroll_factor_outer > 1) {
  reference_tv->split(0, pparams->unroll_factor_outer);
}
// [o-remainder, o-Unroll| i-remainder, i-Unroll, TIDx, Vect]
I'm a bit lost about the notation here. What's `o-Unroll | i-remainder`?
`o` represents the outer dim and `i` represents the inner dim. `|` separates the inner dim and the outer dim. So here `o-Unroll` represents the outer unroll, and `i-remainder` means what is left in the inner dim after splitting out other domains, e.g. `Vect`, `TIDx`.
I used `o-Unroll` and `i-Unroll` to distinguish between unroll in the outer dim and the inner dim.
Ah, sorry, I was totally not getting the `|` part here. Now it reads clear to me.
add some comments for clarity.
// Here and in the following comments:
// prefix [i] represents inner dimension
// prefix [o] represents outer dimension
// [|] separates the outer and inner dimensions
@@ -822,7 +847,9 @@ void schedulePointwise(Fusion* fusion, const PointwiseParams* pparams) {
  // Threads
  reference_tv->split(0, kThreadX);
  // Unroll
- reference_tv->split(0, pparams->unroll_factor);
+ if (pparams->unroll_factor_inner > 1) {
+   reference_tv->split(0, pparams->unroll_factor_inner);
qq: we are not using `unroll_factor_outer` in this branch, is that expected?
Yes, this `else` branch is for the 1D scheduler: all IDs are merged into 1 domain, so there is no outer dim.
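As a rough illustration of the 1D case, the sketch below merges every dimension into one extent and then peels off the vectorize, thread, and inner-unroll factors. `Layout1D` and `schedule1D` are hypothetical names for this example only; the vectorize split happening before the thread split is an assumption, since the hunk above only shows the thread and unroll splits.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Extents of the single merged domain after scheduling:
// [remainder, Unroll, TIDx, Vect]. Hypothetical model, not nvFuser code.
struct Layout1D {
  int64_t remainder;
  int64_t unroll;
  int64_t tidx;
  int64_t vect;
};

Layout1D schedule1D(const std::vector<int64_t>& shape, int64_t vect,
                    int64_t threads, int64_t unroll_inner) {
  assert(vect > 0 && threads > 0 && unroll_inner > 0);
  auto ceilDiv = [](int64_t a, int64_t b) { return (a + b - 1) / b; };

  // All IDs are merged into one domain, so there is no outer dim left
  // for an outer unroll factor to apply to.
  int64_t merged = std::accumulate(shape.begin(), shape.end(), int64_t{1},
                                   std::multiplies<int64_t>());

  int64_t remainder = ceilDiv(merged, vect);  // assumed vectorize split
  remainder = ceilDiv(remainder, threads);    // Threads
  if (unroll_inner > 1) {                     // Unroll, skipped when 1
    remainder = ceilDiv(remainder, unroll_inner);
  }
  return Layout1D{remainder, unroll_inner, threads, vect};
}
```

For a made-up `{64, 4096}` input with `vect = 4`, `threads = 128`, and `unroll_inner = 2`, the merged extent 262144 leaves a remainder of 256.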
LGTM.
Since this isn't applying any functional change, should we double check the code diff just to be sure?
Still not sure why code is changed, e.g. |
I'm seeing that in all the PRs. I think there's something going on that is flipping the order of outputs in the generated kernel. It may or may not be related to serde. |
@jjsjann123 I am going to merge this PR after the build test. There are two types of code diffs.
Fix the `autotune_pointwise` script, which was broken by #3275. The earlier PR replaced the pointwise setting `unroll_factor` with `unroll_factor_inner` and `unroll_factor_outer`.
What's in this PR?
(1) Added two knobs to control unroll in the inner dim and the outer dim for the pointwise scheduler.
(2) The original unroll knob, which applied to the outer dim, is removed.
(3) Extended test `UnrollOnTopOfVectorize` to test 8 different combinations of `vectorization`, `inner unroll`, and `outer unroll`.
(4) Neither `inner unroll` nor `outer unroll` is used in the heuristics. They are always `1` unless `vectorization == 1`, in which case `inner unroll` is used.
(5) If `inner or outer unroll factor == 1`, we won't split out an additional domain with size of `1`.
Why?
These two knobs allow more performance optimizations, e.g. unroll in different dims based on broadcast dims.
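Items (3) and (5) can be sketched together: enumerate the 2 x 2 x 2 knob combinations and list which domains the 2D layout would contain when a factor of `1` adds no extra domain. `Knobs`, `allCombinations`, and `layout2D` are hypothetical helpers, and the values `{1, 4}` and `{1, 2}` are placeholders rather than the values used by `UnrollOnTopOfVectorize`. The sketch also assumes the `Vect` domain is always present, which the description does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bundle of the three knobs discussed in this PR.
struct Knobs {
  int64_t vectorization;
  int64_t unroll_inner;
  int64_t unroll_outer;
};

// Enumerate every combination of "off" and "on" for the three knobs.
// The concrete "on" values (4 and 2) are placeholders.
std::vector<Knobs> allCombinations() {
  std::vector<Knobs> combos;
  for (int64_t vect : {1, 4}) {
    for (int64_t inner : {1, 2}) {
      for (int64_t outer : {1, 2}) {
        combos.push_back(Knobs{vect, inner, outer});
      }
    }
  }
  return combos;
}

// Labels of the 2D loop domain, in the notation
// [o-remainder, o-Unroll | i-remainder, i-Unroll, TIDx, Vect].
// An unroll factor of 1 contributes no domain at all.
std::vector<std::string> layout2D(const Knobs& k) {
  assert(k.vectorization >= 1 && k.unroll_inner >= 1 && k.unroll_outer >= 1);
  std::vector<std::string> dom{"o-remainder"};
  if (k.unroll_outer > 1) {
    dom.push_back("o-Unroll");
  }
  dom.push_back("i-remainder");
  if (k.unroll_inner > 1) {
    dom.push_back("i-Unroll");
  }
  dom.push_back("TIDx");
  dom.push_back("Vect");  // assumed always present in this sketch
  return dom;
}
```

In this model the layout has four domains when both unroll factors are `1` and six when both are greater than `1`, so a test that loops over `allCombinations()` exercises each shape.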