Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resize scheduler: enable vectorization #3694

Merged
merged 39 commits into from
Jan 15, 2025
Merged

Resize scheduler: enable vectorization #3694

merged 39 commits into from
Jan 15, 2025

Conversation

naoyam
Copy link
Collaborator

@naoyam naoyam commented Jan 10, 2025

Stacked on #3693

This PR adds a preliminary vectorization support to the resize scheduler. It currently only considers vectorization of the innermost dimension, just because that's good enough for the RoPE cases. It should eventually be extended to support vectorizing multiple innermost dimensions.

naoyam added 24 commits January 6, 2025 15:47
Currently only has one parameter
@naoyam naoyam added the rope label Jan 10, 2025
@naoyam
Copy link
Collaborator Author

naoyam commented Jan 14, 2025

!test

@naoyam naoyam marked this pull request as ready for review January 14, 2025 22:53
@naoyam naoyam requested a review from jjsjann123 January 14, 2025 22:53
Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

csrc/scheduler/resize.cpp Outdated Show resolved Hide resolved
csrc/scheduler/resize.cpp Outdated Show resolved Hide resolved
csrc/scheduler/resize.cpp Outdated Show resolved Hide resolved
// slicing paths as well. For now, in order to avoid the error due
// to issue #3640, use a size that is divisible by 8.
// std::vector<int64_t> shape({16, 100});
std::vector<int64_t> shape({16, 96});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjsjann123 Forgot to include this WAR. This is what I mentioned to you that I saw an error due to #3640.

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 15, 2025

!test

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 15, 2025

!test

Base automatically changed from resize_scheduler_reorder to main January 15, 2025 20:29
@naoyam
Copy link
Collaborator Author

naoyam commented Jan 15, 2025

!test

Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
⚡ Recommended focus areas for review

Logic Change

The ResizeScheduler class has been modified to support vectorization. The computeHeuristics function now calculates a vectorization_factor based on the innermost dimension of the largest input. The schedule function has been updated to apply vectorization to the innermost dimension of the reference tensor view. This change may have performance implications and should be reviewed carefully.

auto ref_tv_entry =
    HeuristicDataCacheEntry<HeuristicCompileTime::ReferenceTensors>(
        data_cache, [fusion]() {
          std::vector<TensorView*> data{getReferenceTensor(fusion)};
          return std::make_unique<std::vector<TensorView*>>(std::move(data));
        });
TensorView* ref_tv = ref_tv_entry.get()[0];

// Before applying the vectorization split, any reshape transform of
// the largest input will be cancelled whenever possible, so the
// largest input is used as the reference of vectorization.
auto vec_ref_tv = largest_input != nullptr ? largest_input : ref_tv;

// Only consider the innermost dimension to vectorize for now.
// TODO: Consider vectorizing merged IDs, not just the innermost
params->vectorization_factor = vectorize_helper::getVectorizationFactor(
    runtime_info,
    vec_ref_tv,
    data_cache,
    (int64_t)vec_ref_tv->getLogicalDomain().size() - 1,
    {});

return params;
Potential Bug

In the schedule function, the next_innermost_pos variable is used to keep track of the next innermost position after splitting the reference tensor view. However, if the vectorization_factor is greater than 1, the next_innermost_pos variable is decremented twice, which may lead to incorrect results.

int64_t next_innermost_pos = -1;
// [..., ...]
//        ^
//        +--- next_innermost_pos

if (vec_factor > 1) {
  ref_tv->split(-1, vec_factor);
  --next_innermost_pos;
  // [..., vec_factor]
  //   ^
  //   +--- next_innermost_pos
}
Test Modification

The SliceRotateCatResidual test has been modified to use a shape that is divisible by 8 to avoid an error due to issue #3640. This change may not be related to the vectorization support added in this PR, but it should be reviewed to ensure that the test is still valid.

// Due to #3640, the vectorization analysis may return 4 for this
// fusion since there's the use of the input without
// slicing. However, the correct factor needs to consider the
// slicing paths as well. For now, in order to avoid the error due
// to issue #3640, use a size that is divisible by 8.
// std::vector<int64_t> shape({16, 100});
std::vector<int64_t> shape({16, 96});

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 15, 2025

!test

@naoyam naoyam merged commit ef6f169 into main Jan 15, 2025
33 of 34 checks passed
@naoyam naoyam deleted the resize_scheduler_vec branch January 15, 2025 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants