-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intel gpu backend gemm pipeline #89
Intel gpu backend gemm pipeline #89
Conversation
Jiaxingla
commented
Jul 3, 2024
•
edited
Loading
edited
- High performance gemm pipeline for pvc.
- Enable the prefetch by copy atom.
- Try the epilogue like ReLU and Softmax.
tools/util/include/cutlass/util/reference/device/tensor_compare.h
Outdated
Show resolved
Hide resolved
189753c
to
d82b5a4
Compare
include/cute/arch/copy_xe.hpp
Outdated
@@ -44,6 +44,18 @@ namespace cute | |||
inline x { assert(false); } | |||
#endif | |||
|
|||
enum LSC_LDCC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// | ||
// Methods | ||
// | ||
|
||
bool verify(const ProblemShapeType& problem_size, ElementCompute alpha, ElementCompute beta) { | ||
bool verify(const ProblemShapeType &problem_size, ElementCompute alpha, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use automatic code formatting? https://github.com/NVIDIA/cutlass/blob/main/media/docs/programming_guidelines.md#no-automatic-code-formatting
I hope we can use it because I think having to review formatting is a pain. But if so we need to find a tool which follows all of https://github.com/NVIDIA/cutlass/blob/main/media/docs/programming_guidelines.md and only apply to lines we change (e.g. git clang-format
) so that we don't reformat any of the upstream code.
Here format is again wrong (west const, space alignment, line length). Please try to fix it everywhere and I'll review formatting after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--Did you use automatic code formatting?
Yes, we used a wrong code formatting.
I'll correct the clang-format and try my best to fix the format everywhere with cutlass guidelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are using clang-format please add the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we aren't using Chang-format?
template <class T> | ||
CUTE_HOST_DEVICE static void copy(const void *baseoffset, int width, | ||
int height, int pitch, intel::coord_t coord) { | ||
#if defined(SYCL_INTEL_TARGET) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is the same as on the unmodified line 170.
I think we can improve this by changing __SYCL_DEVICE_ONLY__
to
#ifdef __SYCL_DEVICE_ONLY__
#ifdef SYCL_INTEL_TARGET
#define SYCL_DEVICE_BUILTIN(x) SYCL_EXTERNAL extern "C" x
#else
#define SYCL_DEVICE_BUILTIN(x) \
inline x { CUTE_INVALID_CONTROL_PATH("Trying to use IGC built-in on non-Intel hardware"); }
#endif
#else
#define SYCL_DEVICE_BUILTIN(x) \
inline x { CUTE_INVALID_CONTROL_PATH("Trying to use device built-in on host."); }
#endif
Up to you whether you want to do this as part of this PR.
e7381bc
to
26d8a88
Compare
039211b
to
8f3bd67
Compare
576261d
to
6bdda75
Compare
@@ -362,14 +340,14 @@ int main(int argc, const char** argv) | |||
using LayoutD = cutlass::layout::RowMajor; | |||
|
|||
using GmemTiledCopyA = XE_2D_U16x8x16x4x2_LD_N; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding some explanation for naming conventions of copy function
include/cute/atom/copy_atom.hpp
Outdated
} else if constexpr (is_tuple<typename Tensor<SEngine, | ||
SLayout>::engine_type::iterator:: | ||
value_type>::value) { | ||
return copy_unpack(*this, src, dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed please remove this elseif condition and make the changes in the copy traits definition so the execution go into the first if check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this you need to make SrcLayout
in the prefetch copy trait to match decltype(size(src))>::value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the VNNI file from here as it is not needed https://github.com/codeplaysoftware/cutlass-fork/pull/89/files#diff-eef47671b10b1dafabc54befe9bfbc0ed6a690a6688bf0fc638c7f2bf2b72ebcL61-R75
Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: Mehdi Goli <[email protected]>
} | ||
int prefetch_k = 0; | ||
|
||
// Manually set the value to 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this comment say Manually set the value to 3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments. LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Some of my comments haven't been addressed. Are you planning to address them before merging or after? |
Hi @rolandschulz , i do want to address or answer all your comments before merging. Because PR is completely different with beginning, if i have missed any of your comments, please let me know. |
I clicked "resolve" on everything which I can see being addressed. Please check the remaining 3. |
I resolved the format of enum class and answered the questions about clang-format. |
include/cute/arch/copy_xe.hpp
Outdated
@@ -44,6 +44,17 @@ namespace cute | |||
inline x { assert(false); } | |||
#endif | |||
|
|||
enum CacheControl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be a scoped enum?
int height, int pitch, intel::coord_t coord) { | ||
#if defined(SYCL_INTEL_TARGET) | ||
static_assert(sizeof(T) == 2, "Expected T to have size 2"); | ||
__builtin_IB_subgroup_block_read_prefetch_u16_m16k16v1( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this not match the non-prefetch in the same struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to still not match (u16/u32).
CI is failing due to changes introduced in the latest nightly version of DPCPP. Rebasing on |
include/cute/arch/copy_xe.hpp
Outdated
}; | ||
|
||
/// @brief This function loads data from 2D memory surface. | ||
/// Loads an array of rectangular regions coord(X,Y)..coord(X+W,Y+H) from global memory into registers. | ||
/// Loads 1x1 memory blocks, and each block size is 8x16x32bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we use different ordering of dimensions for the comment and the name of the struct?
I think all comments have been addressed. If there is something missing, please let us know and it will be added in the follow up PR. |
Enable the prefetch by copy atom. --------- Co-authored-by: Mehdi Goli <[email protected]>