-
Notifications
You must be signed in to change notification settings - Fork 789
Enhance querying kernels preferred wgsize #16186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Enhance querying kernels preferred wgsize #16186
Conversation
b6a69bc
to
d57635b
Compare
Co-authored-by: Georgi Mirazchiyski <[email protected]>
d57635b
to
71739a8
Compare
@intel/llvm-reviewers-runtime Could I get a review on this when it is possible, Thanks! |
_ZN4sycl3_V16detail22reduGetPreferredWGSizeERSt10shared_ptrINS1_10queue_implEEm | ||
_ZN4sycl3_V16detail28reduGetPreferredDeviceWGSizeERSt10shared_ptrINS1_10queue_implEEm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABI breaking changes? If so, we need to put them under the fpreview-breaking-flag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is little to no need to rename reduGetPreferredWGSize
to reduGetPreferredDeviceWGSize
. Leaving it as it was avoids an ABI breaking change too.
|
||
// If the reduction kernel is not name defined, we won't be able to query the | ||
// exact kernel for the best wgsize, so we query all the reduction kernels for | ||
// thier wgsize and use the minimum wgsize as a safe and approximate option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// thier wgsize and use the minimum wgsize as a safe and approximate option. | |
// their wgsize and use the minimum wgsize as a safe and approximate option. |
@@ -2741,7 +2779,29 @@ void reduction_parallel_for(handler &CGH, range<Dims> Range, | |||
// TODO: currently the preferred work group size is determined for the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this TODO be updated based on the changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added just an initial comment now based on first skim, I'll finish up review on a follow-up more involved look.
As a first pass this looks okay, though it will be inaccurate for unnamed lambda kernels / auto_name as they are not queried via kernel bundles here.
Ideally, I am interested to hear from @steffenlarsen and/or @aelovikov-intel if time allows them, if have any suggestions on tackling this issue design-wise. (here was a related quite brutal attempt to use kernel bundles in all cases / unnamed vs named kernel lambdas #16009 but the refactoring is quite large and unsightly)
@@ -1515,6 +1536,8 @@ template <> struct NDRangeReduction<reduction::strategy::range_basic> { | |||
using Name = __sycl_reduction_kernel<reduction::MainKrn, KernelName, | |||
reduction::strategy::range_basic>; | |||
|
|||
WGSize = std::min(WGSize, reduGetPreferredKernelWGSize<Name>(Queue)); | |||
|
|||
CGH.parallel_for<Name>(NDRange, Properties, [=](nd_item<1> NDId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are recalculating WGSize
now based on the named kernel's info query, we likely need to update the NDRange
for the kernel dispatch here and in all of the other reduction strategy implementations specialisations.
auto ExecBundle = | ||
get_kernel_bundle<KernelName, bundle_state::executable>(Ctx, {Dev}); | ||
kernel Kernel = ExecBundle.template get_kernel<KernelName>(); | ||
MaxWGSize = Kernel.template get_info<work_group_size>(Dev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to reduGetPreferredWGSize
in reduction.cpp, I think this function should probably also respect the SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE
environment variable value from SYCLConfig
.
I haven't had time today, but will try to do it tomorrow. |
I had a chat with @tahonermann and he pointed me to this llvm/sycl/include/sycl/kernel.hpp Lines 44 to 58 in 4881d6d
|
@aelovikov-intel sry for late reply, I gave that a try but the kernel type seem to be an unnamed type in the context of reduction kernels. It gave me this error:
I might have a misunderstanding here so if you could elaborate the idea more that would be great. (I was using the
|
Why isn't this enabled?
You'd still need to wrap |
I suspect it is. I think the issue is that a named kernel is being provided (a specialization of The only workaround I was able to come up with was to, instead of wrapping the kernel name, to wrap the kernel object instead. This might impose some overhead, but it should work. https://godbolt.org/z/35T9absYd.
|
@tahonermann I tried that and it seems reasonable for wrapping the kernel name but i am still a little confused by how should we get the unnamed kernel name at runtime to query it for the wgsize? |
What I demonstrated was wrapping the kernel name when a named type is provided and wrapping the kernel object in a lambda otherwise (and letting the kernel name default to The SYCL 2020 specification doesn't provide an interface to reflect a kernel name given a kernel type or object. This seems intentional since the same kernel type and/or object can be associated with multiple (explicitly provided) kernel names. If we can design a useful interface to reflect kernel names that appropriately handles the potential 1-N relationship, I think it would be worthwhile proposing it for standardization. In the meantime, you can use the As part of the SYCL upstreaming effort, we are planning to retire the |
Ah okay, that make sense.
Yeah, that would be useful to have an interface like that to make that cases more concrete. Might give that a thought and see if I could come with some good ideas about that.
Thanks for sharing this info, I wasn't aware of the |
Work-group sizes currently rely on device maximum rather than the max from a kernel query. This could result in an error raised as the device maximum could be more than what the kernel is actually allowed to use.
This PR uses an approach to make choosing the wgsize more safer for the kernels. The approach used composed of 2 sides:
parallel_for<class Name>
then we use this name to query the best wgsize for this kernel.The second approximate approach part could be more accurate by using this PR that would give each reduction kernel a unique name that would make querying them possible at runtime.