Bugfix for reduce-by-segment with differing input and output types #1987

Merged
merged 3 commits into from
Jan 3, 2025
Changes from 2 commits
8 changes: 3 additions & 5 deletions include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h
@@ -939,12 +939,10 @@ struct __write_red_by_seg
using std::get;
auto __out_keys = get<0>(__out_rng.tuple());
auto __out_values = get<1>(__out_rng.tuple());
-using _KeyType = oneapi::dpl::__internal::__value_t<decltype(__out_keys)>;
-using _ValType = oneapi::dpl::__internal::__value_t<decltype(__out_values)>;

-const _KeyType& __next_key = get<2>(__tup);
-const _KeyType& __current_key = get<3>(__tup);
-const _ValType& __current_value = get<1>(get<0>(__tup));
+const auto& __next_key = get<2>(__tup);
+const auto& __current_key = get<3>(__tup);
+const auto& __current_value = get<1>(get<0>(__tup));
Comment on lines +943 to +945
Contributor

I'm not sure there is any advantage, but we could name the type with std::tuple_element_t<I, _Tup> rather than using auto.
I think we effectively end up with the same code, without much more information about the types we're using. Just mentioning it in case you find a reason to use it.

Contributor Author

I think auto is a bit cleaner here, since the right-hand side of the assignment already makes clear which tuple element each variable refers to. Also, the declaration of __current_value would become

const std::tuple_element_t<1, std::tuple_element_t<0, _Tup>>& __current_value = get<1>(get<0>(__tup));

which is more difficult to follow, in my opinion.

Contributor

I think I agree. The only thing I stopped to think about is whether, with const auto&, we might somehow be less specific than we intend to be, but I definitely agree that what you have now is much cleaner.

I'm good with how you have it, I think.

const bool __is_seg_end = get<1>(__tup);
const std::size_t __out_idx = get<0>(get<0>(__tup));
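
To make the auto versus std::tuple_element_t discussion above concrete, here is a minimal host-only sketch. It is not oneDPL code: `Tup`, `RefTup`, `SpelledOut`, and both helper functions are hypothetical names invented for illustration, and the tuple layout only assumes what the `get<...>` calls in the hunk suggest. It shows three things, assuming C++17: for value elements the two spellings name the same type; a reference whose type differs from the element type (as could happen when the type is taken from the output range) binds to a converted temporary rather than to the element; and for reference elements `const std::tuple_element_t<...>&` collapses to a non-const reference, so the two spellings are not always interchangeable.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for the tuple passed to __write_red_by_seg:
// ((out_idx, current_value), is_seg_end, next_key, current_key).
using Tup = std::tuple<std::tuple<std::size_t, double>, bool, int, int>;

// The spelled-out type suggested in review.
using SpelledOut = std::tuple_element_t<1, std::tuple_element_t<0, Tup>>;

// Returns true when `const auto&` refers to the tuple element itself.
inline bool
auto_ref_binds_to_element(const Tup& tup)
{
    using std::get;
    const auto& current_value = get<1>(get<0>(tup));
    // For value elements both spellings name the same type.
    static_assert(std::is_same_v<decltype(current_value), const SpelledOut&>,
                  "auto and tuple_element_t agree for value elements");
    return &current_value == &get<1>(get<0>(tup));
}

// Returns true when a reference of a different type (long here, standing in
// for a type derived from the output range) binds to a converted temporary.
inline bool
converted_ref_binds_to_temporary(const Tup& tup)
{
    using std::get;
    const long& next_key = get<2>(tup); // int -> long conversion creates a temporary
    assert(next_key == get<2>(tup));    // same value, different object
    return static_cast<const void*>(&next_key) != static_cast<const void*>(&get<2>(tup));
}

// For reference elements the spellings differ: const applied to a reference
// type is ignored, so this collapses to `int&`, whereas `const auto&` bound to
// the same element would be `const int&`.
using RefTup = std::tuple<int&>;
static_assert(std::is_same_v<const std::tuple_element_t<0, RefTup>&, int&>,
              "reference collapsing drops the const");
```

In this sketch the `long` case still compiles because `int` converts to `long`; when no such conversion exists, the same declaration is a compile error.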

@@ -108,6 +108,65 @@ test_with_usm()
EXPECT_EQ_N(exp_values1, output_values1, n, "wrong values1 from reduce_by_segment");
EXPECT_EQ_N(exp_values2, output_values2, n, "wrong values2 from reduce_by_segment");
}

template <typename KernelName>
void
test_zip_with_discard()
{
constexpr sycl::usm::alloc alloc_type = sycl::usm::alloc::device;
sycl::queue q = TestUtils::get_test_queue();

constexpr int n = 5;

//data initialization
int keys1[n] = {1, 1, 2, 2, 3};
int keys2[n] = {1, 1, 2, 2, 3};
int values1[n] = {1, 1, 1, 1, 1};
int values2[n] = {2, 2, 2, 2, 2};
int output_keys[n] = {};
int output_values1[n] = {};
int output_values2[n] = {};

// allocate USM memory and copying data to USM shared/device memory
TestUtils::usm_data_transfer<alloc_type, int> dt_helper1(q, keys1, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper2(q, keys2, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper3(q, values1, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper4(q, values2, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper5(q, output_keys, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper6(q, output_values1, n);
TestUtils::usm_data_transfer<alloc_type, int> dt_helper7(q, output_values2, n);
auto d_keys1 = dt_helper1.get_data();
auto d_keys2 = dt_helper2.get_data();
auto d_values1 = dt_helper3.get_data();
auto d_values2 = dt_helper4.get_data();
auto d_output_keys = dt_helper5.get_data();
auto d_output_values1 = dt_helper6.get_data();
auto d_output_values2 = dt_helper7.get_data();

//make zip iterators
auto begin_keys_in = oneapi::dpl::make_zip_iterator(d_keys1, d_keys2);
auto end_keys_in = oneapi::dpl::make_zip_iterator(d_keys1 + n, d_keys2 + n);
auto begin_vals_in = oneapi::dpl::make_zip_iterator(d_values1, d_values2);
auto begin_keys_out = oneapi::dpl::make_zip_iterator(d_output_keys, oneapi::dpl::discard_iterator());
auto begin_vals_out = oneapi::dpl::make_zip_iterator(d_output_values1, d_output_values2);
Contributor

Does it make sense to include a discard iterator in the values as well? I think right now the test only exercises the compilation issue on the key side, not the value side, right?

Contributor Author

Yes, it makes sense to add this for completeness as the same issue would occur with a discard iterator in the output values. I have added it.
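
As a rough illustration of why a discard-style output can break a declaration whose type is taken from the output side, here is a host-only sketch. The `sink` type is a hypothetical stand-in and makes no claim about how oneapi::dpl::discard_iterator is actually implemented; `write_key` and `example_usage` are likewise invented for this example. The assumption is only that a write-only output accepts assignment from any value but cannot be constructed from an input key, which is enough to make `const OutType& x = input_key;` ill-formed while `out = input_key;` stays valid.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical write-only sink: assignment from anything is accepted and ignored.
struct sink
{
    template <typename T>
    const sink&
    operator=(const T&) const
    {
        return *this;
    }
};

// Old pattern: `const sink& key = some_int;` needs an int -> sink conversion, which does not exist.
static_assert(!std::is_convertible_v<const int&, sink>, "cannot bind an output-typed reference");
// New pattern: only assignment into the output is required.
static_assert(std::is_assignable_v<sink&, const int&>, "assignment into the sink still works");

// Sketch of the fixed pattern: the local's type is deduced from the input tuple.
// The tuple layout (is_seg_end, current_key) is a simplification for illustration.
template <typename OutKey, typename InKey>
void
write_key(OutKey& out, const std::tuple<bool, InKey>& tup)
{
    const auto& current_key = std::get<1>(tup);
    if (std::get<0>(tup))
        out = current_key;
}

inline void
example_usage()
{
    sink discarded;
    long stored = 0;
    write_key(discarded, std::make_tuple(true, 7)); // value is dropped
    write_key(stored, std::make_tuple(true, 7));    // int is converted on assignment
    assert(stored == 7);
}
```

The same reasoning applies to the value side, which is why a discard iterator in the output values is worth covering as well.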


//run reduce_by_segment algorithm
auto new_last = oneapi::dpl::reduce_by_segment(TestUtils::make_device_policy<KernelName>(q), begin_keys_in,
end_keys_in, begin_vals_in, begin_keys_out, begin_vals_out,
std::equal_to<>(), TestUtils::TupleAddFunctor());

//retrieve result on the host and check the result
dt_helper5.retrieve_data(output_keys);
dt_helper6.retrieve_data(output_values1);
dt_helper7.retrieve_data(output_values2);

const int exp_keys[n] = {1, 2, 3};
const int exp_values1[n] = {2, 2, 1};
const int exp_values2[n] = {4, 4, 2};
EXPECT_EQ_N(exp_keys, output_keys, n, "wrong keys from reduce_by_segment");
EXPECT_EQ_N(exp_values1, output_values1, n, "wrong values1 from reduce_by_segment");
EXPECT_EQ_N(exp_values2, output_values2, n, "wrong values2 from reduce_by_segment");
}
#endif

//The code below for test a call of reduce_by_segment with zip iterators was kept "as is", as an example reported by a user; just "memory deallocation" added.
@@ -118,6 +177,8 @@ int main()
test_with_usm<sycl::usm::alloc::shared, class KernelName1>();
// Run tests for USM device memory
test_with_usm<sycl::usm::alloc::device, class KernelName2>();

test_zip_with_discard<class KernelName3>();
#endif

return TestUtils::done(TEST_DPCPP_BACKEND_PRESENT);
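
The expected arrays in test_zip_with_discard can be checked by hand against a serial reference. The sketch below is an assumption-laden model, not the oneDPL implementation: `reduce_by_segment_reference`, `Pair`, and `add_pairs` are hypothetical names, `std::pair<int, int>` stands in for a zipped element, and `add_pairs` assumes that TestUtils::TupleAddFunctor adds element-wise, which is what the expected values in the test suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for one element of a two-way zip.
using Pair = std::pair<int, int>;

// Assumed behaviour of the test's tuple-add functor: element-wise addition.
inline Pair
add_pairs(const Pair& a, const Pair& b)
{
    return {a.first + b.first, a.second + b.second};
}

// Serial reference: emits one (key, reduced value) entry per run of
// consecutive keys that compare equal under key_eq.
template <typename Key, typename Value, typename KeyEq, typename Op>
std::vector<std::pair<Key, Value>>
reduce_by_segment_reference(const std::vector<Key>& keys, const std::vector<Value>& values, KeyEq key_eq, Op op)
{
    assert(keys.size() == values.size());
    std::vector<std::pair<Key, Value>> out;
    for (std::size_t i = 0; i < keys.size(); ++i)
    {
        if (i > 0 && key_eq(keys[i - 1], keys[i]))
            out.back().second = op(out.back().second, values[i]); // same segment: accumulate
        else
            out.emplace_back(keys[i], values[i]); // new segment: keep its first key
    }
    return out;
}
```

With the test's inputs (keys 1, 1, 2, 2, 3 in both key arrays and values 1 and 2), this yields three segments whose reduced values match exp_values1 and exp_values2; the remaining output slots in the test stay at their zero initial values.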