-
Notifications
You must be signed in to change notification settings - Fork 286
[Thrust]: New single-pass run length decoding example #6470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Thrust]: New single-pass run length decoding example #6470
Conversation
|
Here's a reproducer for the NVCC 13 + GCC 14 bug. I filed it upstream. Examples should be exemplars, so I think we should leave the lovely ranges code and live with masking off the bad config instead of doing a raw for loop or something. |
a450ace to
72a84ff
Compare
This comment has been minimized.
This comment has been minimized.
…and build infrastructure, and disable the build for CUDA 13 + GCC 14.
…uilds, requires C++20 ranges.
415ac66 to
1930824
Compare
| template <typename ValueType> | ||
| __host__ __device__ CountType operator()(run<ValueType, CountType> r) const | ||
| { | ||
| cuda::std::size_t end = cuda::minimum()(r.offset + r.count, out_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just be
| cuda::std::size_t end = cuda::minimum()(r.offset + r.count, out_size); | |
| cuda::std::size_t end = cuda::std::min(r.offset + r.count, out_size); |
| run(run const& other) = default; | ||
| run& operator=(run const& other) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: What is the purpose here? do you want to delete move operations?
| auto runs = thrust::make_transform_iterator( | ||
| thrust::make_zip_iterator(values, counts), [] __host__ __device__(thrust::tuple<ValueType, CountType> tup) { | ||
| return run{thrust::get<0>(tup), thrust::get<1>(tup)}; | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have cuda::zip_transform_iterator, just saying
| thrust::device_vector<CountType> counts(size); | ||
| thrust::fill(thrust::device, counts.begin(), counts.end(), repeat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should rather be
| thrust::device_vector<CountType> counts(size); | |
| thrust::fill(thrust::device, counts.begin(), counts.end(), repeat); | |
| thrust::device_vector<CountType> counts(size, repeat); |
| } | ||
| std::cout << std::endl; | ||
|
|
||
| auto gold = std::views::iota(CountType{0}) | std::views::transform([=](auto x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have cuda::std::iota_view and cuda::std::transform_view at home
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are even available in C++17
🥳 CI Workflow Results🟩 Finished in 2h 51m: Pass: 100%/70 | Total: 9h 57m | Max: 52m 58s | Hits: 99%/115298See results here. |
| __host__ __device__ run(ValueType value = ValueType{}, CountType count = 0, CountType offset = 0, CountType run_id = 1) | ||
| : value(value) | ||
| , count(count) | ||
| , offset(offset) | ||
| , run_id(run_id) | ||
| {} | ||
| run(run const& other) = default; | ||
| run& operator=(run const& other) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: delete special ops and just use aggregate init
| __host__ __device__ | ||
| expand(OutputIterator out, CountType out_size, CountType runs_size, ExpandedSizeIterator expanded_size) | ||
| : out(out) | ||
| , out_size(out_size) | ||
| , runs_size(runs_size) | ||
| , expanded_size(expanded_size) | ||
| {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
| auto runs = thrust::make_transform_iterator( | ||
| thrust::make_zip_iterator(values, counts), [] __host__ __device__(thrust::tuple<ValueType, CountType> tup) { | ||
| return run{thrust::get<0>(tup), thrust::get<1>(tup)}; | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: use zip_function to unpack the tuple
| auto runs = thrust::make_transform_iterator( | |
| thrust::make_zip_iterator(values, counts), [] __host__ __device__(thrust::tuple<ValueType, CountType> tup) { | |
| return run{thrust::get<0>(tup), thrust::get<1>(tup)}; | |
| }); | |
| auto runs = thrust::make_transform_iterator( | |
| thrust::make_zip_iterator(values, counts), thrust::make_zip_function([] __host__ __device__(ValueType v CountType c) { | |
| return run{v, c}; | |
| })); |
| auto expand_out = thrust::make_transform_output_iterator( | ||
| thrust::make_discard_iterator(), expand(out, out_size, runs_size, expanded_size.begin())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I would find a braced-init list easier to read here:
| auto expand_out = thrust::make_transform_output_iterator( | |
| thrust::make_discard_iterator(), expand(out, out_size, runs_size, expanded_size.begin())); | |
| auto expand_out = thrust::make_transform_output_iterator( | |
| thrust::make_discard_iterator(), expand{out, out_size, runs_size, expanded_size.begin()}); |
Plus, you can drop the ctor of expand now.
|
|
||
| if (static_cast<CountType>(out_end - output.begin()) != size * repeat) | ||
| { | ||
| throw int{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important: let's not show bad practices like throwing exceptions not derived from std::exception. Since this is in main, can we just print an error message and return something nonzero?
Applies one more time below.
Adds a new Thrust example demonstrating single-pass run-length decoding using inclusive_scan with a transform_output_iterator.
Conor thought it couldn't be done, but once again I have prevailed!
Episode 255: 🇩🇰 C++ Copenhagen Meetup & Replicate
Episode 256: 🇩🇰 Algorithms: Replicate, Scatter, Gather & RLD (Part 2)
Episode 257: 🇳🇴 Live from Norway! Replicate, Scatter, Gather & RLD (Part 3)
If you code review this PR, you might be featured in part 4!