Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relocation variants #6364

Merged
merged 10 commits into from
Jan 9, 2024
Merged

Conversation

isidorostsa
Copy link
Contributor

@isidorostsa isidorostsa commented Oct 12, 2023

This PR introduces additional features related to relocation, aimed at both internal and public use. The specifics are:

  • The algorithm uninitialized_relocate_backward and its underlying lower-level primitives have been implemented. Corresponding tests for these algorithms have also been added.

  • The function uninitialized_relocate has been refactored and it now uses a sentinel based for loop, instead of calculating the number of iterations and internally calling uninitialized_relocate_n.

The primitives of uninitialized_relocate_backwards will be utilized internally for the hpx::small_vector::insert() method.

Copy link
Contributor

@Quuxplusone Quuxplusone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I didn't look closely from the point of my last comment onward: reviewer silence doesn't necessarily mean endorsement. :)

Comment on lines +520 to +534
auto dest_first = std::prev(dest_last, count);

return parallel_uninitialized_relocate_n(
HPX_FORWARD(ExPolicy, policy), first, count, dest_first);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether this is OK depends on the preconditions of your parallel algorithms. I expect that this is not OK. The important thing IMO is that we want this to work:

Widget a[10] = {1,2,3,4,5,6,7,8,9,destroyed};
std::uninitialized_relocate_backward(a, a+9, a+10);
assert(a == {destroyed,1,2,3,4,5,6,7,8,9});

(modulo syntax errors and language-lawyering over the lifetimes of those Widget objects). If you copy them in the forward direction, the output will be something like

assert(a == {destroyed,1,1,1,1,1,1,1,1,1});

So the idea of having uninitialized_relocate_backward dispatch to uninitialized_relocate_n smells wrong to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in pointing this out, however I should look into how (if at all) parallelizable this would be, while retaining the correct ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naïvely, it's not parallelizable at all unless you can prove (by pointer comparisons) that the two ranges don't overlap. If they don't overlap, then this implementation is fine. It occurs to me that HPX already has a std::copy{,_backward} implementation; you should look at what it does, and copy that strategy.
(It might be that HPX's std::copy{,_backward} also completely ignore the overlapping issue. If so, then I think it would be perfectly fine for you to copy that ignorance here. uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #6364 (comment):

In standard C++20 we have

  • the non-parallel std::copy_backward, which forbids overlap
  • the ExecutionPolicy overload of std::copy_backward, which also forbids overlap

and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes

  • a non-parallel std::uninitialized_relocate_backward which permits overlap
  • an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap

And noting that:

uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.

Are you proposing permitting overlapping ranges with parallel copy, or breaking the symmetry between copy and relocation?

I can experiment with implementing and benchmarking a parallel relocation algorithm for overlapping ranges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm proposing breaking the symmetry between copy and uninitialized_relocate. (But I admit that's bad; and I also admit I still don't know why copy forbids overlap in the first place.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, for now I will structure the code in a way to accept overlaps from the expected direction in each of the forward/backward algorithms.

Do you think they should permit overlaps from both sides?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is:

  • uninitialized_relocate{,_n} should permit "shift left" overlap, as needed by vector::erase.
  • uninitialized_relocate_backward should permit "shift right" overlap, as needed by vector::insert.

My proposed wording reflects this since P1144R6: the behavior of uninitialized_relocate is "as if" by a plain old for-loop relocating each element in sequence from first to last, left to right.

Comment on lines +771 to +787
// if count is representing a negative value, we do nothing
if (hpx::parallel::detail::is_negative(count))
{
return parallel::util::detail::algorithm_result<ExPolicy,
BiIter2>::get(HPX_MOVE(dest_last));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be "library UB" according to P1144. You're welcome to treat it as a no-op (documented or undocumented), but it might be more appropriate to assert-fail or something, I don't know.

@codacy-production
Copy link

codacy-production bot commented Oct 12, 2023

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.13% 80.38%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (103a7b8) 190583 162311 85.17%
Head commit (74155d9) 191899 (+1316) 163188 (+877) 85.04% (-0.13%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#6364) 943 758 80.38%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@hkaiser
Copy link
Member

hkaiser commented Oct 19, 2023

@isidorostsa could you please rebase this and resolve the conflicts while you do?

@isidorostsa
Copy link
Contributor Author

@isidorostsa could you please rebase this and resolve the conflicts while you do?

@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:

  1. Relocation variants #6364 (comment): should negative input sizes be treated as a no-op, similar to our move/copy algorithms? Arthur suggests using an assertion failure, and I'm leaning towards that as well. What do you think?
  2. Relocation variants #6364 (comment): Does it make sense to have the backward to not actually perform operations in reverse order when executed in parallel? So when running in parallel the only difference between the default and the backward algorithm is the calling parameters
  3. Codacy false alarm: There are some warnings here that are inaccurate. Do you know how we can disable those? The code it is shouting about is essentially:
void foo() noexcept(!(mode == throwing)){
    if constexpr(mode == throwing) { throw; }
}

@hkaiser
Copy link
Member

hkaiser commented Oct 19, 2023

@isidorostsa could you please rebase this and resolve the conflicts while you do?

@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:

  1. Relocation variants #6364 (comment): should negative input sizes be treated as a no-op, similar to our move/copy algorithms? Arthur suggests using an assertion failure, and I'm leaning towards that as well. What do you think?

Yes, sounds good.

  1. Relocation variants #6364 (comment): Does it make sense to have the backward to not actually perform operations in reverse order when executed in parallel? So when running in parallel the only difference between the default and the backward algorithm is the calling parameters

Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should strat cheating here ;-)

  1. Codacy false alarm: There are some warnings here that are inaccurate. Do you know how we can disable those? The code it is shouting about is essentially:
void foo() noexcept(!(mode == throwing)){
    if constexpr(mode == throwing) { throw; }
}

I can take care of those.

@isidorostsa
Copy link
Contributor Author

isidorostsa commented Oct 19, 2023

you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way

@hkaiser In case we want to preserve parallelization while not overwriting objects I have two techniques in mind:

A. Execute the overlapping portion of the relocation sequentially and do what is left in parallel. However, I suspect that in most common use cases, the ranges will largely overlap, rendering the operation mostly sequential.
B. Break down the overlapping relocation into two non-overlapping ones, using an intermediate buffer. But if memory speed is a limiting factor, this could actually reduce performance. Additionally, the unexpected allocation of the buffer could be an issue.

So it may be more practical to opt for a purely sequential implementation. Is there an alternative solution that I'm overlooking?

@hkaiser
Copy link
Member

hkaiser commented Oct 23, 2023

retest lsu

@codacy-production
Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.12% 80.38%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (103a7b8) 190583 162311 85.17%
Head commit (1b08b70) 191899 (+1316) 163194 (+883) 85.04% (-0.12%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#6364) 943 758 80.38%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@hkaiser
Copy link
Member

hkaiser commented Nov 8, 2023

@isidorostsa what's the status for this PR?

@isidorostsa
Copy link
Contributor Author

@isidorostsa what's the status for this PR?

I think last time I pushed the hpx tests were broken, which is why we have not merged yet

But locally the tests related to this are passing.

@hkaiser
Copy link
Member

hkaiser commented Nov 8, 2023

@isidorostsa what's the status for this PR?

I think last time I pushed the hpx tests were broken, which is why we have not merged yet

But locally the tests related to this are passing.

The broken tests are unrelated. So this is good to go, then?

@Pansysk75
Copy link
Member

Pansysk75 commented Nov 8, 2023

The broken tests are unrelated. So this is good to go, then?

@hkaiser
Oh, the tests aren't really running. Take a look here: https://cdash.cscs.ch/build/118631
It's related to this: #6377
I apologize I didn't make that very clear.
I have put fixing them on the top of my queue.

@isidorostsa
Copy link
Contributor Author

I have put fixing them on the top of my queue.

P1144R9 is out already, so we do need to rush merging this!

@isidorostsa isidorostsa force-pushed the relocation_variants branch 2 times, most recently from 961680e to 0f91e09 Compare November 14, 2023 11:53
Copy link

codacy-production bot commented Nov 14, 2023

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.30% 89.19%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (103a7b8) 0 0 84.75%
Head commit (323904c) 197388 (+197388) 166693 (+166693) 84.45% (-0.30%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#6364) 925 825 89.19%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

"Relocating from this source type to this destination type is "
"ill-formed");

auto count = std::distance(first, last);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkaiser @Pansysk75 @gonidelis Considering #6364 (comment), it may be sensible to include a warning here such as:
"Calling uninitialized_relocate_backward with a non-sequential execution policy does not guarantee the order of execution of the relocations to be backward. Use on overlapping ranges could yield undefined behavior."

However, I am not sure how we would issue such a warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to do that. The same warning applies to std::copy and std::move. There, it is up to the user to ensure these preconditions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to do that. [...] it is up to the user to ensure these preconditions.

Does that contradict your earlier #6364 (comment) ?

Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should start cheating here ;-)

Either "no overlap" is a precondition that you don't need to check, or else we expect overlap to be the common case for the backwards algorithm (because doing it forwards would overwrite values in the overlapping portion). But we can't have both!

FYI, I'm very interested in feedback into the design of these algorithms in P1144.

In standard C++20 we have

  • the non-parallel std::copy_backward, which forbids overlap
  • the ExecutionPolicy overload of std::copy_backward, which also forbids overlap

and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes

  • a non-parallel std::uninitialized_relocate_backward which permits overlap
  • an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap

so that {parallel_,}vector::erase will be allowed to use std::uninitialized_relocate_backward (because I think that's an important feature). But this means inconsistency between copy_backward (which forbids overlap) and std::uninitialized_relocate_backward (which permits it); and also between std::uninitialized_relocate_backward (which permits overlap) and hpx::experimental::uninitialized_relocate_backward (which currently forbids it).

I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap (that is, check whether we're going to end up using memcpy, so we surely have contiguous ranges, and then check whether those contiguous ranges overlap) and if so, then don't call util::partitioner_with_cleanup — just do the whole range sequentially-not-in-parallel (or, for trivially relocatable types, use memmove instead of memcpy). (Or to put it a different way that implies more major surgery: implement a parallel_memmove primitive and then make the ExecutionPolicy overloads of uninitialized_relocate{,_n,_backward} dispatch straight to parallel_memmove. But then you get a parallel speedup only for trivially relocatable types, because for non-trivial types and/or non-pointer iterators you can't tell whether the ranges overlap or not? Yuck.)

OTOH, maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap for consistency with std::copy_backward. But if so, then I'd like to know what's your plan for implementing things like {parallel_,}vector::erase. I think the Standard Library needs some primitive algorithm that fits that use-case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Quuxplusone Thank you for the detailed write-up, and sorry for my late response.

I think my previous comment: #6364 (comment), is incorrect and there does exist a way to parallelize the relocation of overlapping ranges, maybe like in the following image: (example for 2 threads and an overlapping offset of 1)

image

Supposing this is true, I see no reason to avoid offering this as a feature. Furthermore, I don't see why we would need non-overlapping ranges to be a precondition for relocation!

I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap

I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?

maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap .... If so, I'd like to know what's your plan for implementing things like {parallel_,}vector::erase

For the time being we do not implement parallel data structures, but there is a draft PR for using relocation inside hpx::detail::small_vector (which is like static_vector except it grows to dynamic storage after running out of static space) (isidorostsa#9).

@isidorostsa
Copy link
Contributor Author

I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?

@hkaiser this is okay to merge if you agree with this

@hkaiser hkaiser merged commit fcac367 into STEllAR-GROUP:master Jan 9, 2024
64 of 72 checks passed
@isidorostsa
Copy link
Contributor Author

@hkaiser I noticed that after the merge some tests are failing and I will get onto fixing that as soon as possible.

@hkaiser
Copy link
Member

hkaiser commented Jan 10, 2024

@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that?

@hkaiser
Copy link
Member

hkaiser commented Jan 10, 2024

@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that?

I take this back. The issues are unrelated to your changes. I will fix them separately.

@isidorostsa
Copy link
Contributor Author

I take this back. The issues are unrelated to your changes. I will fix them separaty.

Thank you for letting me know. Unfortunately I did not get a chance to review it myself due to traveling for the visa.

@hkaiser hkaiser added this to the 1.10.0 milestone Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants