Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Added some wording for the init function. #74

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions docs/api/synchronization_library/barrier.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,26 +23,35 @@ cuda::std::barrier<> bb;
cuda::barrier<cuda::thread_scope_block> c;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this default/value/trivial initialization do for barrier?

std::barrier only has the constructor barrierbarrier(ptrdiff_t expected, CompletionFunction f = CompletionFunction());

What is the expected count set to when using default/value/trivial initialization?

```

The class template `barrier` may also be declared without initialization; a
The class template `barrier` may also be declared without initialization in the `cuda::` namespace; a
friend function `init` may be used to initialize the object.
Comment on lines +26 to 27
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The class template `barrier` may also be declared without initialization in the `cuda::` namespace; a
friend function `init` may be used to initialize the object.
Variable declarations whose type is an instantiation of the `barrier` class template may be initialized with the friend function `init`.

I think the "class template barrier" is already declared in the ::cuda namespace and cannot be re-declared there. What can be declared without initialization are variables whose type is an instantiation of that template.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The class template `barrier` may also be declared without initialization in the `cuda::` namespace; a
friend function `init` may be used to initialize the object.
Variables declared with a type that is an instantiation of the `cuda::barrier` class template may be declared without initialization; in such case, they must be initialized by calling a friend function `init` prior to any use.

How about the above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at my suggested change now, I think I'll want to tweak further, but that's a thing for tomorrow.


```c++
// Shared memory does not allow initialization.
__shared__ cuda::barrier<cuda::thread_scope_block> b;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of a specification of the exact semantics of __shared__ in CUDA C++ and how they interact with default constructors, but I imagine they are very similar to obtaining a pointer to a raw allocation with kernel lifetime, e.g.,

cuda::barrier<cuda::thread_scope_block>* b = ...points to uninitialized storage...;

where init(b, 1) acts like new(b) cuda::barrier<cuda::thread_scope_block>{1}.

Placement-new guarantees that the object will be initialized in place. Why do we need init ? Couldn't we implement the constructor as follows:

cuda::barrier<cuda::thread_scope_block>::barrier(ptrdiff_t expected) {
    init(this, expected);
}

and just use "placement new" for initializing barriers?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is more or less what happens; but having non-experts write placement new expressions is probably not the greatest idea.


init(&b, 1); // Use this friend function to initialize the object.
/*
namespace cuda {
template<thread_scope Sco, class CompletionF>
__host__ __device__ void init(barrier<Sco,CompletionF>* bar, std::ptrdiff_t expected);
template<thread_scope Sco, class CompletionF>
__host__ __device__ void init(barrier<Sco,CompletionF>* bar, std::ptrdiff_t expected, CompletionF completion);
Comment on lines +36 to +39
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add two overloads of init instead of one with a default CompletionF ?

Suggested change
template<thread_scope Sco, class CompletionF>
__host__ __device__ void init(barrier<Sco,CompletionF>* bar, std::ptrdiff_t expected);
template<thread_scope Sco, class CompletionF>
__host__ __device__ void init(barrier<Sco,CompletionF>* bar, std::ptrdiff_t expected, CompletionF completion);
template<thread_scope Sco, class CompletionF>
__host__ __device__ void init(barrier<Sco,CompletionF>* bar, std::ptrdiff_t expected, CompletionF completion = CompletionF());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Gonzalo that we should document this with just a single overload with completion defaulted.

The reason for why this is implemented as two overloads is that the friend functions are instantiated when you instantiate the class template, so that they can be found by ADL; and if CompletionF does not support default construction, the default argument is ill-formed and the entire program fails (even when you don't actually call init, if I remember how this works and what I encountered here correctly).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

The standard requires the default type of CompletionF to be DefaultConstructible, http://eel.is/c++draft/thread.barrier#class-6 , but it allows users to specify a non-DefaultConstructible one.

}
*/
```

- Expects: `*bar` is trivially initialized.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intent is to exhibit undefined behavior if this condition is not met:

Suggested change
- Expects: `*bar` is trivially initialized.
- Preconditions: `*bar` is trivially initialized.

See: https://eel.is/c++draft/structure.specifications#3

- Effects: equivalent to initializing `*bar` with a constructor.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which constructor? Arguable there is only one:

Suggested change
- Effects: equivalent to initializing `*bar` with a constructor.
- Effects: equivalent to initializing `*bar` with `barrier::barrier(expected, completion)`.


In the `device::` namespace, a `__device__` free function is available that
provides direct access to the underlying PTX state of a `barrier` object, if
its scope is `thread_scope_block` and it is allocated in shared memory.

```c++
namespace cuda { namespace device {

__device__ std::uint64_t* barrier_native_handle(
__device__ std::uint64_t* barrier_native_handle(
barrier<thread_scope_block>& b);

}}
```

Expand Down