Contributor

@pauleonix pauleonix commented Sep 5, 2025

Description

closes #5693

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

copy-pr-bot bot commented Sep 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-project-automation github-project-automation bot moved this to Todo in CCCL Sep 5, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Sep 5, 2025
@pauleonix pauleonix self-assigned this Sep 5, 2025
@pauleonix pauleonix added the cub For all items related to CUB label Sep 5, 2025
//! @{

//! @brief Collective constructor using a private static allocation of shared memory as temporary storage.
_CCCL_DEVICE _CCCL_FORCEINLINE BlockLoadToShared()
Contributor

I believe those should just be _CCCL_DEVICE_API

Contributor Author

That's not what I see for other block-wide primitives, but if this is the new guideline, sure! Do I understand correctly that I should avoid _CCCL_FORCEINLINE?

Contributor Author

_CCCL_DEVICE_API is missing from https://nvidia.github.io/cccl/cccl/development/macro.html#visibility-macros. It would be great if any guidelines regarding usage of visibility macros and _CCCL_FORCEINLINE could be codified in https://github.com/NVIDIA/cccl/wiki/Cpp-Coding-Guidelines#general.

Contributor Author

@pauleonix pauleonix Sep 5, 2025

Regarding visibility, are you referring to the following?

Defaulted constructors should be marked with _CCCL_HIDE_FROM_ABI

Currently it is listed as a libcu++-specific guideline. Should it be a general guideline instead? Also, to me "defaulted" means a constructor declared with = default, not a default constructor. Is the wording wrong?
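
To illustrate the terminology in question, here is a minimal sketch in standard C++. The two struct names are hypothetical and are not taken from CCCL. It only shows the language-level difference between a "defaulted" constructor and a user-provided default constructor; it says nothing about which CCCL macro either should carry.

```cpp
#include <type_traits>

// Hypothetical types for illustration only; neither exists in CCCL.

// "Defaulted" constructor: declared with `= default` on its first
// declaration, so the compiler generates it and it stays trivial.
struct DefaultedCtor
{
  DefaultedCtor() = default;
  int value;
};

// Default constructor that is NOT defaulted: it takes no arguments,
// but its body is user-provided, so it is not trivial.
struct UserProvidedDefaultCtor
{
  UserProvidedDefaultCtor()
      : value(0)
  {}
  int value;
};

// Both types are default-constructible ...
static_assert(std::is_default_constructible<DefaultedCtor>::value, "");
static_assert(std::is_default_constructible<UserProvidedDefaultCtor>::value, "");

// ... but only the defaulted one is trivially default-constructible.
static_assert(std::is_trivially_default_constructible<DefaultedCtor>::value, "");
static_assert(!std::is_trivially_default_constructible<UserProvidedDefaultCtor>::value, "");
```

Under that reading, a guideline worded for "defaulted constructors" would cover only the first struct, which is the ambiguity the question raises.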

Contributor Author

@bernhardmgruber Opinions?

Contributor

Should use _CCCL_DEVICE_API

Contributor Author

For all members or just constructors? No _CCCL_FORCEINLINE or both? I can add it in another PR.

@fbusato fbusato self-requested a review September 5, 2025 15:31
@pauleonix pauleonix force-pushed the block_load_to_shared branch 7 times, most recently from 1f44d1a to 9427acc Compare September 7, 2025 01:52
Contributor

@bernhardmgruber bernhardmgruber left a comment

Good work so far! Let's add some unit tests to cover the API and show how it's used.

@pauleonix pauleonix force-pushed the block_load_to_shared branch 9 times, most recently from 751e4ad to 0dae021 Compare September 15, 2025 01:01
@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 4h 50m: Pass: 100%/185 | Total: 1d 19h | Max: 1h 18m | Hits: 99%/186891

@pauleonix pauleonix merged commit 380be80 into NVIDIA:main Sep 30, 2025
196 of 197 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Sep 30, 2025
@pauleonix pauleonix deleted the block_load_to_shared branch September 30, 2025 08:30
Development

Successfully merging this pull request may close these issues.

[FEA]: Expose UBLKCP/LDGSTS for gmem->smem as a block algorithm

3 participants