[FEA] Optimize cub::DeviceMerge by using cub::detail::BlockLoadToShared #6005

@pauleonix

Description

cub::DeviceMerge is a great candidate for getting BlockLoadToShared tested and benchmarked in a real-world algorithm: it currently leaves a lot of bandwidth on the table, and it needs the data in shared memory anyway for the merge path search and the serial merge.
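For readers unfamiliar with the two steps named above, here is a minimal host-side C++ sketch of a merge path partition followed by a serial merge per tile. It is an illustration only, not CUB's implementation: the names `merge_path` and `merge_tiled`, the `int` element type, and the `tile` parameter are all hypothetical, and no shared memory or BlockLoadToShared call is involved. The point is that each tile reads both inputs at data-dependent offsets, which is why a GPU kernel would typically stage the tile's input ranges in shared memory first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUB API): for a diagonal `diag` in the merge grid,
// return how many of the first `diag` merged outputs come from `a`.
// Ties are resolved in favor of `a`, which keeps the merge stable.
std::size_t merge_path(const std::vector<int>& a, const std::vector<int>& b, std::size_t diag)
{
  assert(diag <= a.size() + b.size());
  std::size_t lo = diag > b.size() ? diag - b.size() : 0;
  std::size_t hi = std::min(diag, a.size());
  while (lo < hi)
  {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (b[diag - 1 - mid] < a[mid])
    {
      hi = mid; // a[mid] sorts after b's candidate: take fewer items from a
    }
    else
    {
      lo = mid + 1; // a[mid] belongs to the first `diag` outputs
    }
  }
  return lo;
}

// Hypothetical driver: split the output into tiles of `tile` items, locate each
// tile's input ranges with merge_path, then merge those ranges serially.
// On a GPU, each tile would map to a thread block (or a thread within one).
std::vector<int> merge_tiled(const std::vector<int>& a, const std::vector<int>& b, std::size_t tile)
{
  assert(tile > 0);
  std::vector<int> out(a.size() + b.size());
  for (std::size_t d0 = 0; d0 < out.size(); d0 += tile)
  {
    const std::size_t d1    = std::min(d0 + tile, out.size());
    std::size_t i           = merge_path(a, b, d0);
    const std::size_t i_end = merge_path(a, b, d1);
    std::size_t j           = d0 - i;
    const std::size_t j_end = d1 - i_end;
    // Serial merge of a[i, i_end) and b[j, j_end) into out[d0, d1).
    for (std::size_t k = d0; k < d1; ++k)
    {
      const bool take_a = (j == j_end) || (i != i_end && !(b[j] < a[i]));
      out[k]            = take_a ? a[i++] : b[j++];
    }
  }
  return out;
}
```

Calling `merge_tiled(a, b, tile)` on two sorted vectors should give the same result as `std::merge` for any positive `tile`, since the tiles are independent once their start and end points on the merge path are known.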

Metadata

Labels

cub: For all items related to CUB
feature request: New feature or request.

Projects

Status

Done
