Question regarding block launch order in CUDA #1067

Snektron · 2023-02-24T10:54:28Z

The CUDA C Programming Guide states on page 13:

> This decomposition preserves language expressivity by allowing threads to cooperate when solving each sub-problem, and at the same time enables automatic scalability. Indeed, each block of threads can be scheduled on any of the available multiprocessors within a GPU, in any order, concurrently or sequentially, so that a compiled CUDA program can execute on any number of multiprocessors as illustrated by Figure 3, and only the runtime system needs to know the physical multiprocessor count.

However, in the code for agent scan there is this comment:
https://github.com/NVIDIA/cub/blob/5d12837f92ee12016827ad6f1ccbbc963eb428ff/cub/agent/agent_scan.cuh#L408-L412

Does this mean that the scan relies on undefined behavior here, or have I missed some part of the CUDA specification?
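
To make the concern concrete: in a single-pass scan, a tile has to wait for results from lower-numbered tiles, so if the tile index is taken directly from `blockIdx.x`, forward progress appears to depend on lower-numbered blocks being scheduled first. One way I know of to avoid depending on launch order is to hand out tile indices through an atomic counter, so a block only learns its tile once it is actually running. Below is a minimal sketch of that idea. It is not the code at the linked lines, and the names `assign_tiles_by_ticket`, `ticket_counter`, `tile_of_block`, and `run_ticket_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (not CUB's code): each block derives the tile it
// works on from a global ticket counter instead of from blockIdx.x.
__global__ void assign_tiles_by_ticket(int *ticket_counter, int *tile_of_block)
{
    __shared__ int tile_idx;

    // One thread per block takes the next ticket. Tickets are handed out in
    // the order in which blocks reach this point, so every lower-numbered
    // tile belongs to a block that has already started executing.
    if (threadIdx.x == 0)
    {
        tile_idx = atomicAdd(ticket_counter, 1);
    }
    __syncthreads();

    // Record which tile this block received so the host can inspect it.
    if (threadIdx.x == 0)
    {
        tile_of_block[blockIdx.x] = tile_idx;
    }
}

// Hypothetical host helper: launches the kernel and returns, for each
// blockIdx.x, the tile index that block obtained. Returns an empty vector
// if any CUDA call fails.
std::vector<int> run_ticket_demo(int num_blocks)
{
    int *d_counter = nullptr;
    int *d_tiles   = nullptr;
    std::vector<int> tiles(num_blocks, -1);

    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess)
    {
        return {};
    }
    if (cudaMalloc(&d_tiles, num_blocks * sizeof(int)) != cudaSuccess)
    {
        cudaFree(d_counter);
        return {};
    }
    cudaMemset(d_counter, 0, sizeof(int));

    assign_tiles_by_ticket<<<num_blocks, 32>>>(d_counter, d_tiles);

    // cudaMemcpy waits for the kernel and reports any launch or runtime error.
    cudaError_t err = cudaMemcpy(tiles.data(), d_tiles,
                                 num_blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_tiles);

    if (err != cudaSuccess)
    {
        return {};
    }
    return tiles;
}
```

Calling `run_ticket_demo(64)` should return some permutation of 0..63. Whether that permutation equals the identity on a given GPU is exactly the part the quoted paragraph does not seem to promise.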

Replies: 0 comments