Consider maximizing grid utilization

We currently maximize block utilization (taking the maximum number of threads per block), which may leave SMs underutilized. We should consider first selecting an optimal number of blocks before maximizing the thread count:

```julia
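    # occupancy-based suggestion: max threads per block, min blocks for full occupancy
    config = launch_configuration(kernel.fun)
    threads = min(length(ps), config.threads)
    # XXX: this kernel performs much better with all blocks active
    # use at least `config.blocks` blocks, then shrink the block size to still cover `ps`
    blocks = max(cld(length(ps), threads), config.blocks)
    threads = cld(length(ps), blocks)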
```

I'm sure this will lead to some kernels performing worse, but it's probably a good thing to test.
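To make the difference concrete, here is a small CPU-only sketch of the arithmetic in the snippet above. The function names `block_first` and `grid_first` are hypothetical. `max_threads` and `min_blocks` stand in for `config.threads` and `config.blocks`, and the numbers are illustrative, not measured on any device. It assumes `n >= 1`.

```julia
# Hypothetical helpers, not part of any library API.

# Current strategy: fill each block with as many threads as allowed.
function block_first(n, max_threads)
    threads = min(n, max_threads)
    blocks = cld(n, threads)
    return (threads=threads, blocks=blocks)
end

# Proposed strategy: launch at least `min_blocks` blocks, then shrink the
# thread count so that the grid still covers all `n` elements.
function grid_first(n, max_threads, min_blocks)
    threads = min(n, max_threads)
    blocks = max(cld(n, threads), min_blocks)
    threads = cld(n, blocks)
    return (threads=threads, blocks=blocks)
end

# Illustrative inputs only (assumed values, not queried from a GPU).
n, max_threads, min_blocks = 10_000, 1024, 80
a = block_first(n, max_threads)             # 1024 threads in 10 blocks
b = grid_first(n, max_threads, min_blocks)  # 125 threads in 80 blocks
```

With these assumed inputs the block-first choice launches only 10 blocks, while the grid-first choice spreads the same work over 80 smaller blocks. Note that 125 threads is not a multiple of the 32-thread warp size, which might be one reason some kernels end up slower with this approach.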

Consider maximizing grid utilization #1321

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider maximizing grid utilization #1321

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions