Consider maximizing grid utilization #1321

Open
@maleadt

We currently maximize block utilization (taking the maximum number of threads per block), which may leave SMs underutilized. We should consider first selecting an optimal number of blocks, before maximizing the thread count:

    config = launch_configuration(kernel.fun)
    threads = min(length(ps), config.threads)
    # XXX: this kernel performs much better with all blocks active
    blocks = max(cld(length(ps), threads), config.blocks)
    threads = cld(length(ps), blocks)

I'm sure this will lead to some kernels performing worse, but it's probably a good thing to test.
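
To make the difference concrete, here is a minimal pure-Julia sketch that compares the two strategies without touching the GPU. The helper names `block_first_config` and `grid_first_config` are hypothetical, and `max_threads` and `min_blocks` stand in for `config.threads` and `config.blocks`, assuming the latter is the smallest grid size that keeps all SMs occupied. The numbers are made up for illustration.

```julia
# Hypothetical helpers, not part of CUDA.jl. `n` is the number of work items,
# `max_threads` plays the role of `config.threads`, and `min_blocks` plays the
# role of `config.blocks` (assumed: the grid size needed for full occupancy).

# Current behaviour: fill each block first, then launch as many blocks as needed.
function block_first_config(n::Integer, max_threads::Integer)
    threads = min(n, max_threads)
    blocks = cld(n, threads)
    return (; threads, blocks)
end

# Proposed behaviour: make sure at least `min_blocks` blocks are launched,
# then shrink the block size so the work is spread evenly across them.
function grid_first_config(n::Integer, max_threads::Integer, min_blocks::Integer)
    threads = min(n, max_threads)
    blocks = max(cld(n, threads), min_blocks)
    threads = cld(n, blocks)
    return (; threads, blocks)
end

# Illustrative values only: 10_000 items, 1024 threads per block, 40 blocks.
block_first_config(10_000, 1024)     # (threads = 1024, blocks = 10)
grid_first_config(10_000, 1024, 40)  # (threads = 250, blocks = 40)
```

With these inputs the block-first choice launches only 10 blocks, while the grid-first choice launches 40 smaller ones. Note that for very small inputs (fewer items than `min_blocks`) the grid-first variant launches more threads than there are items, so the kernel still needs its usual bounds check.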
