work in progress optimize softmax: use better task partitioning #76

Merged 3 commits on Jul 2, 2024

Commits on Jun 18, 2024

  1. use different kernels (inner & non_inner) for softmax forward & backward; both have a ONE_TILE_PER_CTA static condition (to decide whether to load only one tile per CTA).
    iclementine committed Jun 18, 2024
    067d992
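
The commit above describes a ONE_TILE_PER_CTA static condition that decides whether a single tile is loaded per CTA. As a rough illustration only, the plain-Python sketch below mimics that branch on one row: when the row fits in one tile it is read once and normalized directly, otherwise it is walked tile by tile. The function name `softmax_row`, the `tile_size` parameter, and the runtime `if` (standing in for a compile-time flag) are assumptions made for this sketch and are not taken from the PR's kernels.

```python
import math


def softmax_row(row, tile_size):
    """Softmax over one non-empty row of finite floats (illustrative only)."""
    # Hypothetical stand-in for the static ONE_TILE_PER_CTA condition:
    # true when the whole row fits in a single tile.
    one_tile_per_cta = len(row) <= tile_size

    if one_tile_per_cta:
        # Single-tile path: the row is loaded once and reused.
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        return [e / s for e in exps]

    # Multi-tile path: loop over tiles to find the max, then the sum,
    # then loop again to write the normalized values.
    starts = range(0, len(row), tile_size)
    m = -math.inf
    for start in starts:
        m = max(m, max(row[start:start + tile_size]))
    s = 0.0
    for start in starts:
        s += sum(math.exp(x - m) for x in row[start:start + tile_size])
    out = []
    for start in starts:
        out.extend(math.exp(x - m) / s for x in row[start:start + tile_size])
    return out
```

Both paths should return the same values up to floating-point rounding; the single-tile path simply avoids re-reading the row.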

Commits on Jun 20, 2024

  1. 1131e33

Commits on Jun 28, 2024

  1. optimize softmax forward kernel: use better heuristics for task partitioning (considering TILE_SIZE and number of blocks), a better eviction policy, a better algorithm for the softmax online normalizer, reverse the second loop, specialize the last iteration, etc.
    iclementine committed Jun 28, 2024
    ad3f189
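
The commit above mentions a softmax online normalizer. For readers unfamiliar with the term, here is a minimal plain-Python sketch of the commonly used online formulation, in which the running maximum and the running sum of exponentials are updated together in one pass over the tiles. It is not the PR's kernel code: the name `online_softmax` and the `tile_size` parameter are hypothetical, and the loop reversal and last-iteration specialization mentioned in the commit are not shown.

```python
import math


def online_softmax(row, tile_size):
    """Softmax over one non-empty row of finite floats using an online normalizer."""
    m = -math.inf  # running maximum seen so far
    z = 0.0        # running sum of exp(x - m) over the elements seen so far
    for start in range(0, len(row), tile_size):
        tile = row[start:start + tile_size]
        m_new = max(m, max(tile))
        # Rescale the old sum to the new maximum, then add this tile's terms.
        # On the first tile, exp(-inf) is 0.0, so the old sum drops out.
        z = z * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in tile)
        m = m_new
    # Second pass: normalize with the final maximum and sum.
    return [math.exp(x - m) / z for x in row]
```

Compared with computing the maximum and the sum in separate loops, this form reads the input once before the normalization pass, which is the usual motivation for it.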