Added experimental mm padding for cache behavior #825

rsuderman · 2025-01-14T20:51:42Z

Padding to cache-line boundaries can significantly improve performance by keeping the data touched by successive iterations on separate cache lines.
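As a rough illustration of what padding to a cache-line boundary means, here is a minimal sketch that rounds a row length up to a whole number of cache lines. It is not taken from this PR's diff: the helper name `padded_length`, its parameters, and the 128-byte line size in the usage note are all assumptions chosen for the example.

```python
def padded_length(num_elements: int, element_bytes: int, line_bytes: int) -> int:
    """Return the element count of a row after rounding it up to whole cache lines.

    Hypothetical helper for illustration only; not part of sharktank.
    """
    if num_elements < 0 or element_bytes <= 0 or line_bytes <= 0:
        raise ValueError("sizes must be positive and the length non-negative")
    if line_bytes % element_bytes != 0:
        raise ValueError("line_bytes must be a multiple of element_bytes")

    row_bytes = num_elements * element_bytes
    # Ceiling division: number of cache lines needed to hold the row.
    lines_needed = -(-row_bytes // line_bytes)
    padded_bytes = lines_needed * line_bytes
    return padded_bytes // element_bytes
```

For example, with 2-byte (fp16) elements and an assumed 128-byte line, a 100-element row occupies 200 bytes and would be padded to 256 bytes, i.e. 128 elements, so the next row starts on a fresh line.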

rsuderman · 2025-01-14T20:52:24Z

Sample invocation:

python3 -m sharktank.examples.export_paged_llm_v1 --block-seq-stride=32 --attention-kernel=torch --bs=4 --irpa-file irpa/llama.8b.fp16.irpa --output-mlir /tmp/llama.mlir --experimental-mm-cache-size=128 --experimental-mm-cache-sets=4

Added experimental mm padding for cache behavior

b630c33

Padding to cache lines can significantly improve performance by separating out cache lines between iterations.

rsuderman force-pushed the pad_mm branch from 3a134ca to b630c33 Compare January 14, 2025 20:52
