Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU #19263

Closed

aviator19941 opened this issue Nov 22, 2024 · 8 comments · Fixed by #19379 or #19381 · May be fixed by #19357
Labels: bug 🐞 Something isn't working

@aviator19941
Contributor

aviator19941 commented Nov 22, 2024

What happened?

When I try to compile the sharded Llama 3.1 8B fp16 IR for CPU or GPU, compilation fails.

CPU error:
https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde

GPU error:
https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39

I also tried the flags that were suggested for compiling Llama:
--iree-dispatch-creation-enable-aggressive-fusion=true --iree-global-opt-propagate-transposes=true --iree-opt-aggressively-propagate-transposes=true --iree-opt-data-tiling=false --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' --iree-hal-indirect-command-buffers=true --iree-stream-resource-memory-model=discrete --iree-hip-legacy-sync=false --iree-hal-memoization=true --iree-opt-strip-assertions
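
For reference, a full GPU invocation combining these flags with the HIP compile command from the repro steps below might look like the following (a sketch only; the tool path and output name are reused from this issue, and the exact flag combination may need adjusting per backend):

../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7] --iree-dispatch-creation-enable-aggressive-fusion=true --iree-global-opt-propagate-transposes=true --iree-opt-aggressively-propagate-transposes=true --iree-opt-data-tiling=false --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' --iree-hal-indirect-command-buffers=true --iree-stream-resource-memory-model=discrete --iree-hip-legacy-sync=false --iree-hal-memoization=true --iree-opt-strip-assertions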

Steps to reproduce your issue

  1. wget the IR: https://gist.github.com/aviator19941/bab5886f53f2fd0b3b8458519148542c
  2. Try to compile for CPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir -o=8b_f16_tp8_decomposed_cpu.vmfb --iree-hal-target-device=llvm-cpu[0] --iree-hal-target-device=llvm-cpu[1] --iree-hal-target-device=llvm-cpu[2] --iree-hal-target-device=llvm-cpu[3] --iree-hal-target-device=llvm-cpu[4] --iree-hal-target-device=llvm-cpu[5] --iree-hal-target-device=llvm-cpu[6] --iree-hal-target-device=llvm-cpu[7]
  3. CPU error: https://gist.github.com/aviator19941/82bceb2624571d446da0964440790fde
  4. Try to compile for GPU:
    ../iree-build-no-trace/tools/iree-compile 8b_f16_tp8_decomposed.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --iree-hal-target-device=hip[2] --iree-hal-target-device=hip[3] --iree-hal-target-device=hip[4] --iree-hal-target-device=hip[5] --iree-hal-target-device=hip[6] --iree-hal-target-device=hip[7]
  5. GPU error: https://gist.github.com/aviator19941/89761b3bbb6ace5a6945de667e6d1e39

What component(s) does this issue relate to?

No response

Version information

iree-base-compiler 3.1.0rc20241121

Additional context

No response

@aviator19941 aviator19941 added the bug 🐞 Something isn't working label Nov 22, 2024
@aviator19941 aviator19941 changed the title Llama 3.1 8B fp16 sharded fails to compile for CPU and GPU Llama 3.1 8B fp16 TP8 sharded fails to compile for CPU and GPU Nov 22, 2024
@sogartar
Contributor

sogartar commented Nov 22, 2024

Regarding the CPU compilation error: I made a fix when exporting for the unsharded case, where we want no device affinities. This is the sharded variant, and at first glance the argument and global parameter affinities look fine, so the problem is probably with the flow.tensor.transfer ops.
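
For context, the sharded export inserts explicit transfer ops between device affinities; a minimal sketch of what one such op looks like in the IR (illustrative only — the tensor type and device name here are placeholders, not taken from the 8B IR):

%transferred = flow.tensor.transfer %shard : tensor<4x32x128xf16> to #hal.device.promise<@__device_1>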

@nirvedhmeshram
Contributor

nirvedhmeshram commented Nov 22, 2024

The GPU error is an attention dispatch failing: it is going down the LLVMGPUDistribute pipeline, which is not the pipeline we want for it. Here is the input IR for it.

You can run this with

iree-compile attention_dispatch.mlir --iree-hip-target=gfx942 -o=8b_f16_tp8_decomposed.vmfb --iree-hal-target-backends=rocm --compile-from=executable-sources --mlir-print-ir-after-all &> output.mlir

Here is the full dump.

Looks like it has some dynamic shapes so I am guessing vector distribute bailed on it.

CC @raikonenfnu @Groverkss @kumardeepakamd

@sogartar sogartar removed their assignment Nov 22, 2024
@aviator19941
Contributor Author


I think @sogartar suggested we not compile with --iree-hal-target-backends=rocm since it is considered a legacy flag and will be removed in the future.

@nirvedhmeshram
Contributor


Yes, I was just using that to be concise; you can use the new flags and get the same error too.

@Groverkss Groverkss self-assigned this Nov 25, 2024
@Groverkss
Contributor

On the GPU side, this looks like it is coming from the inner unit dims on the K2 dimension of attention. We could either collapse those unit dims to make it work, or I can send a patch tomorrow to add support for multiple M/N dimensions for intrinsic targeting.
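
For illustration, collapsing those unit dims would mean folding them away with a tensor.collapse_shape before the attention op, roughly like this (a sketch; the operand name is a placeholder and the shapes are borrowed from the reshape snippet quoted later in this thread):

%collapsed = tensor.collapse_shape %k2_operand [[0], [1, 2, 3], [4]]
	 : tensor<4x?x1x1x128xf16> into tensor<4x?x128xf16>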

@IanWood1 IanWood1 self-assigned this Nov 27, 2024
@IanWood1
Contributor

IanWood1 commented Nov 27, 2024


@aviator19941 for some context, is this a regression or something that never worked?

Edit: I thought this might have to do with tensor.concat, but instead it looks like we need to be able to canonicalize

%expanded_8645 = tensor.expand_shape %28622 [[0, 1, 2], [3, 4]] output_shape [4, %21, 1, 1, 128]
	 : tensor<?x128xf16> into tensor<4x?x1x1x128xf16>
%collapsed_8653 = tensor.collapse_shape %expanded_8645 [[0], [1, 2, 3], [4]]
	 : tensor<4x?x1x1x128xf16> into tensor<4x?x128xf16>

to a single expand_shape. ComposeCollapseOfExpand in ReshapeOpsUtils.h doesn't handle this case.
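
For reference, the composed form that such a canonicalization should produce is a single expand_shape (a sketch reusing the SSA values from the snippet above; the reassociation and output_shape follow from composing the two reshapes):

%expanded = tensor.expand_shape %28622 [[0, 1], [2]] output_shape [4, %21, 128]
	 : tensor<?x128xf16> into tensor<4x?x128xf16>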

@kumardeepakamd
Contributor

This is a regression, @IanWood1. "I can send a patch tomorrow to add support for multiple M/N dimensions for intrinsic targeting" <- this seems like the right thing to do.

@Groverkss
Contributor

#19336

This should fix it

IanWood1 added a commit that referenced this issue Dec 6, 2024
This change adds `linalgExtExpansionFn` so that `collapse_shape` ops are
sunk through `iree_linalg_ext.attention` only when the k2 dimensions are
not expanded by the reshape fusion. Currently, GPU codegen cannot support
unit dims on the k2 dimensions, so any `collapse_shape` that expands out
unit dimensions on these dims will cause compilation errors.

This fixes the unit dim error in #19263, but it uncovered further,
unrelated compilation errors tracked in #19377.

Signed-off-by: Ian Wood <[email protected]>
@IanWood1 IanWood1 reopened this Dec 6, 2024