[BuddyLLaMA] Add GPU lowering example for LLaMA inference. #229
Closed
SForeKeeper force-pushed the liam-gpu branch from 8c4ffa2 to 64184f0 on December 25, 2023 at 06:19.
Co-authored-by: xtayex <[email protected]>
Co-authored-by: zhanghb97 <[email protected]>
This pull request adds basic instructions for lowering LLaMA MLIR for inference on GPU.
**Note: these instructions will need to be polished after buddy-mlir updates to the latest version of LLVM.**
This pipeline does not yet involve any GPU-specific optimization, so do not expect a high inference rate. Optimizations will be implemented gradually using the transform dialect, which involves a separate lowering pipeline.
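For context, a minimal sketch of the kind of unoptimized GPU lowering flow such instructions typically cover, built from standard upstream mlir-opt passes. The file names (llama.mlir, llama-gpu-kernels.mlir) are placeholders, and the actual buddy-mlir commands and pass ordering in this PR may differ:

```bash
# Sketch only -- assumes the standard upstream MLIR GPU conversion passes;
# the PR's real pipeline may use buddy-opt and additional passes.
#
# 1. Lower linalg ops to scf.parallel loops.
# 2. Annotate the parallel loops with a GPU processor mapping.
# 3. Convert the mapped scf.parallel loops into gpu.launch regions.
# 4. Outline each gpu.launch body into a kernel inside a gpu.module.
mlir-opt llama.mlir \
  -convert-linalg-to-parallel-loops \
  -gpu-map-parallel-loops \
  -convert-parallel-loops-to-gpu \
  -gpu-kernel-outlining \
  -o llama-gpu-kernels.mlir
```

After kernel outlining, a further target-specific stage (e.g. lowering the GPU modules to NVVM and compiling them to a device binary) is still required before the module can actually run on a GPU; that stage is also where the transform-dialect-based optimizations mentioned above would eventually plug in.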