Inference super slow #15

Open
SinanAkkoyun opened this issue May 29, 2023 · 4 comments

@SinanAkkoyun

Hello, I only get maybe one token/second, whereas I get 30 tokens/second with the default PyTorch implementation (running on an H100).

@DungMinhDao

I guess you can try inference on the GPU after making some modifications to the code:

In llama/memory_pool.py:
    self.sess = ort.InferenceSession(onnxfile, providers=['CUDAExecutionProvider'])

Find all the files that import onnxruntime and add import torch before it.
Also remember to uninstall onnxruntime and install onnxruntime-gpu instead.
Note: it takes 34 GB of GPU memory for me to load the model, but inference is fast.
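A minimal sketch of that change, assuming the session is created in memory_pool.py roughly as shown above (the make_session helper name is just for illustration; the CPU provider is kept as a fallback):

```python
# Hypothetical sketch of the GPU change described above, not the repo's exact code.
import torch            # imported before onnxruntime so the CUDA libraries are loaded first
import onnxruntime as ort

def make_session(onnxfile: str) -> ort.InferenceSession:
    # Prefer the CUDA provider (requires the onnxruntime-gpu package);
    # CPUExecutionProvider stays as a fallback if CUDA is unavailable.
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    return ort.InferenceSession(onnxfile, providers=providers)
```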

@SinanAkkoyun
Author

I am struggling to get it to run; did you already get it running? Could you please tell me how many tokens/second you get out of the 7B or 13B model? Thank you so much!

@DungMinhDao

> I am struggling to get it to run; did you already get it running? Could you please tell me how many tokens/second you get out of the 7B or 13B model? Thank you so much!

I ran the 7B model downloaded from the repo's given link. About 0.2 tokens/s on CPU and 20 tokens/s on GPU.

@tpoisonooo
Owner

The 1B model needs 4 GB of memory in float32 format. It is really hard to run inference quickly on a single CPU.

If you want performance on a mobile/laptop CPU, try the InferLLM repo: https://github.com/MegEngine/InferLLM
For model conversion to NPU/DSP, use llama.onnx.
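As a rough sanity check on those numbers (my own back-of-the-envelope sketch, not from the repo): float32 weights take 4 bytes per parameter, so weight memory alone is roughly parameters × 4 bytes, before any runtime overhead or activations.

```python
# Back-of-the-envelope weight memory for a float32 model: 4 bytes per parameter.
def fp32_weight_gib(n_params: float) -> float:
    return n_params * 4 / 1024**3

print(round(fp32_weight_gib(1e9), 1))   # ~3.7 GiB for a 1B-parameter model
print(round(fp32_weight_gib(7e9), 1))   # ~26.1 GiB for 7B; runtime overhead pushes this toward the 34 GB reported above
```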
