
convert Onnx problem #12

Open
xcxhy opened this issue May 8, 2023 · 11 comments

Comments

@xcxhy

xcxhy commented May 8, 2023

Hi, thanks for open-sourcing this. I would like to ask why the llama-7b model I converted with torch.onnx.export is not the same as the model you published on Hugging Face.
I ran your tools/export-onnx.py directly with the llama-7b model and it goes OOM. If I instead load the model with torch_dtype=torch.float16, no ONNX model is produced at the end of the run.
I only have 8 RTX 3090 cards. Is there any way to deploy LLaMA 13B with LoRA?

@tpoisonooo
Owner

Sorry for the poor documentation.
export-onnx.py is just an entry point that calls LLaMA inference.

If you want to convert to ONNX, you need this hacked branch: https://github.com/tpoisonooo/transformers/tree/add-convert

You will find 3 small commits there:
(screenshot of the three commits)

@tpoisonooo
Owner

I just added some torch.onnx.export calls and verification code inside it.

@tpoisonooo
Owner

tpoisonooo commented May 8, 2023

A 3090 has 24 GB of memory, which is enough to load the llama-7B model in fp16 but not in fp32, so please check the precision.
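For reference, a minimal sketch of loading llama-7b in fp16 with transformers; the checkpoint path here is an assumption, substitute your own:

```python
# Minimal sketch: load llama-7b in fp16 so it fits in a single 3090's 24 GB.
# The checkpoint path is an assumption; point it at your local llama-7b-hf weights.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/llama-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # ~13 GB of weights instead of ~26 GB in fp32
).cuda()
model.eval()
```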

For llama 13B with LoRA, I suggest that you

  1. merge the LoRA weights into the base model; there are many scripts for this (see the sketch after this list)
  2. quantize to 4-bit with GPTQ-for-LLaMa on Triton/CUDA
  3. or run llama.cpp on CPU
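A minimal sketch of step 1, merging the LoRA weights into the base model with peft (the paths below are assumptions; merge_and_unload is peft's merge helper):

```python
# Minimal sketch: fold the LoRA adapter into the base 13B weights with peft.
# Both paths are assumptions; use your own base model and adapter.
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained("path/to/llama-13b-hf", torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = lora.merge_and_unload()            # bakes the LoRA deltas into the base weights
merged.save_pretrained("llama-13b-merged")  # plain checkpoint, usable by GPTQ or llama.cpp converters
```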

@xcxhy
Author

xcxhy commented May 8, 2023

@tpoisonooo Thanks for your response. I have tried llama.cpp before, and it does accelerate inference on the CPU, but the speed is not significantly better than on the GPU.
I also tried quantizing to 4-bit, but the performance drops sharply.
I hope that converting to TRT can give a bigger speedup.

@tpoisonooo
Owner

For the 4-bit precision problem, I have contributed an --observe option to GPTQ-for-LLaMa, but my inference kernel (where each layer can use a different quantization option) is not finished yet. AutoGPTQ.tvm would be a good thing to try, although its author archived the code two days ago.
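For what it's worth, a minimal sketch of 4-bit GPTQ quantization with the AutoGPTQ package; the model path, calibration text, and group_size here are assumptions, not settings from this thread:

```python
# Minimal sketch: 4-bit GPTQ quantization with AutoGPTQ.
# Model path, calibration sample and group_size are assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "llama-13b-merged"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

# a couple of tokenized calibration samples; real runs use a few hundred
examples = [tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("llama-13b-4bit-gptq")
```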

For TRT, I have converted to .engine files; a precision check is on the way.

@tpoisonooo
Owner

cc @xcxhy NVIDIA/TensorRT#2928

@xcxhy
Author

xcxhy commented May 9, 2023

@tpoisonooo Thank you for the response, but I'm still stuck at the ONNX conversion part. Sorry, I pulled your branch, but I don't know how to convert to ONNX. I tried both onnx.export and optimum-cli today, but neither worked. Looking forward to your reply.

@tpoisonooo
Owner

tpoisonooo commented May 12, 2023


STEP 1. git clone https://github.com/tloen/alpaca-lora and run the generate.py example. This requires installing huggingface/transformers.

STEP 2. Once transformers is installed, it lives in your conda/pip environment. Find it.

STEP 3. Read the commit history of the branch above and apply those changes to the transformers source code in your conda/pip environment.

STEP 4. Run generate.py again; the torch.onnx.export calls added in STEP 3 will produce the ONNX files (see the sketch below).
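As a rough illustration only (not the exact code on the add-convert branch), the export call patched in during STEP 3 looks roughly like this; the module name, dummy shapes, and output path are assumptions:

```python
# Rough sketch of the kind of torch.onnx.export call patched into the transformers
# source in STEP 3. Module name, dummy shapes and output path are assumptions;
# llama-7b uses hidden size 4096.
import torch

def export_decoder_layer(decoder_layer, layer_id: int):
    # batch=1, seq_len=32 are arbitrary dummy sizes used only for tracing
    hidden_states = torch.zeros(1, 32, 4096, dtype=torch.float16, device="cuda")
    torch.onnx.export(
        decoder_layer,
        (hidden_states,),
        f"decoder_layer_{layer_id}.onnx",
        input_names=["hidden_states"],
        output_names=["output"],
        dynamic_axes={"hidden_states": {1: "seq_len"}},
        opset_version=16,
    )
```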

@xcxhy
Author

xcxhy commented May 16, 2023

@tpoisonooo Thanks for your response. I have studied this carefully for many days and have basically gotten through the whole pipeline. But now, when converting from PyTorch to ONNX, some If nodes appear in the ONNX graph, and they still remain after optimization with onnxsim. This leads to an error when using trtexec.

@tpoisonooo
Owner

I guess the If nodes come from LLaMA's past_key_values; try a zero-length tensor to eliminate them. @xcxhy

@tpoisonooo
Owner

That is, build a torch.Tensor or np.array with the shape [1, x, 0, x].
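A minimal sketch of such a zero-length past_key_values input; the head count, head_dim, and layer count are assumptions taken from the llama-7b config (32 heads, head_dim 128, 32 layers):

```python
# Minimal sketch: empty past_key_values with a fixed rank, so tracing always takes
# the "past exists" path and no If node is emitted for the prefill branch.
# 32 heads, head_dim 128 and 32 layers match llama-7b; adjust for other sizes.
import torch

num_layers, num_heads, head_dim = 32, 32, 128
past_key_values = tuple(
    (
        torch.zeros(1, num_heads, 0, head_dim, dtype=torch.float16),  # key cache, past length 0
        torch.zeros(1, num_heads, 0, head_dim, dtype=torch.float16),  # value cache, past length 0
    )
    for _ in range(num_layers)
)
```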
