Convert ONNX problem #12
Sorry for the bad docs. If you want to convert to ONNX, you need this hacked branch: https://github.com/tpoisonooo/transformers/tree/add-convert
I just added some …
A 3090 has 24 GB of memory, which is enough to load the fp16 llama7B model but not enough for fp32, so please check the precision. For llama 13B with LoRA, I suggest that you …
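For reference, a minimal sketch of loading the 7B weights in fp16 so they fit on a single 24 GB card; the checkpoint path is just a placeholder, not something from this thread:

```python
import torch
from transformers import LlamaForCausalLM

# placeholder checkpoint path; point it at your local llama-7B weights
model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",
    torch_dtype=torch.float16,  # ~13 GB of weights instead of ~26 GB in fp32
)
model = model.cuda().eval()
```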
@tpoisonooo thanks for your response, I have tried llama.cpp before, and it can indeed accelerate on the CPU, but the inference speed is not significantly improved compared to the GPU.
For the 4-bit precision problem, I have contributed to TRT; I have converted to .engine files, and the precision check is on the way.
@tpoisonooo Thank you for the response, but I'm still stuck at the convert-to-ONNX part. Sorry, I pulled your branch, but I don't know how to convert to ONNX. I tried both onnx.export and optimum-cli today, but neither worked. Looking forward to your reply.
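For anyone stuck at the same point, a rough sketch of how a plain torch.onnx.export call could look. The checkpoint path, dummy input shape, and opset are assumptions, not settings confirmed by the maintainer, and a 7B fp16 export may need its weights stored as external data because of the 2 GB protobuf limit:

```python
import torch
from transformers import LlamaForCausalLM

# placeholder checkpoint path
model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf", torch_dtype=torch.float16
).eval()

# dummy input: batch 1, sequence length 32 (arbitrary choice)
input_ids = torch.ones(1, 32, dtype=torch.int64)

torch.onnx.export(
    model,
    (input_ids,),
    "llama-7b.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "logits": {0: "batch", 1: "seq"},
    },
    opset_version=17,
)
```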
STEP1. git clone https://github.com/tloen/alpaca-lora and run …
STEP2. After …
STEP3. Read the commit history, update …
STEP4. Run …
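If the truncated steps above involve folding the LoRA adapter back into the base model before export, a sketch with the peft library would look roughly like this; the adapter id and paths are placeholders:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# placeholder paths/ids; use the adapter you trained with alpaca-lora
base = LlamaForCausalLM.from_pretrained("path/to/llama-7b-hf", torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("llama-7b-lora-merged")
```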
@tpoisonooo Thanks for your response. I have studied this carefully for many days and have basically got through the process, but now, when converting PyTorch to ONNX, some If nodes appear in the ONNX graph, and they still exist after optimization with onnxsim. This leads to an error when using trtexec.
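A small sketch for confirming what trtexec is complaining about, i.e. counting the If nodes that survive onnxsim; the file name is a placeholder:

```python
import onnx
from onnxsim import simplify

model = onnx.load("llama-7b.onnx")  # placeholder file name
simplified, check_ok = simplify(model)

if_nodes = [n for n in simplified.graph.node if n.op_type == "If"]
print(f"onnxsim check passed: {check_ok}, If nodes remaining: {len(if_nodes)}")
```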
I guess that the …
AKA build a …
Hi, thanks for open-sourcing this. I would like to ask why the llama-7b model I converted using torch.onnx.export is not the same as the model published on your Hugging Face.
I ran your tools/export-onnx.py directly with the llama-7b model and it goes OOM. If I set torch_dtype=torch.float16 when loading the model, there is no ONNX model at the end of the run.
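A quick way to see where the locally exported graph and the published one diverge is to compare their per-op node counts; a sketch with the onnx package, using placeholder file names:

```python
from collections import Counter
import onnx

def op_histogram(path):
    return Counter(node.op_type for node in onnx.load(path).graph.node)

mine = op_histogram("my-llama-7b.onnx")            # placeholder file names
theirs = op_histogram("published-llama-7b.onnx")
for op in sorted(set(mine) | set(theirs)):
    if mine[op] != theirs[op]:
        print(f"{op}: local={mine[op]}  published={theirs[op]}")
```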
I only have eight 3090s. Is there any way to deploy LLaMA 13B with LoRA?
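One possible direction, not confirmed in this thread, is to shard the fp16 13B weights across the cards with device_map="auto" (requires the accelerate package) and attach the LoRA adapter on top; a rough sketch with placeholder paths:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# placeholder paths; device_map="auto" needs `accelerate` installed
base = LlamaForCausalLM.from_pretrained(
    "path/to/llama-13b-hf",
    torch_dtype=torch.float16,
    device_map="auto",  # shards the layers across the visible 3090s
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
```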