How to reduce memory usage during MNN inference #3086

Open
lxh0510 opened this issue Nov 16, 2024 · 6 comments
Labels
User: the user asks a question about how to use MNN, or doesn't use MNN correctly and causes a bug.

Comments

lxh0510 commented Nov 16, 2024

Using --fp16 has halved the model file size, but memory usage during inference hasn't changed at all. How should I modify things to reduce memory?

jxt1234 added the User label Nov 18, 2024
jxt1234 (Collaborator) commented Nov 18, 2024

The converter's --fp16 option is unrelated to whether fp16 is used for inference. The switch for fp16 inference is: build MNN with MNN_ARM82 enabled, and set precision to low when creating the session or module. If the device supports it, fp16 optimization will then be enabled.
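
A minimal C++ sketch of that setting with the Session API (the model path, forward type, and input handling are placeholders, not from this thread; assumes MNN was built with -DMNN_ARM82=ON):

```cpp
#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    // Load the converted model (placeholder path).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;

    // precision = low: enables fp16 compute when MNN is built with
    // MNN_ARM82 and the device supports it; otherwise it falls back to fp32.
    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... fill input tensors, then net->runSession(session) ...
    return 0;
}
```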

In addition, you can consider dynamic quantization (a rough end-to-end sketch follows this list):

  1. Quantize the model with --weightQuantBits=8
  2. Build MNN with the MNN_LOW_MEMORY macro enabled
  3. Set memory = low
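
Putting the three steps together (the ONNX source format, the file names, and the converter flags other than --weightQuantBits are assumptions for illustration): convert with something like `./MNNConvert -f ONNX --modelFile model.onnx --MNNModel model_w8.mnn --weightQuantBits=8`, build MNN with the MNN_LOW_MEMORY CMake option turned on, then request low-memory mode when creating the session:

```cpp
#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    // Weight-quantized model produced with --weightQuantBits=8 (placeholder path).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model_w8.mnn"));

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;

    // memory = low: with MNN built with MNN_LOW_MEMORY, the quantized weights
    // stay in int8 and are handled dynamically at inference time instead of
    // being expanded to fp32 once at load time.
    MNN::BackendConfig backendConfig;
    backendConfig.memory = MNN::BackendConfig::Memory_Low;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... fill input tensors, then net->runSession(session) ...
    return 0;
}
```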

lxh0510 (Author) commented Nov 21, 2024

Thank you. One more question: after converting the model to int8 with dynamic quantization, is it again only the model size that shrinks, while the weights are dequantized at inference time so runtime memory doesn't change?

jxt1234 (Collaborator) commented Nov 27, 2024

With dynamic quantization (build MNN with the MNN_LOW_MEMORY macro enabled + set memory = low), memory usage does decrease. Otherwise the weights are still dequantized.
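
One way to check whether the setting takes effect is to compare the session's reported memory with and without memory = low; a sketch using getSessionInfo (the helper function name is made up for illustration, and `net`/`session` are the interpreter and session created as in the sketches above):

```cpp
#include <cstdio>
#include <MNN/Interpreter.hpp>

// Prints the memory held by a session, in MB (hypothetical helper).
void reportSessionMemory(MNN::Interpreter* net, MNN::Session* session) {
    float memoryMB = 0.0f;
    if (net->getSessionInfo(session, MNN::Interpreter::SessionInfoCode::MEMORY, &memoryMB)) {
        printf("Session memory: %.2f MB\n", memoryMB);
    }
}
```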

lxh0510 (Author) commented Nov 27, 2024

Does this configuration make the program compute in int8?

lxh0510 (Author) commented Nov 28, 2024 via email

jxt1234 (Collaborator) commented Nov 28, 2024

> Does this configuration make the program compute in int8?

Yes. This is available once the macro is enabled when building MNN.
