How to reduce memory usage during MNN inference #3086
jxt1234 added the User label (questions about how to use MNN, or incorrect usage causing bugs) on Nov 18, 2024
The `--fp16` option used during model conversion has no relation to whether fp16 is used for inference. The switch for fp16 inference is: build MNN with `MNN_ARM82` enabled, and set `precision` to `low` when creating the session or module; fp16 optimization is then enabled if the device supports it. You can also consider dynamic quantization.
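A minimal sketch of that session setup with the C++ `Interpreter` API, assuming MNN was built with `-DMNN_ARM82=ON` (the model path here is a placeholder):

```cpp
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    // Placeholder model path.
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    MNN::BackendConfig backendConfig;
    // Precision_Low requests fp16 execution; it only takes effect on devices
    // with ARMv8.2 fp16 support and an MNN build with MNN_ARM82 enabled.
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    MNN::Session* session = net->createSession(config);
    // ... fill input tensors, then run ...
    net->runSession(session);
    return 0;
}
```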
Thank you. I'd also like to ask: after converting the model to int8 with dynamic quantization, is it the same situation, where only the model file gets smaller but the weights are dequantized at inference time, so runtime memory does not change?
With dynamic quantization (build MNN with the `MNN_LOW_MEMORY` macro enabled + set `memory = low`), memory usage will decrease. Otherwise the weights are still dequantized.
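A similar sketch for the dynamic-quantization path, assuming MNN was built with `cmake .. -DMNN_LOW_MEMORY=ON` and the model was converted with weight quantization (the file name is a placeholder):

```cpp
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    // Placeholder path for a weight-quantized (int8) model.
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model_int8.mnn"));

    MNN::ScheduleConfig config;
    MNN::BackendConfig backendConfig;
    // Memory_Low asks the runtime to keep quantized weights in int8 and use
    // dynamic quantization at inference instead of dequantizing them to float,
    // which is what reduces runtime memory.
    backendConfig.memory = MNN::BackendConfig::Memory_Low;
    config.backendConfig = &backendConfig;

    MNN::Session* session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```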
Does this configuration make the program compute in int8?
Hello, is this only available for the large-model framework? I could not find this interface in the framework for an ordinary PyTorch model.
Yes, it does.
Original issue description: Currently, `--fp16` halves the model size, but memory usage during inference does not change. How should I modify things to reduce memory?