[bugfix] enable faster rcnn and sd model with oneflow backend #10439
of_g = OfGraph()
of_g._dynamic_input_graph_cache.set_cache_size(9)
of_g._dynamic_input_graph_cache.enable_shared(True)
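Do these two settings correspond to the size and dynamic options of the compile_from_torch interface? The torch.compile interface has a dynamic parameter, and my understanding is that we should use the dynamic value the user passes in rather than the fixed value True. size is set here to the default of 9; it could be defined as a named constant instead of a magic number.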
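They basically correspond. size can indeed be changed; I'll add a constant for it. I don't think dynamic needs to change, for two reasons: first, the user's argument is passed to torch, so the oneflow backend cannot get at it; second, because the torch compile frontend sits in between, hard-coding dynamic to True here is equivalent to setting it to the user-supplied value.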
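The named-constant suggestion above could be sketched roughly as follows. This is an illustration only: DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE, StubGraphCache, and configure_cache are hypothetical names, and the stub merely stands in for of_g._dynamic_input_graph_cache so the snippet runs without oneflow installed. Only the two method names and the value 9 come from the diff.

```python
# Hypothetical constant name; 9 is the default value quoted in the review.
DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE = 9


class StubGraphCache:
    """Stand-in for of_g._dynamic_input_graph_cache (illustration only)."""

    def __init__(self):
        self.size = None
        self.shared = None

    def set_cache_size(self, size):
        self.size = size

    def enable_shared(self, enabled):
        self.shared = enabled


def configure_cache(cache, size=DEFAULT_DYNAMIC_GRAPH_CACHE_SIZE):
    # The size comes from a named constant instead of a literal 9.
    cache.set_cache_size(size)
    # Kept as True, following the author's reply that torch compile
    # already handles the user's dynamic argument on the frontend side.
    cache.enable_shared(True)
    return cache


cache = configure_cache(StubGraphCache())
```

In the actual change, the constant would presumably live next to the backend code that creates OfGraph, so the default is defined in one place.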
return self.fx_md(*args, **kwargs)
if self.fx_md.training:
    return self.fx_md(*args, **kwargs)
with flow.no_grad():
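In principle, the distinction between training and inference mode (with flow.no_grad) should not be expressed here in the build function but in the user's model code. For the error mentioned in the issue, please confirm whether the corresponding backward operator is really missing, and if so, fix the problem by adding that backward operator.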
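I asked Juncheng, who developed fused_multi_head_attention_inference, and he said the operator implements only the forward pass, not the backward pass. If we don't add this in build, would we have to modify the code in the test compile repository instead? I tested it, and using model.eval() alone does not avoid the error mentioned in the issue.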
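The dispatch being debated can be illustrated with a small, self-contained sketch. It is not the PR's code: no_grad_stub stands in for flow.no_grad(), StubModule stands in for the fx module, and GRAD_ENABLED is a hypothetical flag used only to make the effect observable without oneflow installed.

```python
import contextlib

# Hypothetical flag that mimics a global "gradients enabled" state.
GRAD_ENABLED = [True]


@contextlib.contextmanager
def no_grad_stub():
    """Stand-in for flow.no_grad(): disable gradients inside the block."""
    previous = GRAD_ENABLED[0]
    GRAD_ENABLED[0] = False
    try:
        yield
    finally:
        # Restore the previous state even if the forward pass raises.
        GRAD_ENABLED[0] = previous


class StubModule:
    """Stand-in for self.fx_md; reports whether gradients were enabled."""

    def __init__(self, training):
        self.training = training

    def __call__(self, x):
        return x, GRAD_ENABLED[0]


def build(fx_md, *args, **kwargs):
    # Training mode: run normally so gradients can be recorded.
    if fx_md.training:
        return fx_md(*args, **kwargs)
    # Inference mode: run the forward pass with gradients disabled, so a
    # forward-only operator is never asked for its backward pass.
    with no_grad_stub():
        return fx_md(*args, **kwargs)


train_result = build(StubModule(training=True), 1)
eval_result = build(StubModule(training=False), 1)
```

The sketch shows why switching the module to inference mode would not be enough by itself: the training flag only selects the branch, while the gradient state is what the no-grad context actually changes.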
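With the oneflow backend integrated into torch compile, the faster rcnn and sd models now run successfully with dynamic shapes both disabled and enabled. Related issue: oneflow backend integrated with torch compile, running faster rcnn
The main changes include: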