Allow linear to be consumed by nvFuser by default #1371
base: main
Conversation
There are multiple concerning CI failures.
Co-authored-by: Jingyue Wu <[email protected]>
We did not run Thunder benchmarks using nvFuser linear. Should we run other benchmarks as well before enabling it by default? Additionally, @wujingyue needed to remove support for 1D weights to facilitate DID-aware execution. We might have to add an additional check on the Thunder side or use …
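A minimal sketch of what such a Thunder-side check could look like (the function name and placement are hypothetical, not Thunder's actual API):

```python
# Hypothetical Thunder-side guard (illustrative only): only let nvFuser
# claim linear when the weight is 2D, since 1D-weight support was removed
# to enable DID-aware execution.
import torch

def nvfuser_can_execute_linear(a: torch.Tensor, weight: torch.Tensor, bias=None) -> bool:
    # Fall back to the default executor for 1D (and other unsupported) weights.
    return weight.ndim == 2
```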
Orthogonal to this PR but related to nvFuser knobs: are there `enable_foobar`s other than linear and matmul?
Yes, there's also …
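For reference, a sketch of how these knobs surface as Thunder compile options, assuming the option names `nv_enable_linear` / `nv_enable_matmul` used by the nvFuser executor around this PR (check the executor source for the full list):

```python
# Sketch: passing nvFuser knobs as compile options to thunder.jit.
import torch
import thunder

model = torch.nn.Linear(64, 64).cuda()
jitted = thunder.jit(model, nv_enable_linear=True, nv_enable_matmul=True)
out = jitted(torch.randn(8, 64, device="cuda"))
```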
CI with torch-nightly is now passing with NVIDIA/Fuser#3369 fixed. CI with older versions of torch (and therefore older versions of nvFuser) still fails, because we can't fix a past version. @IvanYashchuk, can you bump nvFuser's version requirement and enable linear only for that version?
Yes, I will do that.
I'm unsure about the lightning-thunder (ipynb) test. Is that an infra failure or indeed a regression?
For some reason, there was a timeout in the cell execution. Let's try rerunning the failed job.
@kiya00, could you please check what's going on with the Notebooks CI job? |
Env: when using 30 or more layers of linear+relu, nvFusion0 becomes slow, so the notebook runs out of time. I saved an nvFuser repro script via `print(fd.repro_script_for(args))`.
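For context, generating such a repro from the nvFuser Python frontend looks roughly like this (illustrative only, not the saved script; assumes `fd.ops.linear` is available, as in the nvFuser versions this PR targets):

```python
import torch
from nvfuser import FusionDefinition

inputs = [
    torch.randn(128, 256, device="cuda"),  # activation
    torch.randn(256, 256, device="cuda"),  # linear weight
]

with FusionDefinition() as fd:
    x = fd.from_pytorch(inputs[0])
    w = fd.from_pytorch(inputs[1])
    y = fd.ops.relu(fd.ops.linear(x, w))
    fd.add_output(y)

fd.execute(inputs)
# Prints a standalone script that rebuilds and reruns this fusion:
print(fd.repro_script_for(inputs))
```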
Thank you for this PR, but if this means needing to add correctness exceptions to half a dozen models, and the savings on Llama are marginal, we should not do this by default and instead use recipes to enable it where desired.
Which correctness exceptions are you referring to, @t-vi? I agree we can't enable something by default if it causes timeouts in some models, but that's just a speed motivation. @kiya00, is there a bug for the slowness issue? @kevinstephano, do you think it's worth investigating? @kshitij12345, @IvanYashchuk, @jjsjann123, should we consider requiring nvFuser fusions to have enough nodes to be valuable (2+?) and not so many nodes that they might be slow (< 30?)? Maybe we could develop better heuristics in the future.
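A hypothetical version of that size heuristic (thresholds and names are made up, not existing Thunder logic):

```python
# Claim a region for nvFuser only when it is big enough to pay off and does
# not contain so many matmul-like ops that segmentation gets slow.
MIN_OPS = 2
MAX_MATMUL_LIKE = 30

def worth_fusing(op_names: list[str]) -> bool:
    matmul_like = sum(name in ("linear", "matmul") for name in op_names)
    return len(op_names) >= MIN_OPS and matmul_like < MAX_MATMUL_LIKE

worth_fusing(["linear", "relu"])          # True: two ops, one matmul-like
worth_fusing(["linear"] * 30 + ["relu"])  # False: too many matmuls in one fusion
```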
Yes, this is the issue Kevin created for nvFuser: #1490 (comment)
While there is a real issue (between 20 and 30 matmuls in a fusion lead to an explosion in segmentation time), I am not sure how often we will see that many matmuls in one fusion. It would be interesting to see whether this comes up in actual models.
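One way to probe this (sizes and layer count are illustrative): stack 30 linear+relu layers, as in the failing notebook, and watch how the first call behaves when all the matmuls land in one fusion.

```python
import torch
import thunder

layers = []
for _ in range(30):
    layers += [torch.nn.Linear(256, 256), torch.nn.ReLU()]
model = torch.nn.Sequential(*layers).cuda()

jitted = thunder.jit(model)
out = jitted(torch.randn(64, 256, device="cuda"))  # first call triggers fusion/segmentation
```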
This change lowers peak memory usage of LitGPT implementations that use the `mlp_class_name="GptNeoxMLP"` configuration (#1175, #1233, #246). The better memory usage comes from simplifying the setup for Thunder's fusion rematerialization: with this change, there are fewer "producer" fusions.
cc @Priya2698, @wujingyue
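A minimal sketch for eyeballing the peak-memory claim on a GptNeoxMLP-style block (fc -> gelu -> proj); the sizes and measurement are illustrative, not the benchmark behind the numbers above:

```python
import torch
import thunder

class GptNeoxStyleMLP(torch.nn.Module):
    def __init__(self, n_embd=1024):
        super().__init__()
        self.fc = torch.nn.Linear(n_embd, 4 * n_embd)
        self.proj = torch.nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.proj(torch.nn.functional.gelu(self.fc(x)))

model = GptNeoxStyleMLP().cuda()
jitted = thunder.jit(model)
torch.cuda.reset_peak_memory_stats()
jitted(torch.randn(32, 1024, device="cuda")).sum().backward()
print(torch.cuda.max_memory_allocated() / 2**20, "MiB peak")
```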