[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features #1613
Conversation
Signed-off-by: Pawel Gadzinski <[email protected]>
#
# See LICENSE for license information.

"""FakeQuant Feature support for nvidia-dlframework-inspect"""
Notes from review: make this more recipe-centric, as opposed to format-centric.
I think `quant_format` will be sufficient for all recipes we plan to add, so I haven't changed that. If something more complex appears, I can add a new feature.
"""API call responsible for choice between high-precision and FP8 GEMM execution."""

for key in config:
    if key != "gemm":
I don't think I understand this:
- In the docstring you say that the option to this feature is `gemms`, but here you look for `gemm`. Did you intend this to be just `gemm` (the variable)?
- I don't see how the option provided by the user has any impact on this feature's behavior.
It is parsed by `api.py` from `gemms`/`gemms_struct` into `gemm`. The `enabled` keyword is handled by Nvidia-DL-Framework-Inspect itself, not by the features.
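The parsing described above (a user-level `gemms` list expanded into per-feature configs keyed by a singular `gemm`) could be sketched roughly as follows. The function and key names here are assumptions for illustration only; the actual logic lives in `api.py` in this PR.

```python
# Hypothetical sketch of the gemms -> gemm expansion described above.
# Names are illustrative assumptions, not the actual api.py code.
def expand_gemms(feature_config):
    """Expand a config with a "gemms" list into one config per GEMM,
    each carrying a singular "gemm" key."""
    base = {k: v for k, v in feature_config.items() if k != "gemms"}
    return [dict(base, gemm=g) for g in feature_config.get("gemms", [])]

configs = expand_gemms({"gemms": ["fprop", "dgrad"], "quant_format": "FP8"})
print(configs)
# → [{'quant_format': 'FP8', 'gemm': 'fprop'},
#    {'quant_format': 'FP8', 'gemm': 'dgrad'}]
```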
    lambda buffers: sum(_get(buffers, "underflows_num")),
),
"saturations_num": (
    lambda x: (x == 126).sum(),
Where does 126 come from?
It's the bit pattern of the highest finite FP8 value; 255 and 127 result in NaNs. I added a constant to make it clear.
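For context on the constant: in FP8 E4M3, 0x7E (126) is the bit pattern of the largest finite magnitude (448.0), while 0x7F and 0xFF encode NaN. Below is a minimal sketch of a saturation count over the raw byte representation; note the lambda under review compares against 126 directly, so masking the sign bit to also catch negative saturations is an extra assumption here.

```python
import numpy as np

FP8_E4M3_MAX_BITS = 0x7E  # 126: bit pattern of the largest finite E4M3 value (448.0)

def count_saturations(fp8_bits: np.ndarray) -> int:
    """Count elements saturated to +/-max in a raw uint8 view of an E4M3 tensor."""
    # Mask out the sign bit so both 0x7E (+448) and 0xFE (-448) are counted.
    return int(((fp8_bits & 0x7F) == FP8_E4M3_MAX_BITS).sum())

bits = np.array([0x7E, 0xFE, 0x40, 0x00], dtype=np.uint8)
print(count_saturations(bits))  # → 2
```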
I removed this feature temporarily: nobody used it, and Anmol and Shreyas said they added it only to have some fp8_stats features. I will re-add it in another PR.
Co-authored-by: Przemyslaw Tredak <[email protected]> Signed-off-by: Paweł Gadziński <[email protected]>
@pggPL thanks for continuing to work on this great feature! Just curious, do you have a rough ETA for getting it merged? I'm deciding whether to wait for the merge or pull your PR locally to run some tests. One other question: I saw you removed the saturations part; may I know the reason?
@lengerfulluse we intend to merge this PR ~tomorrow.
I removed the saturation part because it was incorrect: it's more complex than I thought, and I will re-add it in one of the next PRs.
LGTM
/te-ci pytorch
@pggPL I am conducting the
I got an `Unexpected type for quantizer` exception at the linked line: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/csrc/common.cpp#L35 It looks like it cannot recognize `DebugQuantizer`; detailed error as:
I use the 25.04 container (CUDA 12.9 + rebuilt PyTorch 2.6 + TE 2.2 with the patched inspector PRs #1613 + #1614). Is there any misconfiguration or something missing? I'd like to seek your insights.
Hi @lengerfulluse, thank you for the feedback. We haven't merged the tests yet, and it seems that something stopped working. I hope the PR with the tests will be merged today and that it will solve your problem.
And another small import bug in
While the actual definition is
Maybe you are already aware of it; just in case.
Description
The docs for the NVIDIA-DL-Framework-Inspect support. Needs to be merged after part 1.