Fix QKV dtype in the bwd of FP8+CP #1134

Merged: 9 commits, Aug 30, 2024

@xrennvidia (Collaborator) commented Aug 26, 2024

Description

Fix the QKV dtype used in the backward pass (bwd) of FP8 attention with context parallelism (FP8+CP).

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@cyanguwa (Collaborator)

Can we test both the E4M3 and HYBRID fp8 recipes please? Can we add that to the unit tests?

@cyanguwa assigned cyanguwa and unassigned cyanguwa, Aug 26, 2024
@cyanguwa self-requested a review, August 26, 2024 18:02
@xrennvidia (Collaborator, Author)

> Can we test both the E4M3 and HYBRID fp8 recipes please? Can we add that to the unit tests?

This is easy; we only need to change one flag in the FP8 recipe. But I don't think it would add anything to the test. I am comparing the results of CP>1 against CP=1. E4M3 and HYBRID share the same code, and I think the HYBRID test covers everything in E4M3. For example, the bug in this PR affects HYBRID only; E4M3 should work because its fwd and bwd dtypes are the same.

Maybe I am misunderstanding your point. Is there anything specific to E4M3 that the HYBRID test cannot cover?
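
To illustrate the argument above, here is a minimal sketch in plain Python. It is not Transformer Engine code: `Fp8Dtype`, `RECIPES`, and `recipes_exposing_mixup` are hypothetical names made up for this example. The only facts it relies on are that the E4M3 recipe uses E4M3 in both passes, while HYBRID uses E4M3 in the forward pass and E5M2 in the backward pass. Under that assumption, any mix-up between the fwd and bwd dtypes can only change results when the two differ.

```python
from enum import Enum
from typing import List


class Fp8Dtype(Enum):
    E4M3 = "e4m3"  # 4 exponent bits, 3 mantissa bits
    E5M2 = "e5m2"  # 5 exponent bits, 2 mantissa bits


# Hypothetical lookup table: recipe name -> (forward dtype, backward dtype).
RECIPES = {
    "E4M3": (Fp8Dtype.E4M3, Fp8Dtype.E4M3),
    "HYBRID": (Fp8Dtype.E4M3, Fp8Dtype.E5M2),
}


def recipes_exposing_mixup() -> List[str]:
    """Return the recipes in which swapping fwd/bwd dtypes changes anything."""
    return [name for name, (fwd, bwd) in RECIPES.items() if fwd != bwd]


print(recipes_exposing_mixup())  # ['HYBRID']
```

Under these assumptions, a CP>1 vs. CP=1 comparison run with HYBRID exercises the dtype handling that E4M3 would, plus the case where the two dtypes differ.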

@xrennvidia (Collaborator, Author), posted three times:

/te-ci pytorch

@cyanguwa (Collaborator) left a comment

LGTM

@cyanguwa merged commit 9437ceb into NVIDIA:main, Aug 30, 2024
15 checks passed
@xrennvidia deleted the xren/cp_fp8_fix branch, August 30, 2024 17:04
ptrendx pushed a commit that referenced this pull request Aug 31, 2024
* fix qkv_dtype of FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* config cp correction dtype of FP8+CP

Signed-off-by: Xiaowei Ren <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* code style change

Signed-off-by: Xiaowei Ren <[email protected]>

* always do FP8 CP correction in FP32

Signed-off-by: Xiaowei Ren <[email protected]>

---------

Signed-off-by: Xiaowei Ren <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <[email protected]>
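
The commit message above does not spell out what "CP correction" is. Assuming it refers to the usual step in context-parallel attention where per-step partial outputs are rescaled by their log-sum-exp (LSE) before being merged, the idea can be sketched as below. This is an illustration in plain Python floats with hypothetical helper names (`attend`, `merge`); it does not model FP8 or FP32 precision and is not the code changed in this PR.

```python
import math


def attend(scores, values):
    """Softmax-weighted average of `values`, plus the log-sum-exp of `scores`."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    out = sum(e * v for e, v in zip(exps, values)) / denom
    lse = m + math.log(denom)
    return out, lse


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial results computed over disjoint key/value chunks."""
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    # Each partial output is rescaled by its share of the total softmax mass.
    out = out_a * math.exp(lse_a - lse) + out_b * math.exp(lse_b - lse)
    return out, lse


scores = [0.5, -1.0, 2.0, 0.1]
values = [1.0, 2.0, 3.0, 4.0]

full_out, full_lse = attend(scores, values)
out_a, lse_a = attend(scores[:2], values[:2])  # chunk seen in one CP step
out_b, lse_b = attend(scores[2:], values[2:])  # chunk seen in another CP step
merged_out, merged_lse = merge(out_a, lse_a, out_b, lse_b)
```

The merged result matches attention over the full sequence up to rounding. Because the merge multiplies by exponentials of LSE differences, it is sensitive to the precision of the intermediate values, which is one plausible reason to run it in FP32 rather than a lower-precision dtype.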