
[OpenSora-hpcai] OSv1.2 performance optimization #687

Open · wants to merge 4 commits into `master`
Conversation

**@hadipash** (Collaborator) commented on Oct 9, 2024:

TODO:

- Validate accuracy and visual quality on long training.
- Update performance tables in README.

Tests were conducted in dynamic DVM mode, on the MS daily build from 09.04 with CANN 8.0 RC2. Results include the average training step time only (no data loading time):

| Changes | Shape (res × frames × batch) | Time (s) | Change (s) | Comment |
|---|---|---|---|---|
| Original | 720p × 51 × 2 | 30.409 | | |
| | 144p × 204 × 10 | 19.934 | | |
| Switch to `repeat_interleave_ext_v2` | 720p × 51 × 2 | 28.913 | -1.496 (-4.9%) | |
| | 144p × 204 × 10 | 19.872 | -0.062 (-0.3%) | |
| Remove SiLU & GELU FP32 upcast | 720p × 51 × 2 | 30.346 | -0.062 (-0.2%) | No performance improvement; will consult with the MS team. |
| | 144p × 204 × 10 | 20.506 | +0.572 (+2.9%) | |
| Convert parameters to BF16 | 720p × 51 × 2 | 28.957 | -1.452 (-4.8%) | |
| | 144p × 204 × 10 | 18.747 | -1.187 (-3.9%) | |
| Remove redundant `ops.transpose` in VAE | 720p × 51 × 2 | 30.448 | +0.040 (+0.1%) | No change due to kernel fusion; beneficial in KBK & PyNative modes. |
| | 144p × 204 × 10 | 20.103 | +0.168 (+0.8%) | |
| Final improvement | 720p × 51 × 2 | 27.896 | -2.512 (-8.3%) | |
| | 144p × 204 × 10 | 18.804 | -1.130 (-5.7%) | |
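For context on the "Remove SiLU & GELU FP32 upcast" row: a common mixed-precision pattern computes activations in FP32 for numerical safety and casts back afterwards, and the PR removes that extra round-trip. A minimal NumPy sketch of the two variants (FP16 stands in for BF16, which NumPy lacks; the function names are illustrative, not the repo's):

```python
import numpy as np

def silu_with_fp32_upcast(x):
    # Pattern being removed: upcast to FP32, compute, cast back down
    xf = x.astype(np.float32)
    return (xf / (1.0 + np.exp(-xf))).astype(x.dtype)

def silu_native(x):
    # Compute directly in the input dtype, no round-trip casts
    return x / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9, dtype=np.float16)
a = silu_with_fp32_upcast(x)
b = silu_native(x)
```

In this toy range both variants agree to within FP16 precision, which matches the observation in the table that dropping the upcast does not buy a clear speedup here and mainly removes casts.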

**@zhtmike** (Collaborator) left a comment:

Seems there is no code change for "Convert parameters to BF16"?

**@hadipash** (Collaborator, Author) replied:

> Seems there is no code change for "Convert parameters to BF16"?

This refers to network parameters that are explicitly defined with `nn.Parameter()`, such as `self.scale_shift_table`. For some reason, any calculation performed on `self.scale_shift_table` is upcast to the parameter's dtype (i.e., FP32), and that dtype then propagates through the rest of the network even with AMP enabled.
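The dtype-propagation behaviour described above can be sketched with plain NumPy type promotion. This is an analogy only: FP16 stands in for BF16 (NumPy has no bfloat16), and the array names and shapes are illustrative, not the actual tensors in the repo.

```python
import numpy as np

# Stand-in for a parameter created with nn.Parameter() (FP32 by default);
# the shape is illustrative only
scale_shift_table = np.zeros((6, 1152), dtype=np.float32)

# Half-precision activations, as AMP would produce them
hidden = np.ones((2, 1152), dtype=np.float16)

# Mixing dtypes promotes the result to the wider parameter dtype,
# and that FP32 result then propagates downstream through the network
out_upcast = hidden + scale_shift_table[0]

# Casting the parameter itself keeps the computation in half precision,
# which is the effect of converting the parameters to BF16 in the PR
out_half = hidden + scale_shift_table[0].astype(np.float16)
```

This is why casting the explicitly defined parameters, rather than relying on AMP alone, keeps the whole network in the reduced precision.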
