Fix stable_train_samples #67

Open · Rohan138 wants to merge 5 commits into main from fix_stable_train_samples
Conversation

@Rohan138 commented Nov 18, 2024

Fix how stable_train_samples is calculated. stable_train_samples is a ROCm/transformers-specific change that adds warmup before collecting perf numbers, but it is currently not working as expected. Specifically:

For example, if batch_size is 10, total_steps is 150, the first 10 steps take 2 seconds each, and the next 140 steps take 1 second each, then:

  • train_samples_per_second = (10 * 150) / (10 * 2 + 140 * 1) = 9.375
  • stable_train_samples_per_second (expected) = (10 * 150 - 10 * 10) / (140 * 1) = 10.000
  • stable_train_samples_per_second (current) = (10 * 150) / (140 * 1) = 10.714
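
As a quick sanity check, here is a minimal Python sketch (illustrative only, not the actual trainer code) that reproduces the three numbers above:

```python
batch_size = 10
total_steps = 150
warmup_steps = 10  # corresponds to the proposed stable_train_warmup_steps argument
step_times = [2.0] * 10 + [1.0] * 140  # seconds per step

# Overall throughput: all samples over all time.
train_sps = batch_size * total_steps / sum(step_times)  # 9.375

# Intended stable throughput: drop both the warmup samples and the warmup time.
stable_sps_expected = (
    batch_size * (total_steps - warmup_steps) / sum(step_times[warmup_steps:])
)  # 10.000

# Current (buggy) behavior: drops the warmup time but still counts the warmup samples.
stable_sps_current = batch_size * total_steps / sum(step_times[warmup_steps:])  # 10.714
```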

To fix this, I added a stable_train_warmup_steps argument (default=10) that makes the metric behave as intended.
With this change, pyt_huggingface_gpt2 perf drops from 559.092 to 529.131 stable_train_samples_per_second.

NOTE: This will affect HF perf for QA, execdb, etc.

@Rohan138 Rohan138 marked this pull request as ready for review November 18, 2024 13:07
@Rohan138 Rohan138 force-pushed the fix_stable_train_samples branch from f49dd07 to 3309265 on November 18, 2024 13:09
@Rohan138 (Author) commented Dec 4, 2024

Results from s83-5 with rocm/pytorch-private:20241203_exec_dashboard_pretuned_nightly:

  • main is the current implementation.
  • nowarmup sets stable_train_warmup_steps to 0, i.e. the same as train_samples_per_second.
  • warmup sets stable_train_warmup_steps to 10, i.e. the intended implementation, reflected in this DLM PR: https://github.com/ROCm/DeepLearningModels/pull/2044

| model | main | nowarmup | warmup | nowarmup/main | warmup/main | metric |
| --- | --- | --- | --- | --- | --- | --- |
| pyt_huggingface_bert | 950.891 | 888.915 | 926.425 | 93.48% | 97.43% | stable_train_samples_per_second |
| pyt_huggingface_bart | 3633.859 | 2926.991 | 3457.409 | 80.55% | 95.14% | stable_train_samples_per_second |
| pyt_huggingface_distilbert-base | 3690.956 | 3280.507 | 3650.137 | 88.88% | 98.89% | stable_train_samples_per_second |
| pyt_huggingface_deberta-v2-xxlarge | 1838.031 | 1613.469 | 1821.666 | 87.78% | 99.11% | stable_train_samples_per_second |
| pyt_huggingface_gpt_neo | 608.781 | 561.899 | 609.767 | 92.30% | 100.16% | stable_train_samples_per_second |
| pyt_huggingface_gpt2 | 615.86 | 576.332 | 669.443 | 93.58% | 108.70% | stable_train_samples_per_second |
| pyt_huggingface_roberta-large | 1576.508 | 1454.397 | 1602.028 | 92.25% | 101.62% | stable_train_samples_per_second |

Looking at the numbers, the two issues (the incorrect sample count in the current main, which overestimates perf, and averaging over the slow first 10 warmup iterations, which underestimates perf) effectively cancel each other out.
