Fix stable_train_samples #67

Open · Rohan138 wants to merge 5 commits into main from fix_stable_train_samples
Conversation

@Rohan138 commented Nov 18, 2024

Fix how stable_train_samples is calculated. stable_train_samples is a ROCm/transformers-specific change that adds warmup before collecting perf numbers, but it is currently not working as expected. Specifically:

For example, if batch_size is 10, total_steps is 150, the first 10 steps take 2 seconds each, and the next 140 steps take 1 second each, then:

  • train_samples_per_second = (10 * 150) / (10 * 2 + 140 * 1) = 9.375
  • stable_train_samples_per_second (expected) = (10 * 150 - 10 * 10) / (140 * 1) = 10.000
  • stable_train_samples_per_second (current) = (10 * 150) / (140 * 1) = 10.714
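
As a quick sanity check, here is a minimal Python sketch (illustrative only, not the actual trainer code) that reproduces the three numbers above:

```python
batch_size = 10
total_steps = 150
warmup_steps = 10  # corresponds to the proposed stable_train_warmup_steps argument
step_times = [2.0] * 10 + [1.0] * 140  # seconds per step

# Overall throughput: all samples over all time.
train_sps = batch_size * total_steps / sum(step_times)  # 9.375

# Intended stable throughput: drop both the warmup samples and the warmup time.
stable_sps_expected = (
    batch_size * (total_steps - warmup_steps) / sum(step_times[warmup_steps:])
)  # 10.000

# Current (buggy) behavior: drops the warmup time but still counts the warmup samples.
stable_sps_current = batch_size * total_steps / sum(step_times[warmup_steps:])  # 10.714
```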

To fix this, I added a stable_train_warmup_steps argument (default=10) that makes the metric behave as intended.
With this change, pyt_huggingface_gpt2 perf drops from 559.092 to 529.131 stable_train_samples_per_second.

NOTE: This will affect HF perf for QA, execdb, etc.

@Rohan138 Rohan138 marked this pull request as ready for review November 18, 2024 13:07
@Rohan138 Rohan138 force-pushed the fix_stable_train_samples branch from f49dd07 to 3309265 on November 18, 2024 13:09
@Rohan138 (Author) commented Dec 4, 2024

Results from s83-5 with rocm/pytorch-private:20241203_exec_dashboard_pretuned_nightly:

  • main is the current implementation.
  • nowarmup sets stable_train_warmup_steps to 0, i.e. the same as train_samples_per_second.
  • warmup sets stable_train_warmup_steps to 10, i.e. the intended implementation, reflected in this DLM PR: https://github.com/ROCm/DeepLearningModels/pull/2044

| model | main | nowarmup | warmup | nowarmup/main | warmup/main | metric |
| --- | --- | --- | --- | --- | --- | --- |
| pyt_huggingface_bert | 950.891 | 888.915 | 926.425 | 93.48% | 97.43% | stable_train_samples_per_second |
| pyt_huggingface_bart | 3633.859 | 2926.991 | 3457.409 | 80.55% | 95.14% | stable_train_samples_per_second |
| pyt_huggingface_distilbert-base | 3690.956 | 3280.507 | 3650.137 | 88.88% | 98.89% | stable_train_samples_per_second |
| pyt_huggingface_deberta-v2-xxlarge | 1838.031 | 1613.469 | 1821.666 | 87.78% | 99.11% | stable_train_samples_per_second |
| pyt_huggingface_gpt_neo | 608.781 | 561.899 | 609.767 | 92.30% | 100.16% | stable_train_samples_per_second |
| pyt_huggingface_gpt2 | 615.86 | 576.332 | 669.443 | 93.58% | 108.70% | stable_train_samples_per_second |
| pyt_huggingface_roberta-large | 1576.508 | 1454.397 | 1602.028 | 92.25% | 101.62% | stable_train_samples_per_second |

Looking at the numbers, the two issues (the incorrect sample count in the current main, which overestimates perf, and averaging over the slow first 10 warmup iterations, which underestimates perf) effectively cancel each other out.
