Any explanation for the training speed changes? #923
Basically I built a QAT ViT (with some |
During QAT, the activation scale factors are computed from statistics for the first training steps, and this causes the initial slowdown. After that, the scales are converted to parameters, so training is faster because the statistics no longer need to be computed.
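To make the two phases concrete, here is a minimal PyTorch sketch of the idea. It is not the library's actual implementation: the class name `StatsThenParamScale`, the `collect_steps` and `percentile` arguments, and the choice of a percentile of the absolute activations as the statistic are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class StatsThenParamScale(nn.Module):
    """Hypothetical activation scale: from statistics first, a parameter afterwards."""

    def __init__(self, collect_steps: int = 3, percentile: float = 0.999):
        super().__init__()
        self.collect_steps = collect_steps  # assumed number of warm-up steps
        self.percentile = percentile        # assumed statistic: percentile of |x|
        self.steps_seen = 0
        self.register_buffer("stats_sum", torch.zeros(()))
        # Placeholder value, overwritten when statistics collection ends.
        self.scale = nn.Parameter(torch.ones(()))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and self.steps_seen < self.collect_steps:
            # Slow path: an order statistic over every activation in the batch.
            k = max(1, int(self.percentile * x.numel()))
            stat = x.detach().abs().flatten().kthvalue(k).values
            self.stats_sum += stat
            self.steps_seen += 1
            if self.steps_seen == self.collect_steps:
                # Convert the averaged statistic into the parameter's initial value.
                with torch.no_grad():
                    self.scale.copy_(self.stats_sum / self.collect_steps)
            return stat
        # Fast path: return the parameter, with no pass over the activations.
        return self.scale
```

In this sketch the first `collect_steps` training calls each run `kthvalue` over the whole activation tensor, which is where the extra per-step cost would come from. Every later call only returns `self.scale`, which the optimizer can then update like any other parameter.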