Hi Chengyang, thanks for your great code!
I'm trying to reproduce the GLAT+DSLP model. I checked the training scripts you provided, but I found that no model is registered under "--arch glat_sd" in the code. Should it be "nat_sd_glat"?
BTW, what do "ss" and "sd" mean? Does "sd" mean deeply supervised? And what about "ss"?
Thanks for your answer!!
The names ss and sd were used during development, and I should have changed them after writing the paper.
So ss means scheduled sampling, where I mix the ground-truth tokens with predicted tokens. The s is a notation for layer-wise prediction, but I don't really remember why I used s. d means deep supervision.
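To illustrate the mixing idea mentioned above, here is a minimal sketch in plain Python. It is not the code from this repository: the function name `mix_tokens` and the parameters `keep_prob` and `rng` are hypothetical, and it assumes the mixing is done independently per position, which the answer does not specify.

```python
import random


def mix_tokens(target, predicted, keep_prob, rng=None):
    """Mix ground-truth and predicted tokens position by position.

    Hypothetical helper for illustration only: each position keeps the
    ground-truth token with probability `keep_prob`, otherwise it takes
    the model's predicted token.
    """
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so keep_prob=1.0 always keeps the target
    # token and keep_prob=0.0 always takes the predicted token.
    return [t if rng.random() < keep_prob else p
            for t, p in zip(target, predicted)]


if __name__ == "__main__":
    gold = ["the", "cat", "sat", "down"]
    pred = ["a", "cat", "sits", "down"]
    print(mix_tokens(gold, pred, keep_prob=0.5, rng=random.Random(0)))
```

With `keep_prob=1.0` the result is the ground-truth sequence, and with `keep_prob=0.0` it is the predicted sequence; values in between give a mixture.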