
No glat_sd arch #12

Open
bbo0924 opened this issue Jun 17, 2022 · 2 comments

Comments


bbo0924 commented Jun 17, 2022

Hi Chengyang, thanks for your great code!
I'm trying to reproduce the GLAT+DSLP model. I checked the training scripts you provided, but I could not find a model registered as "--arch glat_sd" in the code. Should it be "nat_sd_glat"?
BTW, what do "ss" and "sd" mean? Does "sd" mean deeply supervised? And what about "ss"?
Thanks for your answer!

@chenyangh
Owner

Hello @bbo0924.

Yes, you are right. It should be nat_sd_glat. Sorry for the mistake, I will fix it. Thanks.

Owner

chenyangh commented Jun 22, 2022

The names ss and sd date from development; I should have changed them after writing the paper.
ss means scheduled sampling, where I mix the ground-truth tokens with predicted tokens. The s is a notation for layer-wise prediction, but I don't really remember why I used s. d means deep supervision.
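
For readers unfamiliar with the idea, mixing ground-truth tokens with predicted tokens can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from this repository: the function name `mix_tokens`, the `gt_prob` parameter, and the independent per-position coin flip are all hypothetical choices made for the example.

```python
import random
from typing import List, Optional, Sequence


def mix_tokens(
    ground_truth: Sequence[int],
    predicted: Sequence[int],
    gt_prob: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a sequence that takes each position from the ground truth
    with probability gt_prob, and from the prediction otherwise.

    Hypothetical helper for illustration; the repository may mix tokens
    differently (e.g. on tensors, or with a schedule for gt_prob).
    """
    if len(ground_truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not 0.0 <= gt_prob <= 1.0:
        raise ValueError("gt_prob must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so gt_prob=1.0 always picks the ground
    # truth and gt_prob=0.0 always picks the prediction.
    return [
        g if rng.random() < gt_prob else p
        for g, p in zip(ground_truth, predicted)
    ]
```

With `gt_prob=1.0` the output equals the ground truth, and with `gt_prob=0.0` it equals the prediction; values in between give a per-position mixture of the two.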
