
Question about the _step_slice function in nmt.py file #10

Open
Jomensy opened this issue Jul 4, 2018 · 5 comments

Comments

@Jomensy

Jomensy commented Jul 4, 2018

Hi Jianshu! I'm reading your papers and code now. I have a question about the function _step_slice in the nmt.py file. It seems that there are two GRU layers in this function, and they output h1 and h2. In your DenseNet paper, there are two GRU layers in the multi-scale attention model. So does h1 represent $\hat{s}_t$ in Eq. (12) and h2 represent $s_t$ in Eq. (16)? I'm confused because h1 and h2 also appear in the VGG model, but the WAP paper doesn't use two GRU layers. Waiting for your reply~ Thanks! ^^

@JianshuZhang
Owner

Yes, correct.
When we submitted our WAP paper, our decoder had only one GRU layer. That was two years ago, and we didn't release our WAP code at that time. For now, we first release the DenseNet code; the VGG code only changes the encoder architecture, so the decoder in the VGG code also has two GRU layers. But the difference is trivial and has little effect on performance (maybe only because the CROHME dataset is too small).
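To make the h1/h2 correspondence concrete, here is a minimal NumPy sketch of a coupled two-GRU decoder step of the kind described above: h1 plays the role of $\hat{s}_t$, the attention context is computed from h1, and h2 plays the role of $s_t$. All function names, weight names, and sizes are illustrative assumptions, not the actual parameters in nmt.py.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, W, U, b):
    """One GRU step. W: (dim_x, 3*dim_h), U: (dim_h, 3*dim_h), b: (3*dim_h,).
    Gates are stacked as [update z, reset r, candidate]."""
    dim_h = h_prev.shape[-1]
    gx = x @ W + b
    gh = h_prev @ U
    z = sigmoid(gx[..., :dim_h] + gh[..., :dim_h])
    r = sigmoid(gx[..., dim_h:2*dim_h] + gh[..., dim_h:2*dim_h])
    h_tilde = np.tanh(gx[..., 2*dim_h:] + r * gh[..., 2*dim_h:])
    return (1.0 - z) * h_prev + z * h_tilde

def attention(query, annotations, Wq, Wa, v):
    """Additive (Bahdanau-style) attention over encoder annotations.
    annotations: (L, dim_a); query: (dim_h,). Returns the context vector."""
    scores = np.tanh(query @ Wq + annotations @ Wa) @ v   # (L,)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                           # attention weights
    return alpha @ annotations                            # (dim_a,)

def decoder_step(y_emb, s_prev, annotations, p):
    """_step_slice-style step: GRU1 -> attention -> GRU2."""
    h1 = gru_cell(y_emb, s_prev, p['W1'], p['U1'], p['b1'])   # \hat{s}_t
    c_t = attention(h1, annotations, p['Wq'], p['Wa'], p['v'])
    h2 = gru_cell(c_t, h1, p['W2'], p['U2'], p['b2'])         # s_t
    return h2, c_t

rng = np.random.default_rng(0)
dim_y, dim_h, dim_a, L = 4, 8, 6, 5
p = {
    'W1': rng.standard_normal((dim_y, 3*dim_h)) * 0.1,
    'U1': rng.standard_normal((dim_h, 3*dim_h)) * 0.1,
    'b1': np.zeros(3*dim_h),
    'W2': rng.standard_normal((dim_a, 3*dim_h)) * 0.1,
    'U2': rng.standard_normal((dim_h, 3*dim_h)) * 0.1,
    'b2': np.zeros(3*dim_h),
    'Wq': rng.standard_normal((dim_h, dim_h)) * 0.1,
    'Wa': rng.standard_normal((dim_a, dim_h)) * 0.1,
    'v':  rng.standard_normal(dim_h) * 0.1,
}
s_t, c_t = decoder_step(rng.standard_normal(dim_y),
                        np.zeros(dim_h),
                        rng.standard_normal((L, dim_a)), p)
print(s_t.shape, c_t.shape)  # (8,) (6,)
```

The key point is the data flow: the first GRU consumes the previous symbol embedding, its output queries the attention, and the second GRU folds the resulting context back into the state.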

@Jomensy
Author

Jomensy commented Jul 4, 2018

Thanks!

@Jomensy
Author

Jomensy commented Sep 17, 2018

Hi Jianshu! Recently I implemented the multi-scale attention branch based on your code. I trained the model on my GPU, and the training time increases greatly after adding the multi-scale attention branch (one epoch takes 5 min without the multi-scale branch and 15 min after adding it). I'm not sure whether my code is written correctly. Does your training take much more time after adding the multi-scale branch? Besides, do I need to change hyper-parameters such as the learning rate, patience (your code sets it to 15), or the optimizer after adding the branch, since my code seems to converge more slowly than your original code? Waiting for your reply~ Thanks!

@JianshuZhang
Owner

JianshuZhang commented Sep 17, 2018 via email

Yes, the time cost increases a lot; that's why I didn't use higher-resolution features, although they bring a pleasant performance improvement. I guess you used a Ti GPU? I used a Tesla K40 GPU: 10 min per epoch for the single branch, 16 min per epoch for two branches. Your time cost increases nearly three times, which is too much compared with mine, so your code may need to be checked. I didn't change the learning rate or other hyper-parameters; only the multi-scale branch CNN parameters are added.

@Jomensy
Author

Jomensy commented Sep 17, 2018

Thanks for your reply!
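The extra cost discussed in this exchange comes from running attention over a second, higher-resolution annotation grid. A hedged sketch of that idea: two attention branches score a low-resolution and a high-resolution grid, and their context vectors are combined (here by concatenation, which is an assumption of this sketch, not necessarily the repo's exact combination) before the second GRU. All names and shapes are illustrative.

```python
import numpy as np

def additive_attention(query, annotations, Wq, Wa, v):
    """Additive attention: score every annotation against the decoder query."""
    scores = np.tanh(query @ Wq + annotations @ Wa) @ v
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ annotations

rng = np.random.default_rng(1)
dim_h, dim_a = 8, 6
ann_low = rng.standard_normal((7 * 7, dim_a))     # low-resolution branch
ann_high = rng.standard_normal((14 * 14, dim_a))  # high-resolution branch
h1 = rng.standard_normal(dim_h)                   # decoder query \hat{s}_t

def branch_params():
    # Independent attention parameters per branch (illustrative).
    return (rng.standard_normal((dim_h, dim_h)) * 0.1,
            rng.standard_normal((dim_a, dim_h)) * 0.1,
            rng.standard_normal(dim_h) * 0.1)

c_low = additive_attention(h1, ann_low, *branch_params())
c_high = additive_attention(h1, ann_high, *branch_params())
c_t = np.concatenate([c_low, c_high])  # combined context for the second GRU
print(c_t.shape)  # (12,)
```

Note that a grid at twice the resolution has four times as many positions to score per decoding step (14×14 vs 7×7 above), which accounts for much of the added per-epoch time; a near-threefold slowdown overall, as reported above, suggests checking for redundant computation elsewhere.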
