
Cannot reproduce the result of the pointer-generator with coverage mechanism; always inferior to the pgen model. #23

Open
gm0616 opened this issue Dec 13, 2018 · 4 comments

Comments

@gm0616

gm0616 commented Dec 13, 2018

My batch_size is 64. I pretrain my model for about 50,000 iterations and get a better result than pgen's. Then I turn on the coverage mechanism and train the model for another 2,000 iterations. The coverage loss does not decrease to 0.2, as mentioned for the pgen model. The final result on the ROUGE-1 metric is about 38.90. Are there any tricks for adding the coverage mechanism? How can I get a result similar to the pgen model's?

@yaserkl
Owner

yaserkl commented Dec 13, 2018

No, this issue is well discussed on the original pointer-generator model page.
Every time you run this model, it will generate a different result due to the multi-processing batching used in this model.
The only solution that I usually use to make my results reproducible is to use a single queue for batching and to make sure a seed is set for the randomizers throughout the framework.
Try setting these parameters to 1:
example_queue_threads
batch_queue_threads
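
For reference, a minimal sketch of what seeding the randomizers throughout the framework can look like in a TF 1.x codebase such as this one (the seed value is arbitrary, and example_queue_threads / batch_queue_threads still need to be set to 1 wherever the batcher defines them):

    import random
    import numpy as np
    import tensorflow as tf

    SEED = 111  # any fixed value; keep it constant to reproduce a run, vary it to explore

    random.seed(SEED)         # Python-level randomness (e.g. example shuffling)
    np.random.seed(SEED)      # NumPy randomness used around batching/sampling
    tf.set_random_seed(SEED)  # graph-level seed for initializers, dropout, etc.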

If you vary the seed parameter, you might manage to get an even better result than the original paper. I've gotten better results myself, as presented in our latest paper.

My personal experience is that the running average loss (at least the way it is defined in this paper) is not the best indicator for selecting the best evaluation model. In the above paper, I use the average ROUGE reward during evals as another way of saving my best model, and it sometimes works better than the running average loss.

@gm0616
Author

gm0616 commented Dec 18, 2018

Well, thanks for your response. I'll try the methods you mentioned above to deal with the coverage mechanism.
You said you use the ROUGE reward during evals. As far as I know, the calculation of ROUGE is quite slow; how do you implement this metric to evaluate a particular ckpt? And which ROUGE metric do you use for evaluation: 1, 2, or L?

@yaserkl
Owner

yaserkl commented Dec 25, 2018

Yes, it's quite slow and will increase the evaluation time per batch by two to three times (without ROUGE-based eval, each evaluation takes around 0.5 sec on a P100 GPU with batch size 8, but with ROUGE it rises to about 1.5 secs, which is still fine for my case). Also, I'm using ROUGE-L to pick the best training ckpt.
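
As an illustration only (not the repo's actual eval code), here is a minimal sketch of a token-level ROUGE-L F1 via longest common subsequence; averaging it over the eval decodes of each checkpoint is one way to pick the best ckpt:

    def lcs_len(a, b):
        # classic dynamic-programming longest-common-subsequence length
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(a)][len(b)]

    def rouge_l_f1(hypothesis, reference):
        # simple F1 variant of ROUGE-L for one (decoded, reference) summary pair
        hyp, ref = hypothesis.split(), reference.split()
        if not hyp or not ref:
            return 0.0
        lcs = lcs_len(hyp, ref)
        prec, rec = lcs / len(hyp), lcs / len(ref)
        return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

    # toy usage: average the score over a decoded eval batch and keep the best ckpt
    decoded = ["the cat sat on the mat"]
    references = ["a cat was sitting on the mat"]
    avg_rouge_l = sum(rouge_l_f1(h, r) for h, r in zip(decoded, references)) / len(decoded)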

@xiangriconglin


Excuse me,
When evaluating, is it necessary to include the train operation in the run?
In the function run_train_steps(), the fetches are:

    to_return = {
        'train_op': self._shared_train_op,
        'summaries': self._summaries,
        'pgen_loss': self._pgen_loss,
        'global_step': self.global_step,
        'decoder_outputs': self.decoder_outputs
    }

However, in run_eval_steps() they are:

    to_return = {
        'summaries': self._summaries,
        'pgen_loss': self._pgen_loss,
        'global_step': self.global_step,
        'decoder_outputs': self.decoder_outputs
    }

When I ran the eval steps, the model did not update and the average loss stayed the same. Is anything wrong with my running process?
Looking forward to your reply; thank you very much.
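
For illustration only (this is not the repo's API, just a toy TF 1.x graph): session.run executes only what is fetched, so without the train op in the fetch dict no weight update ever runs, which is consistent with the eval loss not moving:

    import tensorflow as tf  # TF 1.x style, matching the snippets above

    # toy stand-ins: a scalar "loss" variable and a train op that updates it
    loss = tf.get_variable("toy_loss", initializer=1.0)
    train_op = tf.assign_sub(loss, 0.1)  # stands in for the optimizer's apply_gradients op

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run({'pgen_loss': loss}))  # eval-style fetch: loss stays at 1.0
        print(sess.run({'pgen_loss': loss}))  # still 1.0 -- nothing updates the variable
        sess.run(train_op)                    # train-style step: the update actually runs
        print(sess.run({'pgen_loss': loss}))  # now 0.9 -- only the train op moves it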
