Difficulty in evaluating perplexity in story generation task #10

Closed
fangleai opened this issue Mar 7, 2020 · 1 comment

Comments


fangleai commented Mar 7, 2020

Thanks for sharing the source code. I re-trained the model on the story generation task on the WritingPrompts dataset using almost the same config (except fewer GPUs) and your provided .bpe files. My goal is to evaluate the test perplexity as reported in Table 3 of your paper.

However, the only approach I found promising does not seem to work: the OpenNMT library in your code can compute a "GOLD score" for the target sequences, which is printed after running 'python translate.py' with the target provided. But I got unreasonable PPL results, as follows:
PRED AVG SCORE: -0.0040, PRED PPL: 1.0040
GOLD AVG SCORE: -9.7727, GOLD PPL: 17548.5739
The GOLD PPL is incredibly high. I am wondering how you evaluated the PPL reported in Table 3. Would you be willing to share the evaluation code in the repository? Many thanks.
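
For what it is worth, the PPL values printed by translate.py look like they are just exp(-average score). Here is a minimal sketch that reproduces the two numbers above; the assumption that the average score is a per-token log-likelihood is mine, not confirmed by the repository:

```python
import math

# translate.py prints an average score and a PPL for both the predictions
# and the gold targets; the PPL appears to be exp(-average score)
# (assumed here to be an average per-token log-likelihood).
pred_avg_score = -0.0040
gold_avg_score = -9.7727

print(math.exp(-pred_avg_score))  # ~1.0040  -> matches "PRED PPL: 1.0040"
print(math.exp(-gold_avg_score))  # ~17550.6 -> matches "GOLD PPL: 17548.5739" up to rounding of the printed score
```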


Linked to pull request #11
