
Reproducing the paper's results #8

Open
vibhavagarwal5 opened this issue May 15, 2020 · 8 comments

@vibhavagarwal5

Hi,

I'm not able to reproduce the numbers reported in the paper for the Predict&Explain model, even when using the same parameters: I get ~76% label accuracy while the paper reports ~84%. Any idea what I might be missing?

Also, the expl_to_label model (predicting the label from the explanation) is not training: it produces a flat accuracy curve of ~33% and just early-stops after epoch 7. Can you help me out with this as well?

@vibhavagarwal5
Author

I'm also getting an out-of-memory error when training the attention model. How much GPU memory did you train with, @OanaMariaCamburu? I'm trying on a 12GB GPU and it doesn't fit.

@OanaMariaCamburu
Owner

Hi,

Sorry to hear that. I also used a 12GB GPU; it filled the memory close to the limit but didn't crash.

Are you able to reproduce the results when you use the py2 version? Maybe there were too many changes at once when porting it to py3. There's clearly a bug somewhere, since you're essentially getting random accuracy (one in three classes) for Expl2Label.

@vibhavagarwal5
Author

I see. I couldn't check with py2 because py2 and the older PyTorch version are now obsolete 😞, which is why I had to move to py3. I did double-check against the py2 code, and my changes seemed independent of the py2/py3 differences, but I'll check again anyway. :/

@vibhavagarwal5
Author

I just checked again; the logic is the same. Beyond that, I don't understand why the expl_to_label model isn't training: it's essentially the same InferSent-style encoder with the same classifier on top. :/
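For reference, the setup being described could be sketched roughly as follows: an InferSent-style BiLSTM encoder with max-pooling over time, followed by an MLP over the three SNLI labels. The class name and dimensions here are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class Expl2LabelSketch(nn.Module):
    """Hypothetical sketch of an InferSent-style classifier: a BiLSTM
    encoder with max-pooling over time, then an MLP over 3 SNLI labels."""

    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512, n_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_labels),
        )

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h, _ = self.encoder(self.embed(tokens))   # (batch, seq_len, 2*hidden)
        pooled, _ = h.max(dim=1)                  # max-pool over time steps
        return self.classifier(pooled)            # (batch, n_labels) logits

model = Expl2LabelSketch(vocab_size=100)
logits = model(torch.randint(0, 100, (4, 12)))
print(logits.shape)  # torch.Size([4, 3])
```

With three balanced classes, a model that never learns sits at ~33% accuracy, which matches the flat curve reported above.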

@vibhavagarwal5
Author

I suspect something is wrong with the decoder architecture. The explanation loss in the predict2expl model drops for a few epochs and then flattens out, and the generated explanations aren't great either. I may be missing something, but I just ran the code as given.

@OanaMariaCamburu
Owner

The architecture is very straightforward: the encoder is a BiLSTM, the decoder is an LSTM (at training time you feed in the ground-truth tokens, i.e. teacher forcing; at test time you feed back the previously generated tokens), and the loss is the standard cross-entropy. You could try coding it from scratch and only consult the repo for details. I'd suggest starting with the Expl2Label model: given the form of the explanations, it should clearly train to very high accuracy (I got ~97%), so getting 33% on it indicates a fundamental problem. Hope this helps.
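The encoder-decoder just described could be sketched in PyTorch as below. This is a minimal, hypothetical illustration of a BiLSTM encoder, an LSTM decoder trained with teacher forcing, and token-level cross-entropy; the names, bridge layers, and sizes are assumptions, not the repo's actual implementation.

```python
import torch
import torch.nn as nn

class Seq2ExplSketch(nn.Module):
    """Illustrative BiLSTM encoder -> LSTM decoder with teacher forcing."""

    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)
        # project the concatenated bidirectional states to the decoder size
        self.bridge_h = nn.Linear(2 * hidden_dim, hidden_dim)
        self.bridge_c = nn.Linear(2 * hidden_dim, hidden_dim)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_tokens, expl_in):
        # encode; h/c have shape (2, batch, hidden) for one BiLSTM layer
        _, (h, c) = self.encoder(self.embed(src_tokens))
        h = torch.tanh(self.bridge_h(torch.cat([h[0], h[1]], -1))).unsqueeze(0)
        c = torch.tanh(self.bridge_c(torch.cat([c[0], c[1]], -1))).unsqueeze(0)
        # teacher forcing: decoder sees the gold explanation shifted right
        dec_out, _ = self.decoder(self.embed(expl_in), (h, c))
        return self.out(dec_out)                  # (batch, seq_len, vocab)

vocab = 100
model = Seq2ExplSketch(vocab)
src = torch.randint(0, vocab, (4, 15))            # premise+hypothesis tokens
expl_in = torch.randint(0, vocab, (4, 10))        # gold tokens, shifted right
expl_tgt = torch.randint(0, vocab, (4, 10))       # gold target tokens
logits = model(src, expl_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab), expl_tgt.reshape(-1))
```

At test time the decoder would instead be unrolled step by step, feeding each sampled token back in as the next input.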

@vibhavagarwal5
Author

vibhavagarwal5 commented May 17, 2020

I see. Thanks for the help @OanaMariaCamburu! 😄

@OanaMariaCamburu
Owner

No worries, please reach out if you are unsure about any details in the architecture.
