Low scores when running scripts for BART #2

Open
Ago3 opened this issue Nov 11, 2021 · 3 comments
Comments

@Ago3

Ago3 commented Nov 11, 2021

Hi,

Thank you for making the code for the paper available! I'm trying to replicate your experiments with BART, but when running your run.sh script I get the following scores:

***** Eval results *****
{
    "all_slot_f1": 2.51,
    "all_slot_precision": 2.51,
    "all_slot_recall": 2.51,
    "intent_f1": 65.51,
    "type_II_exact_match": 0.0,
    "type_II_slot_f1": 31.48,
    "type_II_slot_precision": 31.48,
    "type_II_slot_recall": 31.48,
    "type_I_exact_match": 0.0,
    "type_I_slot_f1": 5.29,
    "type_I_slot_precision": 5.29,
    "type_I_slot_recall": 5.29
}

Do you have any idea of what could be wrong here?

Note that I ran prepare.sh in the bart directory, but run.sh failed because it looks for files such as "test.target" in the bart/resources directory, and these files are not there. I copied the corresponding files from data/s2s_format to bart/resources. Is that correct?

Thanks!
Agostina

@wasiahmad
Owner

Yes, copying the files is fine. I am not sure why you are getting such low scores. It seems the model was not trained properly. Although the code we released has been verified, I can try to reproduce the paper results again and share the fine-tuned checkpoint, but that will take some time.
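
For anyone repeating the copy step discussed above, here is a minimal Python sketch of it. The helper name `copy_missing_files` is hypothetical and not part of the repository. It assumes every regular file in data/s2s_format is wanted in bart/resources, and it never overwrites a file that is already there.

```python
import shutil
from pathlib import Path


def copy_missing_files(src_dir, dst_dir):
    """Copy regular files from src_dir into dst_dir, skipping existing names.

    Hypothetical helper: the repository itself does not ship this function.
    Returns the sorted list of file names that were actually copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        target = dst / path.name
        # Only copy plain files, and leave anything already present untouched.
        if path.is_file() and not target.exists():
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied


# Example call, run from the repository root (paths as named in this thread):
# copy_missing_files("data/s2s_format", "bart/resources")
```

The returned list makes it easy to confirm that files such as "test.target" were among those copied.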

@Ago3
Author

Ago3 commented Nov 12, 2021

That would be really useful, thanks!
Would it be possible for you to reproduce the results on a Colab Notebook? This would enable me and everyone else to reproduce the training procedure as well :)

[One more difference I can think of: the code wouldn't run with fairseq==0.9.0 installed as specified, so I had to install fairseq==0.10.2 instead.]
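
Since the fairseq version differs between runs in this thread, it may help to record which version is actually installed before training. The sketch below uses the standard-library `importlib.metadata` module. The helper names `installed_version` and `version_report` are hypothetical, and the expected version string is simply the one quoted above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(package, expected):
    """Describe how the installed version compares with the expected one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed (expected {expected})"
    if found != expected:
        return f"{package}=={found} installed, but {expected} was specified"
    return f"{package}=={found} matches the specified version"


# Example: compare the environment against the version named in the thread.
# print(version_report("fairseq", "0.9.0"))
```

Pasting the report into a bug report makes version mismatches visible at a glance.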

@kevinmtian

I ran into the same issue after following identical steps in the training script for BART (with fairseq==0.9.0). Here are my scores:

***** Eval results *****
{
    "all_slot_f1": 10.7,
    "all_slot_precision": 11.13,
    "all_slot_recall": 11.43,
    "intent_f1": 0.0,
    "type_II_exact_match": 0.0,
    "type_II_slot_f1": 27.9,
    "type_II_slot_precision": 28.13,
    "type_II_slot_recall": 27.81,
    "type_I_exact_match": 0.0,
    "type_I_slot_f1": 14.61,
    "type_I_slot_precision": 14.64,
    "type_I_slot_recall": 15.85
}

How is this going at the moment? Did you resolve the reproduction issue, or could you provide some pointers?

Thank you!
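
To compare the two runs reported in this thread side by side, a small sketch like the following could parse each "Eval results" block and print per-metric differences. It assumes the log contains the header line exactly as shown above, followed by a JSON object. The function names `parse_eval_results` and `diff_metrics` are hypothetical and not part of the repository.

```python
import json

HEADER = "***** Eval results *****"


def parse_eval_results(log_text):
    """Extract the metrics dict that follows the eval header in a log string."""
    start = log_text.index(HEADER) + len(HEADER)
    # raw_decode parses one JSON value and ignores any trailing log output.
    metrics, _ = json.JSONDecoder().raw_decode(log_text[start:].lstrip())
    return metrics


def diff_metrics(run_a, run_b):
    """Return run_b minus run_a for every metric present in both runs."""
    shared = sorted(run_a.keys() & run_b.keys())
    return {name: round(run_b[name] - run_a[name], 2) for name in shared}


# Example: paste each run's log output into a string and compare.
# deltas = diff_metrics(parse_eval_results(log_a), parse_eval_results(log_b))
# for name, delta in deltas.items():
#     print(f"{name}: {delta:+.2f}")
```

For the two runs above, the largest gap is in intent_f1 (65.51 versus 0.0), which may be a useful starting point when looking for what differs between the setups.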
