Low scores when running scripts for BART #2

Ago3 · 2021-11-11T13:21:43Z

Hi,

Thank you for making the code for the paper available! I'm trying to replicate your experiments with BART, but when I run your run.sh script I get the following scores:

***** Eval results *****
{
    "all_slot_f1": 2.51,
    "all_slot_precision": 2.51,
    "all_slot_recall": 2.51,
    "intent_f1": 65.51,
    "type_II_exact_match": 0.0,
    "type_II_slot_f1": 31.48,
    "type_II_slot_precision": 31.48,
    "type_II_slot_recall": 31.48,
    "type_I_exact_match": 0.0,
    "type_I_slot_f1": 5.29,
    "type_I_slot_precision": 5.29,
    "type_I_slot_recall": 5.29
}

Do you have any idea what could be wrong here?

Note that I ran prepare.sh in the bart directory, but run.sh would not work because it looks for files such as "test.target" in the bart/resources directory, and those files are not there. I just copied the corresponding files from data/s2s_format to bart/resources. Is that correct?
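For anyone hitting the same missing-file error, the copy step described above can be sketched as follows. This is a minimal sketch, not part of the released scripts: it assumes it is run from the repository root and that the split files in data/s2s_format use the `.source`/`.target` naming that fairseq BART fine-tuning commonly uses (only "test.target" is confirmed in this thread). The helper name `copy_s2s_files` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_s2s_files(src_dir: str = "data/s2s_format",
                   dst_dir: str = "bart/resources") -> List[str]:
    """Copy *.source and *.target files from src_dir into dst_dir.

    Hypothetical helper: the default paths come from this thread, while the
    *.source pattern is an assumption about the file naming.
    Returns the names of the copied files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Create bart/resources if prepare.sh did not create it.
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in ("*.source", "*.target"):
        # Sort so the copy order (and the returned list) is deterministic.
        for f in sorted(src.glob(pattern)):
            shutil.copy2(f, dst / f.name)
            copied.append(f.name)
    return copied
```

Calling `copy_s2s_files()` from the repository root would copy every matching split file; any file that does not match the two assumed patterns is left alone.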

Thanks!
Agostina


wasiahmad · 2021-11-11T23:15:39Z

Yes, copying the files is OK. I am not sure why you are getting such low scores; it seems the model was not trained properly. Although the code we released has been verified, I can try to reproduce the paper results again and share the fine-tuned checkpoint, but that will take some time.

Ago3 · 2021-11-12T08:10:47Z

That would be really useful, thanks!
Would it be possible for you to reproduce the results in a Colab notebook? That would let me and everyone else reproduce the training procedure as well :)

[One more difference I can think of: the code would not run with fairseq==0.9.0 installed as specified, so I had to install fairseq==0.10.2.]

kevinmtian · 2021-11-18T11:39:34Z

I also ran into this issue after following the same steps in the BART training script (with fairseq==0.9.0). Here are my scores:

***** Eval results *****
{
    "all_slot_f1": 10.7,
    "all_slot_precision": 11.13,
    "all_slot_recall": 11.43,
    "intent_f1": 0.0,
    "type_II_exact_match": 0.0,
    "type_II_slot_f1": 27.9,
    "type_II_slot_precision": 28.13,
    "type_II_slot_recall": 27.81,
    "type_I_exact_match": 0.0,
    "type_I_slot_f1": 14.61,
    "type_I_slot_precision": 14.64,
    "type_I_slot_recall": 15.85
}

How is it going at the moment? Have you resolved the reproduction issue, or could you provide some pointers?

Thank you!
