
No evaluation script for Grounding #13

Open
acha21 opened this issue Jan 26, 2021 · 6 comments

acha21 commented Jan 26, 2021

Hello, Lianhui Qin.

I cannot find the script for evaluating Grounding, which is presented in Table 2.
I have tried to reproduce the results using the stopwords_700+.txt file that you provided, but failed.
Could you share your script for this?

Yeonchan Ahn.

qkaren (Owner) commented Jan 27, 2021

Hi Yeonchan,

Did you try to follow this page?
https://github.com/qkaren/converse_reading_cmr/tree/master/evaluation

acha21 commented Jan 27, 2021

Yes, I can find the evaluation scripts for NIST, BLEU, METEOR, and the diversity measures, but I cannot find a script for evaluating the grounding metrics such as precision and recall.

I think those metrics are not included in the link you mentioned above. Is that right?

qkaren (Owner) commented Feb 28, 2021

Oh, for those grounding metrics, we probably didn't include them here. They are just the standard precision and recall computations, which should be easy to implement.
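
For anyone reading along: judging from the script posted later in this thread, a response word counts as "grounded" if, after stemming and stopword removal, it appears in the fact but not in the query. Given those counts, the metrics reduce to the usual formulas; a minimal sketch (the function name is illustrative, not from the repo):

def grounding_prf(g_count, w_count, f_count):
    # g_count: grounded response words
    # w_count: non-stopword response words
    # f_count: non-stopword fact words
    precision = g_count / w_count
    recall = g_count / f_count
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1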

acha21 commented Apr 21, 2021

Thank you for your comment.
Although the metrics are simple, I cannot figure out why my results differ. I experimented with the known human reference responses, for which I can reproduce the BLEU and NIST scores, but the grounding scores still do not match.
Could you tell me the details of your evaluation code, such as which stop word set and tokenizer you used, if you don't mind sharing the code itself?

qkaren (Owner) commented May 7, 2021

Yes, for tokenizer and stemmer:

from nltk.tokenize import TweetTokenizer
from nltk.stem.porter import *

for the stop word set:

stopwords_700+.txt
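
The `_stem` and `_get_stop_words` helpers referenced in the script below are not shown anywhere in the thread. A plausible reconstruction from the imports above, assuming the stop-word file has one word per line (the `preserve_case` setting and the stemming of the stop words are my guesses, not code from the repo):

from nltk.tokenize import TweetTokenizer
from nltk.stem.porter import PorterStemmer

_tokenizer = TweetTokenizer(preserve_case=False)  # presumably applied upstream, when the files are written
_stemmer = PorterStemmer()

def _stem(tokens):
    # Porter-stem a list of already-split tokens.
    return [_stemmer.stem(tok) for tok in tokens]

def _get_stop_words(path='stopwords_700+.txt'):
    # One stop word per line; stemming them is an assumption here, so that
    # membership tests against _stem() output line up.
    with open(path, encoding='utf-8') as f:
        return {_stemmer.stem(w.strip()) for w in f if w.strip()}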

qkaren closed this as completed May 7, 2021
qkaren reopened this May 7, 2021
qkaren (Owner) commented May 7, 2021

def count_grounded(facts, fresult, ids, writer, count_examples, file):
    """Count response words grounded in the facts: stemmed, non-stopword
    response tokens that occur in the fact but not in the query.
    (ids, writer, and count_examples are unused in this snippet.)"""
    g_count = 0      # grounded response words
    w_count = 0      # non-stopword response words
    f_count = 0      # non-stopword fact words
    lines_count = 0
    fact_dict = dict()
    res_set = set()  # dedupe responses; was undefined in the original snippet
    stop_words = _get_stop_words()
    print(len(facts), len(fresult))
    assert len(facts) == len(fresult)

    for fact, result in zip(facts, fresult):
        fact = fact.strip().split()
        fields = result.strip().split('\t')
        id = fields[0]
        que = fields[-2].split()  # query/context tokens
        res = fields[-1]          # response string
        if res in res_set:        # skip duplicate responses
            continue
        res_set.add(res)
        res = res.split()

        lines_count += 1
        fact_dict[id] = fact
        stemmed_que = set(_stem(que))    # hoisted out of the token loop
        stemmed_fact = set(_stem(fact))
        grounded = []  # response words found in the fact but not the query
        qued = []      # response words that also appear in the query
        for x in _stem(res):
            if x not in stop_words:
                w_count += 1
                if x not in stemmed_que:
                    if x in stemmed_fact:
                        grounded.append(x)
                        g_count += 1
                else:
                    qued.append(x)
        for x in _stem(fact):
            if x not in stop_words:
                f_count += 1

    precision = g_count / w_count
    recall = g_count / f_count
    f1 = 2 * (precision * recall) / (precision + recall)
    print('lines: ' + str(lines_count))
    print('grounded words: {}'.format(g_count))
    print('words: {}'.format(w_count))
    print('precision: {:.2f}%'.format(precision * 100))
    print('fact_len: {}'.format(f_count))
    print('recall: {:.2f}%'.format(recall * 100))
    print('f1: {:.2f}'.format(f1))
    with open('f1_score.tsv', 'a+') as fs:
        # The original '{:.2f%}' format spec raises ValueError; fixed here,
        # and recall is written as a percentage to match precision.
        fs.write('{}\t{:.2f}\t{:.2f}\t{:.2f}\n'.format(
            file, precision * 100, recall * 100, f1))

    return fact_dict
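
For anyone trying to run this: the function expects `facts` line-aligned with `fresult`, where each `fresult` line is tab-separated with an id in the first field and the tokenized query and response in the last two fields. A hypothetical invocation (file names are illustrative; since `ids`, `writer`, and `count_examples` are unused, placeholders suffice):

with open('test.facts.txt', encoding='utf-8') as f:
    facts = f.readlines()
with open('model_output.tsv', encoding='utf-8') as f:
    fresult = f.readlines()

fact_dict = count_grounded(facts, fresult, ids=None, writer=None,
                           count_examples=0, file='model_output.tsv')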
