
No evaluation script for Grounding #13

Open
acha21 opened this issue Jan 26, 2021 · 6 comments

acha21 commented Jan 26, 2021

Hello, Lianhui Qin.

I cannot find the script for evaluating Grounding, which is presented in Table 2.
I have tried to reproduce the results using the stopwords_700+.txt file that you provided, but failed.
Could you share your script for this?

Yeonchan Ahn.

qkaren (Owner) commented Jan 27, 2021

Hi Yeonchan,

Did you try to follow this page?
https://github.com/qkaren/converse_reading_cmr/tree/master/evaluation

acha21 commented Jan 27, 2021

Yes, I can find the evaluation scripts for NIST, BLEU, METEOR, and the diversity measures, but I cannot find a script for evaluating the grounding metrics such as precision and recall.

I think those metrics are not included in the link you mentioned above. Is that right?

qkaren (Owner) commented Feb 28, 2021

Oh, for those grounding metrics, we probably didn't include them here. They are just the standard precision and recall computations, which should be easy to implement.
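
For anyone reading along: judging from the script posted later in this thread, a response word counts as "grounded" if, after stemming and stopword removal, it appears in the fact but not in the query. Given those counts, the metrics reduce to the usual formulas; a minimal sketch (the function name is illustrative, not from the repo):

def grounding_prf(g_count, w_count, f_count):
    # g_count: grounded response words
    # w_count: non-stopword response words
    # f_count: non-stopword fact words
    precision = g_count / w_count
    recall = g_count / f_count
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1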

acha21 commented Apr 21, 2021

Thank you for your comment.
Although the metrics are simple, I cannot figure out why my results differ. I experimented with the known human reference responses, for which I can reproduce the BLEU and NIST scores, but the grounding scores still do not match.
Could you tell me the details of your evaluation code, such as which stop word set and tokenizer you used, if you don't mind sharing the code itself?

qkaren (Owner) commented May 7, 2021

Yes, for tokenizer and stemmer:

from nltk.tokenize import TweetTokenizer
from nltk.stem.porter import *

for the stop word set:

stopwords_700+.txt
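
The `_stem` and `_get_stop_words` helpers referenced in the script below are not shown anywhere in the thread. A plausible reconstruction from the imports above, assuming the stop-word file has one word per line (the `preserve_case` setting and the stemming of the stop words are my guesses, not code from the repo):

from nltk.tokenize import TweetTokenizer
from nltk.stem.porter import PorterStemmer

_tokenizer = TweetTokenizer(preserve_case=False)  # presumably applied upstream, when the files are written
_stemmer = PorterStemmer()

def _stem(tokens):
    # Porter-stem a list of already-split tokens.
    return [_stemmer.stem(tok) for tok in tokens]

def _get_stop_words(path='stopwords_700+.txt'):
    # One stop word per line; stemming them is an assumption here, so that
    # membership tests against _stem() output line up.
    with open(path, encoding='utf-8') as f:
        return {_stemmer.stem(w.strip()) for w in f if w.strip()}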

qkaren closed this as completed May 7, 2021
qkaren reopened this May 7, 2021
qkaren (Owner) commented May 7, 2021

def count_grounded(facts, fresult, ids, writer, count_examples, file):
    """Count response words grounded in the facts: stemmed, non-stopword
    response tokens that occur in the fact but not in the query.
    (ids, writer, and count_examples are unused in this snippet.)"""
    g_count = 0      # grounded response words
    w_count = 0      # non-stopword response words
    f_count = 0      # non-stopword fact words
    lines_count = 0
    fact_dict = dict()
    res_set = set()  # dedupe responses; was undefined in the original snippet
    stop_words = _get_stop_words()
    print(len(facts), len(fresult))
    assert len(facts) == len(fresult)

    for fact, result in zip(facts, fresult):
        fact = fact.strip().split()
        fields = result.strip().split('\t')
        id = fields[0]
        que = fields[-2].split()  # query/context tokens
        res = fields[-1]          # response string
        if res in res_set:        # skip duplicate responses
            continue
        res_set.add(res)
        res = res.split()

        lines_count += 1
        fact_dict[id] = fact
        stemmed_que = set(_stem(que))    # hoisted out of the token loop
        stemmed_fact = set(_stem(fact))
        grounded = []  # response words found in the fact but not the query
        qued = []      # response words that also appear in the query
        for x in _stem(res):
            if x not in stop_words:
                w_count += 1
                if x not in stemmed_que:
                    if x in stemmed_fact:
                        grounded.append(x)
                        g_count += 1
                else:
                    qued.append(x)
        for x in _stem(fact):
            if x not in stop_words:
                f_count += 1

    precision = g_count / w_count
    recall = g_count / f_count
    f1 = 2 * (precision * recall) / (precision + recall)
    print('lines: ' + str(lines_count))
    print('grounded words: {}'.format(g_count))
    print('words: {}'.format(w_count))
    print('precision: {:.2f}%'.format(precision * 100))
    print('fact_len: {}'.format(f_count))
    print('recall: {:.2f}%'.format(recall * 100))
    print('f1: {:.2f}'.format(f1))
    with open('f1_score.tsv', 'a+') as fs:
        # The original '{:.2f%}' format spec raises ValueError; fixed here,
        # and recall is written as a percentage to match precision.
        fs.write('{}\t{:.2f}\t{:.2f}\t{:.2f}\n'.format(
            file, precision * 100, recall * 100, f1))

    return fact_dict
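
For anyone trying to run this: the function expects `facts` line-aligned with `fresult`, where each `fresult` line is tab-separated with an id in the first field and the tokenized query and response in the last two fields. A hypothetical invocation (file names are illustrative; since `ids`, `writer`, and `count_examples` are unused, placeholders suffice):

with open('test.facts.txt', encoding='utf-8') as f:
    facts = f.readlines()
with open('model_output.tsv', encoding='utf-8') as f:
    fresult = f.readlines()

fact_dict = count_grounded(facts, fresult, ids=None, writer=None,
                           count_examples=0, file='model_output.tsv')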
