
Why is the score obtained after running the evaluation code less than 0.05 instead of the 0.8 reported in the table (such as Thai BM25)? #10

talk2much opened this issue Oct 25, 2023 · 3 comments

@talk2much
Why is the score obtained after running the evaluation code less than 0.05 instead of the 0.8 reported in the table (e.g., Thai BM25)?

@lintool (Member) commented Nov 1, 2023

Can you provide more details please?

@talk2much (Author) commented Nov 1, 2023 via email

@crystina-z (Member)

Hi @talk2much! Thanks for your interest in our work!

I would suggest verifying the content of your dataset/qrels.test.txt (see the sketch after the results below). I'm running the following commands on my end:

First:

$ python -m pyserini.search --bm25 \
> --language th \
> --topics mrtydi-v1.1-thai-test \
> --index mrtydi-v1.1-thai \
> --output runs/run.bm25.mrtydi-v1.1-thai.test.txt

which is the same command as the one you shared; I then evaluate with:

$ python -m pyserini.eval.trec_eval -c -m recip_rank -m recall.100 mrtydi-v1.1-thai-test runs/run.bm25.mrtydi-v1.1-thai.test.txt

This gives results that match the paper:

Results:
recip_rank              all     0.4016
recall_100              all     0.8529
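
On the verification point above: here is a minimal sanity-check sketch for the qrels file. It assumes the standard TREC qrels layout (query-id, iteration, doc-id, relevance) and uses the dataset/qrels.test.txt path from your message, so adjust either if yours differ. A common cause of near-zero scores is query or document ids in the run file that don't match those in the qrels.

from collections import defaultdict

# Rough sanity check of a TREC-format qrels file: each line should be
# "<query-id> <iteration> <doc-id> <relevance>", whitespace-separated.
qrels_path = "dataset/qrels.test.txt"  # path taken from your message; adjust if needed
judgments = defaultdict(dict)
with open(qrels_path, encoding="utf-8") as f:
    for line_no, line in enumerate(f, 1):
        fields = line.split()
        if len(fields) != 4:
            print(f"line {line_no}: expected 4 fields, got {len(fields)}: {line.rstrip()!r}")
            continue
        qid, _, docid, rel = fields
        judgments[qid][docid] = int(rel)

print(len(judgments), "queries with judgments")
print(sum(len(d) for d in judgments.values()), "judged (query, document) pairs")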

> In addition, I have another question: if we can directly calculate the recall results from topic.test.csv, what does topic.train.csv do here? Is my thinking or procedure wrong?

I'm not sure I fully understand the question, but topic.train.csv is only needed if you want to fine-tune a model on the training set; for evaluation alone, topic.test.csv is all you need.
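
To make that concrete, here is a small sketch that loads only the test topics; it assumes pyserini's get_topics helper accepts the same topic name as the --topics flag above, and that the query text sits under the usual 'title' field, so treat it as a sketch rather than a recipe.

from pyserini.search import get_topics

# Evaluation only touches the test split; topic.train.csv never enters the picture
# unless you fine-tune a model on the training queries.
topics = get_topics('mrtydi-v1.1-thai-test')
print(len(topics), "test topics loaded")
qid = next(iter(topics))
print(qid, topics[qid])  # query text is typically stored under the 'title' key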

Let us know if you have more questions!
