It's slow to evaluate #12

Open
liujiqiang999 opened this issue Jul 10, 2020 · 7 comments

Comments

@liujiqiang999

Hi,
Sometimes, evaluation with m2scorer is very slow. How can I fix it? Also, can I evaluate the scores for each error type separately? How can I do that? Thank you very much.

@kouhonglady

> Hi,
> Sometimes, evaluation with m2scorer is very slow. How can I fix it? Also, can I evaluate the scores for each error type separately? How can I do that? Thank you very much.

Have you solved this problem? I just ran it on the CoNLL-2014 GEC dataset, which has only 1313 sentences, and after more than 5 hours it still has not produced a result.

@liujiqiang999
Author

@kouhonglady
Hi,
One workaround is to split the long sentences in the CoNLL-2014 test set into shorter ones.

@YovaKem

YovaKem commented Nov 2, 2021

I found that in my case, the reason for the "never-ending" computation of the metric was a set of bad predictions in which the same n-gram was repeated multiple times at the end of a sentence.
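A quick way to look for such predictions before scoring is sketched below. It is only an illustration: the function names `trailing_repeat` and `find_degenerate_lines`, and the thresholds `max_n` and `min_repeats`, are made up for this example and are not part of m2scorer. It assumes one whitespace-tokenized hypothesis per line.

```python
def trailing_repeat(tokens, max_n=4, min_repeats=3):
    """Return (ngram, count) if the token list ends with the same n-gram
    repeated at least `min_repeats` times in a row, otherwise None.

    `max_n` and `min_repeats` are illustrative thresholds, not m2scorer settings.
    """
    for n in range(1, max_n + 1):
        if len(tokens) < n * min_repeats:
            continue
        tail = tokens[-n:]
        count = 1
        pos = len(tokens) - 2 * n
        # Walk backwards in steps of n while the same n-gram keeps appearing.
        while pos >= 0 and tokens[pos:pos + n] == tail:
            count += 1
            pos -= n
        if count >= min_repeats:
            return tuple(tail), count
    return None


def find_degenerate_lines(lines, max_n=4, min_repeats=3):
    """Yield (line_number, ngram, count) for hypothesis lines that end in a
    repeated n-gram. Line numbers are 1-based."""
    for i, line in enumerate(lines, start=1):
        hit = trailing_repeat(line.split(), max_n, min_repeats)
        if hit is not None:
            yield i, hit[0], hit[1]


if __name__ == "__main__":
    sample = ["This is fine .", "I like it . it . it . it ."]
    for lineno, ngram, count in find_degenerate_lines(sample):
        print(lineno, " ".join(ngram), count)
```

Lines flagged this way can then be inspected or truncated by hand before running the scorer.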

@Yusuke196

To find out which sentence is causing the trouble, the -v option of m2scorer helped me.

@nymwa

nymwa commented Jan 4, 2023

I think the edit lattice of the M2 scorer is a DAG, so it can be topologically sorted. Once the graph is topologically sorted, the shortest path can be computed in O(V + E), and the topological sort itself also takes O(V + E). The total cost is therefore O(V + E), which is faster than the Bellman-Ford algorithm at O(V×E). This could be one solution to this problem.
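The idea can be sketched as follows. This is a generic single-source shortest path on a DAG, not code from m2scorer: the function name `dag_shortest_path` and the graph representation (a list of vertices plus a dict mapping `(u, v)` to a weight) are assumptions made for the example. Negative edge weights are fine here because the graph has no cycles.

```python
from collections import defaultdict, deque


def dag_shortest_path(vertices, edges, source):
    """Single-source shortest paths on a DAG in O(V + E).

    vertices: iterable of hashable vertex ids
    edges:    dict mapping (u, v) -> weight (weights may be negative)
    Returns (dist, pred) dictionaries keyed by vertex.
    """
    vertices = list(vertices)
    adj = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological sort with Kahn's algorithm: O(V + E).
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle; it is not a DAG")

    # Relax every edge once, visiting vertices in topological order: O(V + E).
    inf = float("inf")
    dist = {v: inf for v in vertices}
    pred = {v: None for v in vertices}
    dist[source] = 0
    for u in order:
        if dist[u] == inf:
            continue  # u is not reachable from the source
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
    return dist, pred


if __name__ == "__main__":
    V = ["a", "b", "c", "d"]
    E = {("a", "b"): 1, ("a", "c"): 4, ("b", "c"): -2, ("c", "d"): 3, ("b", "d"): 5}
    dist, pred = dag_shortest_path(V, E, "a")
    print(dist, pred)
```

Each edge is relaxed exactly once, which is where the saving over Bellman-Ford's repeated passes comes from.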

@shotakoyama

It seems that transitive_arcs() in levenshtein.py is very time-consuming.
https://github.com/nusnlp/m2scorer/blob/version3.2/scripts/levenshtein.py#L649

The three for loops that add the transitive arcs could be replaced with a more efficient algorithm.

@craggy-otake

craggy-otake commented Jun 14, 2023

As nymwa said, the Bellman-Ford algorithm seems to be too slow in this case.
Given the nature of this graph, there are no negative cycles, and Dijkstra's algorithm is sufficient. In any case, I rewrote the code to use Dijkstra's algorithm. The code is available here.
https://github.com/craggy-otake/m2scorer_python3_fast

Please let me know if you need to delete my repository.
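
For readers unfamiliar with the approach, a generic heap-based Dijkstra looks roughly like the sketch below. It is not the code from the repository linked above. The function name `dijkstra` and the edge representation (a dict mapping `(u, v)` to a weight) are assumptions for the example. Note that Dijkstra's algorithm requires every edge weight to be non-negative, so the sketch rejects negative weights outright; whether the scorer's lattice meets that condition as-is is not checked here.

```python
import heapq
import itertools
from collections import defaultdict


def dijkstra(edges, source):
    """Single-source shortest paths with a binary heap.

    edges: dict mapping (u, v) -> weight; all weights must be >= 0.
    Returns (dist, pred) for the vertices reachable from `source`.
    """
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        if w < 0:
            raise ValueError("Dijkstra requires non-negative edge weights")
        adj[u].append((v, w))

    dist = {source: 0}
    pred = {source: None}
    counter = itertools.count()  # tie-breaker so vertices are never compared
    heap = [(0, next(counter), source)]
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry; u was already finalized
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, next(counter), v))
    return dist, pred


if __name__ == "__main__":
    E = {(0, 1): 2, (0, 2): 5, (1, 2): 1, (2, 3): 1}
    print(dijkstra(E, 0))
```

With a binary heap this runs in O((V + E) log V), compared with O(V×E) for Bellman-Ford.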
