Alternative evaluation metrics beyond F1 score and exact match #2

Open
wbcbugfree opened this issue Jun 6, 2024 · 1 comment

wbcbugfree commented Jun 6, 2024

Find and/or develop other possible metrics to evaluate different strategies for converting text to RDF statements. Current metrics such as F1 score and exact match cannot match RDF triples semantically: they treat an RDF triple as a single string (exact match) or as three separate strings (general F1 score). Consequently, a string from an RDF triple is counted as correct only if it is exactly the same as the ground truth; otherwise it is wrong. For example, the concept "Soil Health" is sometimes defined with the URI "ex:SoilHealth" and other times with "ex:HealthySoils". Semantically the two are barely different, but under the current metrics at most one of them can be counted as correct, and any other definition scores no points. This potentially underestimates the performance of zero-shot learning, because it is much less likely to define the URI of a concept consistently.
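To make the failure mode concrete, here is a minimal sketch (not code from this repo; the triples and the `ex:` namespace are made up) of what triple-level exact match does:

```python
# Triple-level exact match treats each (s, p, o) as opaque strings, so the
# semantically equivalent URIs ex:SoilHealth and ex:HealthySoils score zero.
gold = {
    ("ex:SoilHealth", "rdf:type", "ex:Concept"),
    ("ex:SoilHealth", "ex:improvedBy", "ex:CoverCropping"),
}
predicted = {
    ("ex:HealthySoils", "rdf:type", "ex:Concept"),            # same meaning, different URI
    ("ex:HealthySoils", "ex:improvedBy", "ex:CoverCropping"),
}

true_positives = gold & predicted  # exact string comparison of whole triples
precision = len(true_positives) / len(predicted) if predicted else 0.0
recall = len(true_positives) / len(gold) if gold else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(precision, recall, f1)  # 0.0 0.0 0.0, even though the two graphs mean the same
```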

Possible solutions:

  • RDF2vec
  • Convert RDF statements back to plain text, embed them, and compute similarity (see the sketch below)
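A rough sketch of the second bullet, assuming the sentence-transformers package and an arbitrary model choice (all-MiniLM-L6-v2); the naive verbalization and one-to-one triple pairing are placeholders, not a settled design:

```python
# Sketch of "convert back to plain text, embed, compare".
# Assumes: pip install sentence-transformers; the model choice is arbitrary.
from sentence_transformers import SentenceTransformer, util

def verbalize(triple):
    """Naive linearization of an (s, p, o) triple into a short pseudo-sentence."""
    return " ".join(part.split(":")[-1] for part in triple)  # drop "ex:"-style prefixes

gold = ("ex:SoilHealth", "ex:improvedBy", "ex:CoverCropping")
pred = ("ex:HealthySoils", "ex:improvedBy", "ex:CoverCropping")

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([verbalize(gold), verbalize(pred)], convert_to_tensor=True)

# A cosine similarity near 1.0 suggests the triples say roughly the same thing,
# even though exact match scores them as completely different.
print(util.cos_sim(emb[0], emb[1]).item())
```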
wbcbugfree self-assigned this Jun 6, 2024
wbcbugfree (Collaborator, Author) commented:

The metrics we currently have are:

  • Vanilla precision, recall and F1 score based on triple-level exact match;
  • Graph BERTScore;
  • BLEU-F1 & ROUGE-F1 (see the sketch below).
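For reference, a minimal sketch of a ROUGE-L F1 between two linearized triples using the rouge-score package; the crude camelCase splitting and the per-triple pairing are assumptions for illustration, not necessarily how our implementation aggregates scores:

```python
# ROUGE-L F1 between a gold triple and a predicted triple after naive linearization.
# Assumes: pip install rouge-score. How triples are paired before scoring is omitted.
import re
from rouge_score import rouge_scorer

def linearize(triple):
    """Split prefixes and camelCase crudely so that tokens can overlap."""
    words = []
    for part in triple:
        local = part.split(":")[-1]
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", local))
    return " ".join(w.lower() for w in words)

gold = ("ex:SoilHealth", "ex:improvedBy", "ex:CoverCropping")
pred = ("ex:HealthySoils", "ex:improvedBy", "ex:CoverCropping")

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(linearize(gold), linearize(pred))
print(scores["rougeL"].fmeasure)  # partial credit instead of a hard zero
```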

To-do:

  • Graph Edit Distance;
  • Optimal Edit Paths (see the sketch below for both).
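One possible way to get both is via networkx, whose graph_edit_distance and optimal_edit_paths match the names above; representing the triples as a label-attributed DiGraph and matching on exact labels are assumptions that would need tuning, not a committed design:

```python
# Sketch: Graph Edit Distance / Optimal Edit Paths over RDF-style graphs via networkx.
import networkx as nx

def to_graph(triples):
    """Build a directed graph with URIs as labeled nodes and predicates as edge labels."""
    g = nx.DiGraph()
    for s, p, o in triples:
        g.add_node(s, label=s)
        g.add_node(o, label=o)
        g.add_edge(s, o, label=p)
    return g

gold = to_graph([("ex:SoilHealth", "ex:improvedBy", "ex:CoverCropping")])
pred = to_graph([("ex:HealthySoils", "ex:improvedBy", "ex:CoverCropping")])

def same_label(a, b):
    return a["label"] == b["label"]

# Minimum number of node/edge insertions, deletions and substitutions.
ged = nx.graph_edit_distance(gold, pred, node_match=same_label, edge_match=same_label)

# Optimal edit paths additionally expose which nodes/edges were matched or changed.
paths, cost = nx.optimal_edit_paths(gold, pred, node_match=same_label, edge_match=same_label)
print(ged, cost)  # expect 1 here: only the subject URI differs
```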

What we won't do anymore:

  • Matching S, P and O separately.
