API Refactor - MatcherResults and metrics #70

Merged
kPsarakis merged 11 commits into delftdata:master on Jan 31, 2024

@Archer6621 Archer6621 commented Jan 24, 2024

This PR refactors the API (#61) so that valentine_match and valentine_batch_match return a MatcherResults object, which inherits from dict, instead of a plain dict. This should not break the existing API much, as the return value is still a dict as before.

This dictionary is sorted by value upon instantiation, from highest to lowest similarity (dictionaries preserve insertion order as a language guarantee from Python 3.7 onward, and as a CPython implementation detail in 3.6).
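
A minimal sketch of the sort-on-instantiation idea, not the PR's actual code. The class name `SortedScores` and the sample pairs are hypothetical and exist only for illustration.

```python
class SortedScores(dict):
    """Hypothetical dict subclass; NOT valentine's MatcherResults."""

    def __init__(self, scores):
        # Order the items by value, highest similarity first, and let the
        # dict's insertion order preserve that ranking afterwards.
        ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        super().__init__(ordered)


# Made-up (source column, target column) pairs with similarity scores.
scores = SortedScores({("a", "x"): 0.2, ("b", "y"): 0.9, ("c", "z"): 0.5})
ranked_pairs = list(scores)  # keys, in descending order of similarity
```

Because the object is still a dict, code that indexes or iterates over the old return value should keep working.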

This MatcherResults object exposes the following API methods:

  • get_metrics - computes metrics for the matches against a ground truth
  • one_to_one - reduces the matches to a one-to-one mapping and returns it as a new MatcherResults
  • take_top_percent - takes the top n percent of the matches and returns them as a new MatcherResults
  • take_top_n - takes the top n matches and returns them as a new MatcherResults
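
To make the behaviour of these methods concrete, here is a toy sketch. Everything in it is an assumption: `ToyResults` is a hypothetical stand-in, keys are simplified to (source, target) pairs, the percentage is rounded up, and the one-to-one step uses a simple greedy pass that may differ from the strategy implemented in this PR.

```python
import math


class ToyResults(dict):
    """Hypothetical stand-in for MatcherResults; not the library's code."""

    def __init__(self, scores):
        # Keep the entries ranked from highest to lowest similarity.
        super().__init__(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

    def take_top_n(self, n):
        # The first n entries are the n best, since the dict is ranked.
        return ToyResults(dict(list(self.items())[:n]))

    def take_top_percent(self, percent):
        # Assumption: round the number of kept matches up.
        return self.take_top_n(math.ceil(len(self) * percent / 100))

    def one_to_one(self):
        # Assumption: greedy selection. Walk the ranking and keep a pair only
        # if neither its source nor its target has been used already.
        used_sources, used_targets, kept = set(), set(), {}
        for (source, target), score in self.items():
            if source not in used_sources and target not in used_targets:
                kept[(source, target)] = score
                used_sources.add(source)
                used_targets.add(target)
        return ToyResults(kept)


results = ToyResults(
    {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.7, ("b", "y"): 0.6}
)
top_two = results.take_top_n(2)
top_half = results.take_top_percent(50)
unique = results.one_to_one()
```

Each method returns a new object of the same type, so calls can be chained, for example `results.one_to_one().take_top_n(1)`.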

Aside from this new MatcherResults object, the metrics API has been overhauled as well. Metrics are now classes that inherit from the abstract Metric class. These need to be instantiated with the appropriate parameters in order to be used, although all of these parameters should be keyword arguments and thus have a default.

The API for this is as follows:

from valentine import valentine_match
from valentine.metrics import Precision, F1Score, PrecisionTopNPercent, METRICS_PRECISION_INCREASING_N

# df1, df2, ALGORITHM and ground_truth are assumed to be defined elsewhere
matches = valentine_match(df1, df2, ALGORITHM)

# Getting the default metrics, defined in `metrics.__init__`
metrics = matches.get_metrics(ground_truth)

# Using custom metrics, imported from `metrics`
metrics_custom = matches.get_metrics(ground_truth, metrics={Precision(), F1Score(one_to_one=False), PrecisionTopNPercent(n=25)})

# Using a predefined set
metrics_set = matches.get_metrics(ground_truth, metrics=METRICS_PRECISION_INCREASING_N)
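
The class-based metric design described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the library's implementation: `ToyMetric`, `ToyPrecision`, their `apply` and `name` methods, and the `evaluate` helper are all hypothetical names.

```python
from abc import ABC, abstractmethod


class ToyMetric(ABC):
    """Hypothetical abstract base; not valentine's Metric class."""

    def name(self):
        return type(self).__name__

    @abstractmethod
    def apply(self, matches, ground_truth):
        """Return a score for ranked matches against the ground truth."""


class ToyPrecision(ToyMetric):
    def __init__(self, n=None):
        # Keyword parameter with a default: None means "use all matches".
        self.n = n

    def name(self):
        return "ToyPrecision" if self.n is None else f"ToyPrecisionTop{self.n}"

    def apply(self, matches, ground_truth):
        predicted = list(matches)[: self.n]  # a slice to None keeps everything
        if not predicted:
            return 0.0
        truth = set(ground_truth)
        hits = sum(1 for pair in predicted if pair in truth)
        return hits / len(predicted)


def evaluate(matches, ground_truth, metrics):
    # Apply every metric instance and collect the scores by metric name.
    return {m.name(): m.apply(matches, ground_truth) for m in metrics}


# Made-up matches, already ranked from highest to lowest similarity.
matches = {("a", "x"): 0.9, ("b", "z"): 0.8, ("c", "y"): 0.4}
ground_truth = [("a", "x"), ("c", "y")]
scores = evaluate(matches, ground_truth, [ToyPrecision(), ToyPrecision(n=1)])
```

Holding the parameters on the instance is what lets one class be reused with different settings inside the same set of metrics.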

A final minor change is that the Match class got converted to a dataclass.

Tests and numpy-style documentation have been added for the new additions, and the example and readme have been updated as well.

Furthermore, with this change, it will become easy to also integrate #55 into the MatcherResults class, where it fits much better.


codecov bot commented Jan 25, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (96430e7) 86.36% compared to head (5dc95df) 87.15%.

Files                                     Patch %   Lines
valentine/metrics/base_metric.py          88.23%    2 Missing ⚠️
valentine/algorithms/matcher_results.py   98.00%    1 Missing ⚠️
valentine/metrics/metric_helpers.py       96.29%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   86.36%   87.15%   +0.79%     
==========================================
  Files          37       40       +3     
  Lines        1679     1760      +81     
==========================================
+ Hits         1450     1534      +84     
+ Misses        229      226       -3     


@kPsarakis kPsarakis self-requested a review January 26, 2024 09:16
@kPsarakis (Member) commented:

@Archer6621 Is this ready to review?

@kPsarakis kPsarakis left a comment

Looks good! Thanks Shaad!

@kPsarakis kPsarakis merged commit dd15f95 into delftdata:master Jan 31, 2024
18 of 19 checks passed