Prepare 2.2.1 release
OlivierBinette committed Nov 8, 2023
1 parent 2c00d69 commit 8fe7beb
Showing 5 changed files with 9 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,10 @@
Changelog
=========

+2.2.1 (November 8, 2023)
+------------------------
+* Small fixes to paper and documentation.

2.2.0 (October 26, 2023)
------------------------
* Streamline package structure
2 changes: 1 addition & 1 deletion docs/00-overview.rst
@@ -43,7 +43,7 @@ Throughout this user guide and the documentation of the package, we use the foll
- The output of an entity resolution system is a **predicted clustering,** i.e. an attempt at correctly clustering records/mentions according to the entity to which they refer. There may be errors in the predicted clustering, e.g. records/mentions may be incorrectly clustered together or split into multiple clusters.
- A **reference** dataset, or a set of **ground truth** clusters, is a clustering of mentions/records that is assumed to be correct.

-For more information on entity resolution, we refer the reader to [Binette & Steorts (2022)](https://www.science.org/doi/10.1126/sciadv.abi8021) and [Christophides et al. (2019)](https://arxiv.org/abs/1905.06397).
+For more information on entity resolution, we refer the reader to `Binette & Steorts (2022) <https://www.science.org/doi/10.1126/sciadv.abi8021>`_ and `Christophides et al. (2019) <https://arxiv.org/abs/1905.06397>`_.

We recommend `Splink <https://github.com/moj-analytical-services/splink>`_ as a state-of-the-art large-scale entity resolution software. The Splink team provides a large list of `tutorials <https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html>`_ and `training materials <https://moj-analytical-services.github.io/splink/topic_guides/topic_guides_index.html>`_ on their website. The book `"Hands-On Entity Resolution" <https://www.oreilly.com/library/view/hands-on-entity-resolution/9781098148478/>`_ provides an introduction to entity resolution with Splink.

Expand Down
4 changes: 2 additions & 2 deletions docs/04-error_analysis.ipynb
@@ -38,7 +38,7 @@
"- **Expected Missing Elements:** This metric represents the expected number of missing elements for each true cluster. It calculates the average number of elements that are missing from the predicted clusters compared to the true clusters.\n",
"- **Expected Relative Missing Elements:** This metric represents the expected relative number of missing elements for each true cluster. It calculates the average relative number of elements that are missing from the predicted clusters compared to the true clusters.\n",
"\n",
-"You can find more information about these metrics, including formal mathematical definitions, in the {py:module}`er_evaluation.error_analysis` module."
+"You can find more information about these metrics, including formal mathematical definitions, in the [er_evaluation.error_analysis](https://er-evaluation.readthedocs.io/en/latest/er_evaluation.error_analysis.html) module."
]
},
{
@@ -96,7 +96,7 @@
"source": [
"## Error Analysis with Decision Trees\n",
"\n",
-"To identify combinations of features leading to performance disparities, we recommend doing error analysis using decision trees. First, define features associated with each cluster and choose an error metric to target. You can use any error metric from the {py:func}`er_evaluation.error_analysis` module. We recommend using thresholded 0-1 features for interpretability."
+"To identify combinations of features leading to performance disparities, we recommend doing error analysis using decision trees. First, define features associated with each cluster and choose an error metric to target. You can use any error metric from the [er_evaluation.error_analysis](https://er-evaluation.readthedocs.io/en/latest/er_evaluation.error_analysis.html) module. We recommend using thresholded 0-1 features for interpretability."
]
},
{
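The decision-tree error-analysis workflow described in the notebook excerpt above can be sketched as follows. This is a hypothetical illustration, not part of this commit: the feature names, thresholds, and synthetic data are invented, and in practice the per-cluster error metric would come from `er_evaluation.error_analysis` rather than being simulated.

```python
# A minimal sketch of error analysis with a shallow decision tree.
# Assumptions: features, thresholds, and the error metric are synthetic.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 200

# Hypothetical per-cluster features.
features = pd.DataFrame({
    "cluster_size": rng.integers(1, 20, size=n),
    "name_length": rng.integers(3, 40, size=n),
})

# Simulated error metric: here, error grows with cluster size.
error = 0.05 * features["cluster_size"] + rng.normal(0, 0.1, size=n)

# Thresholded 0-1 features, as the guide recommends for interpretability.
X = pd.DataFrame({
    "large_cluster": (features["cluster_size"] > 10).astype(int),
    "long_name": (features["name_length"] > 20).astype(int),
})

# A shallow tree keeps the resulting rules easy to read.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, error)
print(export_text(tree, feature_names=list(X.columns)))
```

The printed rules show which feature combinations (e.g. large clusters) are associated with elevated error, which is the performance-disparity question the notebook targets.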
2 changes: 1 addition & 1 deletion er_evaluation/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "2.2.0"
+__version__ = "2.2.1"

import er_evaluation.data_structures
import er_evaluation.datasets
2 changes: 1 addition & 1 deletion setup.py
@@ -55,6 +55,6 @@
name="ER-Evaluation",
packages=find_packages(),
url="https://github.com/OlivierBinette/er_evaluation",
-version="2.2.0",
+version="2.2.1",
zip_safe=False,
)
