Fuzz Up [W.I.P.]

fuzzup offers a simple approach for clustering string entities based on Levenshtein distance, combining fuzzy matching with a simple rule-based clustering method.

fuzzup also provides functions for computing the prominence of the resulting entity clusters and for matching them against entity whitelists.

An important use case for fuzzup is organizing, structuring and analyzing output from Named-Entity Recognition (NER).

fuzzup is designed specifically to work with the predictions returned by the Hugging Face transformers NER pipeline.
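
For orientation, the transformers NER pipeline, when run with an aggregation strategy, returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below uses a hand-written sample in that shape (the values are made up, not real model output) and a hypothetical helper, `entity_strings`, to pull out the strings that would then be clustered. It is an illustration only and does not show how fuzzup itself consumes the predictions.

```python
# Hand-written sample in the shape returned by
# transformers.pipeline("ner", aggregation_strategy="simple")(text).
# The values are illustrative assumptions, not real model output.
predictions = [
    {"entity_group": "PER", "score": 0.99, "word": "Joe Biden", "start": 0, "end": 9},
    {"entity_group": "LOC", "score": 0.98, "word": "Washington", "start": 18, "end": 28},
    {"entity_group": "PER", "score": 0.61, "word": "J. Biden", "start": 40, "end": 48},
]


def entity_strings(predictions, group, min_score=0.0):
    """Hypothetical helper: surface strings of one entity type, in order of appearance."""
    return [
        p["word"]
        for p in predictions
        if p["entity_group"] == group and p["score"] >= min_score
    ]


persons = entity_strings(predictions, "PER")
print(persons)
```

The `start` and `end` character offsets are kept in each dictionary, which is what makes position-based measures such as prominence possible later on.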

Installation guide

fuzzup can be installed from the Python Package Index (PyPI) with:

pip install fuzzup

If you want the development version, install it directly from GitHub.

Workflow

fuzzup offers functionality for:

1. Computing all mutual string distances (Levenshtein distances/fuzzy ratios) between the string entities
2. Forming clusters of string entities based on the distances from (1)
3. Computing the prominence of the clusters from (2) based on the number of entity occurrences, their positions in the text, etc.
4. Matching entities (clusters) with entity whitelists

Together these steps constitute an end-to-end approach for organizing and structuring the output from NER. Here is an example of how to use fuzzup for forming entity clusters based on edit distances.
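
The idea behind the four steps can also be illustrated without fuzzup itself. The following is a minimal standard-library sketch, not fuzzup's API: the function names (`levenshtein`, `similarity`, `cluster`, `prominence`, `match_whitelist`), the length-normalized score, the cutoff of 70 and the sample names are all assumptions made for illustration. Prominence is simplified here to a plain mention count and ignores positions in the text.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 100]; 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a.lower(), b.lower()) / longest)


def cluster(entities, cutoff=70.0):
    """Steps 1-2: link every pair scoring >= cutoff, then return the connected groups."""
    unique = sorted(set(entities))
    parent = {e: e for e in unique}

    def find(e):
        # Follow parent links up to the representative of e's group.
        while parent[e] != e:
            e = parent[e]
        return e

    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if similarity(a, b) >= cutoff:
                parent[find(b)] = find(a)

    groups = {}
    for e in unique:
        groups.setdefault(find(e), []).append(e)
    return sorted(groups.values())


def prominence(clusters, entities):
    """Step 3 (simplified): rank clusters by their total number of mentions."""
    counts = Counter(entities)
    scored = [(sum(counts[e] for e in members), members) for members in clusters]
    return sorted(scored, key=lambda pair: -pair[0])


def match_whitelist(members, whitelist, cutoff=70.0):
    """Step 4: best-scoring whitelist entry for a cluster, or None below the cutoff."""
    best = max(
        ((similarity(m, w), w) for m in members for w in whitelist),
        default=(0.0, None),
    )
    return best[1] if best[0] >= cutoff else None


mentions = ["Joe Biden", "joe biden", "J. Biden", "Joe Biden",
            "Donald Trump", "Donald Trumpp", "Nancy Pelosi"]
whitelist = ["Joseph Biden", "Donald Trump"]

clusters = cluster(mentions)
for count, members in prominence(clusters, mentions):
    print(count, members, "->", match_whitelist(members, whitelist))
```

Grouping by connected pairs means two strings can end up in the same cluster through a chain of intermediate matches, so the cutoff controls how aggressively variants are merged and is worth tuning on real data.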

To do

document whitelist matching in showcase
update readme with workflow
tests for whitelist
cutoff_threshold -> score_cutoff -> cdist
~~try and tune on junges entities~~
~~run against tores list~~
~~document whitelist~~
~~update docs~~

Background

fuzzup is developed as part of Ekstra Bladet’s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the Technical University of Denmark, University of Copenhagen and Copenhagen Business School, with funding from Innovation Fund Denmark. The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared for news publishing, some of which, like fuzzup, are open source.

Contact

We hope that you will find fuzzup useful.

Please direct any questions and feedback to us!

If you want to contribute (which we encourage you to), open a PR.

If you encounter a bug or want to suggest an enhancement, please open an issue.
