diff --git a/README.md b/README.md
index 376b3a6..a9487e8 100644
--- a/README.md
+++ b/README.md
@@ -12,41 +12,11 @@ REL utilizes *English* Wikipedia as a knowledge base and can be used for the fol
 - **Entity linking (EL)**: Given a text, the system outputs a list of mention-entity pairs, where each mention is a n-gram from text and each entity is an entity in the knowledge base.
 - **Entity Disambiguation (ED)**: Given a text and a list of mentions, the system assigns an entity (or NIL) to each mention.

-Documentation:
-
-To install:
-
-```console
-pip install radboud-el
-```
-
-The source code is available from [Github](https://github.com/informagi/REL).
+**Documentation available at **

 Suggestions, improvements, and edits are most welcome.

-## REL variants
-
-REL comes in two variants for identifying entity mentions:
-
-- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model.
-
-- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions can be (often) lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architucture trained on randomly cased and uncased text. This variant is the default setup of our API.
-
-Below is a comparison of these two models on [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset.
-
-| Model | CoNLL-2003 test | F1 |
-| ------ | --------------- | -- |
-| `ner-fast` | original | 92.78 |
-| `ner-fast` | lower-cased | 58.42 |
-| `ner-fast` | random | 70.64 |
-| `ner-fast-with-lowercase` | original | 91.53|
-| `ner-fast-with-lowercase` | lower-cased | 89.73 |
-| `ner-fast-with-lowercase` | random | 89.66 |
-
-See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switiching between these variants.
-
-
 ## Calling our API

 Users may access our API by using the example script below.
@@ -71,11 +41,17 @@ ed_result = requests.post(API_URL, json={
 }).json()
 ```

-## Setup package
+## Installation

 This section describes how to deploy REL on a local machine and setup the API. If you want to do anything more than simply running our API locally, you can skip the Docker steps and continue with installation from source.

-### Option 1: Installation using Docker
+### Option 1: Installation using pip
+
+```console
+pip install radboud-el
+```
+
+### Option 2: Installation using Docker

 First, download the necessary data; you need the generic files and a Wikipedia version (2014 or 2019) (see [Download](#download)). Extract them anywhere, we will bind the directories to the Docker container as volumes.

@@ -119,7 +95,7 @@ docker build . -t informagi/rel

 To run the API locally, use the same commands as mentioned in the previous section.

-### Option 2: Installation from source code
+### Option 3: Installation from source code

 Run the following command in a terminal to install REL:

@@ -155,6 +131,27 @@ The remainder of the tutorials are optional and for users who wish to e.g. train
 6. [REL as systemd service](https://rel.readthedocs.io/en/latest/tutorials/systemd_instructions/)
 7. [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/)

+## REL variants
+
+REL comes in two variants for identifying entity mentions:
+
+- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model.
+
+- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions can be (often) lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architucture trained on randomly cased and uncased text. This variant is the default setup of our API.
+
+Below is a comparison of these two models on [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset.
+
+| Model | CoNLL-2003 test | F1 |
+| ------ | --------------- | -- |
+| `ner-fast` | original | 92.78 |
+| `ner-fast` | lower-cased | 58.42 |
+| `ner-fast` | random | 70.64 |
+| `ner-fast-with-lowercase` | original | 91.53|
+| `ner-fast-with-lowercase` | lower-cased | 89.73 |
+| `ner-fast-with-lowercase` | random | 89.66 |
+
+See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switiching between these variants.
+
 ## Efficiency of REL

 We measured the efficiency of REL on a per-document basis. We ran our API with 50 documents from AIDA-B with > 200 words, which is 323 (± 105) words and 42 (± 19) mentions per document. The results are added to the table below.
@@ -190,7 +187,7 @@ If you are using REL, please cite the following paper:

 ## Contact

-If you find any bugs or experience difficulties when using REL, please create a issue on this Github page. If you have any specific questions with respect to our research with REL, please email [Mick van Hulst](mailto:mick.vanhulst@gmail.com).
+If you find any bugs or experience difficulties when using REL, please create a issue on this Github page. If you have any specific questions with respect to our research with REL, please email [Faegheh Hasibi](mailto:f.hasibi@cs.ru.nl).

 ## Acknowledgements