This repository provides (1) a conversational entity linking dataset (ConEL-2) and (2) a conversational entity linking tool (CREL), as resources for the following research:
- Personal Entity, Concept, and Named Entity Linking in Conversations, Hideaki Joko and Faegheh Hasibi, CIKM 2022
Figure: An example of entity linking in conversations.
This repository is structured in the following way:
- `tool/`: EL tool for conversations (CREL), with an example script.
- `dataset/`: Conversational entity linking dataset (ConEL-2), with documentation of the statistics and format.
- `eval/`: Tool to calculate the performance of an entity linking method, with the run files of the baseline and our method.
CREL is a conversational entity linking tool trained on the ConEL-2 dataset. Unlike existing EL methods, CREL is designed to identify both named entities and concepts. It also uses coreference resolution techniques to identify personal entities and link them to explicit entity mentions in the conversation.
The easiest way to get started with this project is to use our Google Colab notebook. Simply run the notebook to try our entity linking approach.
The usage of the tool is as follows:
```python
from conv_el import ConvEL

cel = ConvEL()

conversation_example = [
    {"speaker": "USER",
     "utterance": "I am allergic to tomatoes but we have a lot of famous Italian restaurants here in London."},

    # System turns are not annotated
    {"speaker": "SYSTEM",
     "utterance": "Some people are allergic to histamine in tomatoes."},

    {"speaker": "USER",
     "utterance": "Talking of food, can you recommend me a restaurant in my city for our anniversary?"},
]

annotation_result = cel.annotate(conversation_example)
print_results(annotation_result)  # This function is defined in the notebook.

# Output:
#
# USER: I am allergic to tomatoes but we have a lot of famous Italian restaurants here in London.
# 	[17, 8, 'tomatoes', 'Tomato']
# 	[54, 19, 'Italian restaurants', 'Italian_cuisine']
# 	[82, 6, 'London', 'London']
# SYST: Some people are allergic to histamine in tomatoes.
# USER: Talking of food, can you recommend me a restaurant in my city for our anniversary?
# 	[11, 4, 'food', 'Food']
# 	[40, 10, 'restaurant', 'Restaurant']
# 	[54, 7, 'my city', 'London']
```
The input to the tool is a conversation in which each turn has two keys: `speaker` and `utterance`. The `speaker` is the speaker of the utterance (either `USER` or `SYSTEM`), and the `utterance` is the utterance text itself.
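The `print_results` helper is defined in the notebook. As a minimal sketch of what it might look like, the snippet below assumes (this is our assumption, not the documented API) that `cel.annotate()` returns the turn list with an added `annotations` field holding `[start, length, mention, entity]` quadruples, as in the output shown above:

```python
def print_results(annotated_turns):
    # Hypothetical helper: the real print_results is defined in the
    # repository's notebook. We assume each annotated turn carries an
    # "annotations" list of [start, length, mention, entity] quadruples
    # (the field name is our assumption).
    for turn in annotated_turns:
        label = "USER" if turn["speaker"] == "USER" else "SYST"
        print(f"{label}: {turn['utterance']}")
        for start, length, mention, entity in turn.get("annotations", []):
            print(f"\t[{start}, {length}, {mention!r}, {entity!r}]")

# Hand-written example in the assumed format:
annotated = [
    {"speaker": "USER",
     "utterance": "I am allergic to tomatoes.",
     "annotations": [[17, 8, "tomatoes", "Tomato"]]},
]
print_results(annotated)
# USER: I am allergic to tomatoes.
#       [17, 8, 'tomatoes', 'Tomato']
```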
Note
- Run this notebook on CPU.
- The code also runs on GPU; however, due to storage limitations, you cannot use a GPU on Google Colab with the free version.
- Downloading the models takes approximately 30 minutes; please be patient.
You can also use our method locally. The documentation is available at ./tool/README.md.
Our ConEL-2 dataset contains concept, named entity (NE), and personal entity annotations for conversations. The annotations are collected on the Wizard of Wikipedia dataset. The format and detailed statistics of the dataset are described in ./dataset/README.md.
Table: Statistics of conversational entity linking dataset
|  | Train | Val | Test |
|---|---|---|---|
| Conversations | 174 | 58 | 58 |
| User utterances | 800 | 267 | 260 |
| NE and concept annotations | 1428 | 523 | 452 |
| Personal entity annotations | 268 | 89 | 73 |
The format of the dataset is as follows:
```python
{
    "dialogue_id": "9161",
    "turns": [
        {
            "speaker": "USER",  # or "SYSTEM"
            "utterance": "Alpacas are definitely my favorite animal. I have 10 on my Alpaca farm in Friday harbor island in Washington state.",
            "turn_number": 0,
            "el_annotations": [  # Ground truth annotations
                {
                    "mention": "Alpacas",
                    "entity": "Alpaca",
                    "span": [0, 7],
                }, ...],
            "personal_entity_annotations": [  # Personal entity annotations
                {
                    "personal_entity_mention": "my favorite animal",
                    "explicit_entity_mention": "Alpacas",
                    "entity": "Alpaca"
                }
            ],
            "personal_entity_annotations_without_eems": [  # Personal entity annotations where the EEM is annotated as not found
                {
                    "personal_entity_mention": "my Alpaca farm"
                }
            ]
        },
```
You can find more details about the format of the dataset in ./dataset/README.md.
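Records in this format can be consumed directly. A minimal sketch that collects mention/entity pairs from a single turn (the turn content is copied from the example above; how you load the file is up to you):

```python
# One turn in the documented ConEL-2 format (content from the example above).
turn = {
    "speaker": "USER",
    "utterance": ("Alpacas are definitely my favorite animal. I have 10 on "
                  "my Alpaca farm in Friday harbor island in Washington state."),
    "el_annotations": [
        {"mention": "Alpacas", "entity": "Alpaca", "span": [0, 7]},
    ],
    "personal_entity_annotations": [
        {"personal_entity_mention": "my favorite animal",
         "explicit_entity_mention": "Alpacas",
         "entity": "Alpaca"},
    ],
}

# Collect (mention, entity) pairs from both annotation types.
pairs = [(a["mention"], a["entity"]) for a in turn["el_annotations"]]
pairs += [(a["personal_entity_mention"], a["entity"])
          for a in turn["personal_entity_annotations"]]
print(pairs)
# [('Alpacas', 'Alpaca'), ('my favorite animal', 'Alpaca')]
```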
Additionally, we provide a personal entity mention detection dataset, which contains 985 conversations with 1369 personal entity mention annotations.
The tool to evaluate your entity linking method is provided in the `eval/` directory. Detailed explanations are available in ./eval/README.md.
If you have any questions, please contact Hideaki Joko at [email protected]