This repository contains resources for the EMNLP 2020 main conference paper:
Profile Consistency Identification for Open-domain Dialogue Agents. [Paper]
The code is ready to run, and all resources have been released.
- Source code for the KvBERT model: [Github]
- Download the full dataset: [KvPI Dataset]
- Download the checkpoint to reproduce the reported results: [GoogleDrive], [BaiduNetdisk] (pwd: pt4g). MD5 of the checkpoint: `0993c09872f074a04d29a4851cf2cfce`
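After downloading, it is worth checking the file against the published MD5 before running inference. The sketch below uses only Python's standard `hashlib` and assumes the download is a single file saved as `./ckpt/kvbert_epoch_3` (the name used in the inference instructions); if your download is an archive, point it at the archive instead. On Linux, `md5sum <file>` gives the same digest.

```python
import hashlib
from pathlib import Path

# MD5 digest published in this README for the KvBERT checkpoint.
EXPECTED_MD5 = "0993c09872f074a04d29a4851cf2cfce"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to bound memory."""
    digest = hashlib.md5()
    with open(str(path), "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches `expected` (case-insensitive)."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded checkpoint (hypothetical path).
    ckpt = Path("ckpt") / "kvbert_epoch_3"
    if not ckpt.is_file():
        print("{} not found".format(ckpt))
    elif verify_checkpoint(ckpt):
        print("MD5 OK")
    else:
        print("MD5 mismatch: re-download the checkpoint")
```

Reading in chunks keeps memory flat even for a multi-hundred-megabyte BERT-base checkpoint.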
The following example illustrates what understanding profile consistency involves. The table on the left is the profile, which consists of several key-value pairs. On the right is an open-domain dialogue session with an input message and two different responses:
Both responses mention the location word Beijing from the given profile. The first response, marked in green, welcomes others to come to the speaker's place, which indicates that the speaker is currently in Beijing. It is therefore consistent with the given profile. The response marked in red, however, expresses the hope of visiting Beijing someday, which implies that the speaker has never been to Beijing. This response clearly contradicts the profile.
Humans can easily tell these responses apart, but current machines can hardly do so. This work aims to address this issue.
Here are some explanations for the above example:
Elements | Explanations |
---|---|
Profile | Attribute information of the respondent, covering three groups of attributes: gender, location, and constellation. |
Post | The input message in a single-turn dialogue. Note that the speaker on this side is not profiled. |
Response | The response in a single-turn dialogue. It contains attribute-related information, which does not necessarily concern the responder's own attributes. |
Domain | The attribute field to which the dialogue response belongs. |
Annotated Attributes | Attribute information extracted from the response by human annotators. It may differ from the given profile in some cases. |
Label | Human-annotated consistency relation between the Profile and the Response: Irrelevant, Entailed, or Contradicted. The relations are defined below. |
- ENTAILED: The response is about the dialogue agent's own attribute information, and that attribute is consistent with the agent's key-value profile.
- CONTRADICTED: The response is about the dialogue agent's own attribute information, but it contradicts at least one of the given key-value pairs. For example, given the profile “{location: Beijing}”, “I am in Seattle” contradicts the profile, while “She lives in Seattle” does not, because the latter is not about the dialogue agent's attributes.
- IRRELEVANT: The response contains profile-related information, but that information does not reveal the dialogue agent's own attributes. As exemplified above, “She lives in Seattle” is irrelevant, rather than contradicted, to the dialogue agent's profile “{location: Beijing}”. Another example is “I'm interested in the history of Beijing”: although it contains the attribute word “Beijing”, it does not reveal the dialogue agent's location.
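To make the fields and labels concrete, here is a hypothetical in-memory representation of one annotated example, with field names mirroring the Elements table. It is only an illustration: it is not the on-disk format of `KvPI_test.txt` or the model's input format, and the `KvPISample`/`Label` names are assumptions. `typing.NamedTuple` is used instead of `dataclasses` so it also runs on the tested Python 3.6.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class Label(Enum):
    """The three consistency relations between a profile and a response."""
    ENTAILED = "Entailed"
    CONTRADICTED = "Contradicted"
    IRRELEVANT = "Irrelevant"


class KvPISample(NamedTuple):
    """One annotated example (hypothetical layout, not the dataset file format)."""
    profile: Dict[str, str]  # key-value profile of the respondent
    response: str            # single-turn response to be judged
    label: Label             # human-annotated consistency relation
    post: Optional[str] = None                            # input message; its speaker is not profiled
    domain: Optional[str] = None                          # attribute field, e.g. "location"
    annotated_attributes: Optional[Dict[str, str]] = None  # attributes extracted by annotators


PROFILE = {"location": "Beijing"}

# The examples used in the label definitions above.
EXAMPLES = [
    KvPISample(PROFILE, "I am in Seattle", Label.CONTRADICTED, domain="location"),
    KvPISample(PROFILE, "She lives in Seattle", Label.IRRELEVANT, domain="location"),
    KvPISample(PROFILE, "I'm interested in the history of Beijing", Label.IRRELEVANT, domain="location"),
]

for sample in EXAMPLES:
    print("{!r:45} -> {}".format(sample.response, sample.label.value))
```

Note that the same surface word (Seattle, Beijing) appears under different labels: the label depends on whether the response describes the agent itself, not on word overlap with the profile.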
The released code has been tested with the following environment:
- pytorch=1.3.0
- cudatoolkit=9.2
- python=3.6
- tqdm
- scikit-learn (`sklearn`)
Higher cudatoolkit versions may cause unexpected errors. The PyTorch/Python dependencies can be installed in an Anaconda virtual environment, for example:
```bash
conda create -n kvpi python=3.6
conda activate kvpi
conda install pytorch=1.3.0 torchvision cudatoolkit=9.2 -c pytorch
```
Then install the remaining dependencies in that environment:
```bash
pip install scikit-learn
pip install tqdm
```
`sklearn` is used to calculate the F1 scores and accuracy, and `tqdm` provides the progress bar.
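Before running inference, a quick way to confirm the environment matches the tested setup is to check that each module is importable and print its version. This is a hedged sketch using only the standard library plus `torch.version.cuda` and `torch.cuda.is_available()`; the `REQUIRED` mapping and `check_dependencies` helper are hypothetical names, and a version mismatch is only a warning, not proof that the code will fail.

```python
import importlib
import importlib.util

# Modules listed above, with the tested version where the README gives one.
# scikit-learn is imported under the module name "sklearn".
REQUIRED = {"torch": "1.3.0", "sklearn": None, "tqdm": None}


def check_dependencies(required=REQUIRED):
    """Return (missing module names, {name: (installed_version, tested_version)})."""
    missing, report = [], {}
    for name, tested in required.items():
        if importlib.util.find_spec(name) is None:  # not importable in this env
            missing.append(name)
            continue
        module = importlib.import_module(name)
        report[name] = (str(getattr(module, "__version__", "unknown")), tested)
    return missing, report


if __name__ == "__main__":
    missing, report = check_dependencies()
    for name in missing:
        print("missing: " + name)
    for name, (installed, tested) in report.items():
        note = "" if tested is None or installed.startswith(tested) else " (tested with {})".format(tested)
        print("{} {}{}".format(name, installed, note))
    if "torch" in report:
        import torch
        # CUDA version PyTorch was built against; the README tested cudatoolkit 9.2.
        print("torch CUDA build:", torch.version.cuda, "| GPU available:", torch.cuda.is_available())
```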
First, download the following checkpoint and put it into the `./ckpt` folder:
- `kvbert_epoch_3` (trained checkpoint)
Also make sure that the data folder contains `KvPI_test.txt`. This file is already included in the repository and is organized in a format the model can read.
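The two prerequisites can be checked programmatically before launching the script. In this sketch the relative paths `ckpt/kvbert_epoch_3` and `data/KvPI_test.txt` are assumptions based on the folder names mentioned here, `missing_paths`/`run_inference` are hypothetical helpers, and the script is invoked through `bash` so it works even if `inference.sh` lacks the executable bit.

```python
import subprocess
from pathlib import Path

# Assumed layout relative to the repository root (hypothetical; adjust if yours differs).
REQUIRED_PATHS = [
    Path("ckpt") / "kvbert_epoch_3",
    Path("data") / "KvPI_test.txt",
]


def missing_paths(root=".", required=REQUIRED_PATHS):
    """List the required files/folders that do not exist under `root`."""
    root = Path(root)
    return [str(p) for p in required if not (root / p).exists()]


def run_inference(root="."):
    """Fail fast if a prerequisite is missing, otherwise run inference.sh from `root`."""
    missing = missing_paths(root)
    if missing:
        raise FileNotFoundError("missing prerequisites: " + ", ".join(missing))
    # Same effect as running ./inference.sh in the repository root.
    subprocess.run(["bash", "inference.sh"], cwd=str(root), check=True)
```

Calling `run_inference()` from the repository root raises a clear error instead of letting the model fail midway on a missing file.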
Then run the following script:
```bash
./inference.sh
```
Running the script makes predictions on the test data and redirects the output to `test_prediction.txt`. When prediction finishes, the script calls `f1_acc.py` to report the final scores. The output should look something like this:
```
              precision    recall  f1-score   support

    Entailed      0.927     0.939     0.933      5116
Contradicted      0.902     0.918     0.910      3041
  Irrelevant      0.920     0.882     0.901      2843

    accuracy                          0.918     11000
   macro avg      0.917     0.913     0.915     11000
weighted avg      0.919     0.918     0.918     11000

0.9184545454545454
```
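For readers who want to see what the reported numbers mean, the sketch below recomputes per-class precision, recall, F1 and support, plus accuracy and the macro average, in plain Python. It is not `f1_acc.py` (which uses sklearn; the table layout resembles `sklearn.metrics.classification_report` with three digits), and it works on label lists you supply, because the format of `test_prediction.txt` is not documented here. As a sanity check on the output above, the final line is the unrounded accuracy: 10103 / 11000 = 0.91845..., and the supports sum to 5116 + 3041 + 2843 = 11000.

```python
from collections import Counter

LABELS = ["Entailed", "Contradicted", "Irrelevant"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Map each label to (precision, recall, f1, support)."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per label
    pred_count = Counter(y_pred)  # how often each label was predicted
    true_count = Counter(y_true)  # support: how often each label is the gold label
    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1, true_count[label])
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_avg(scores):
    """Unweighted mean of precision, recall and F1 over classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def print_report(y_true, y_pred, labels=LABELS):
    """Print a report in roughly the same layout as the output above."""
    scores = per_class_scores(y_true, y_pred, labels)
    print("{:>12} {:>9} {:>9} {:>9} {:>9}".format("", "precision", "recall", "f1-score", "support"))
    for label, (p, r, f, s) in scores.items():
        print("{:>12} {:9.3f} {:9.3f} {:9.3f} {:9d}".format(label, p, r, f, s))
    print("{:>12} {:>9} {:>9} {:9.3f} {:9d}".format("accuracy", "", "", accuracy(y_true, y_pred), len(y_true)))
    mp, mr, mf = macro_avg(scores)
    print("{:>12} {:9.3f} {:9.3f} {:9.3f} {:9d}".format("macro avg", mp, mr, mf, len(y_true)))
```

Macro averaging weights the three relations equally, so the smaller Irrelevant class counts as much as Entailed; the weighted average instead weights each class by its support.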
Details will be updated later.
If the dataset, code, or checkpoint helps your work, please cite the following paper:
```bibtex
@inproceedings{song-etal-2020-profile,
    title = "Profile Consistency Identification for Open-domain Dialogue Agents",
    author = "Song, Haoyu and Wang, Yan and Zhang, Wei-Nan and Zhao, Zhengyu and Liu, Ting and Liu, Xiaojiang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.539",
    pages = "6651--6662",
}
```
Note that we trained the KvBERT model from a private Chinese BERT-base checkpoint, so the training code and scripts are not provided in this repository. If you have a reasonable purpose and need the training scripts, please email [email protected] from your institutional email address.