
Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

We annotated a dialogue dataset, User Satisfaction Simulation (USS), that includes 6,800 dialogues. All user utterances in these dialogues, as well as the dialogues themselves, are labeled on a 5-level satisfaction scale. See the dataset folder.

These resources were developed for the following paper:

Weiwei Sun, Shuo Zhang, Krisztian Balog, Zhaochun Ren, Pengjie Ren, Zhumin Chen, Maarten de Rijke. "Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems". In SIGIR '21.

Data

The dataset (see the dataset folder) is provided in TXT format, where the fields on each line are separated by "\t":

  • speaker role (USER or SYSTEM),
  • text,
  • action,
  • satisfaction (repeated annotations are separated by ","),
  • explanation text (only for JDDC at the dialogue level; repeated annotations are separated by ";").

Sessions are separated by blank lines.
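For concreteness, here is a minimal parsing sketch in Python for the layout described above. The file path is illustrative, and the handling of optional fields (action, satisfaction, explanation) is an assumption based on the description; check the files in the dataset folder for the exact layout.

def load_sessions(path):
    """Parse a USS TXT file into a list of sessions (lists of turns)."""
    sessions, turns = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():                  # a blank line closes a session
                if turns:
                    sessions.append(turns)
                    turns = []
                continue
            fields = line.split("\t")
            turns.append({
                "role": fields[0],                # USER or SYSTEM
                "text": fields[1] if len(fields) > 1 else "",
                "action": fields[2] if len(fields) > 2 else "",
                # repeated satisfaction annotations are comma-separated
                "satisfaction": fields[3].split(",") if len(fields) > 3 and fields[3] else [],
                # explanation text exists only for JDDC at the dialogue level;
                # repeated annotations are semicolon-separated
                "explanation": fields[4].split(";") if len(fields) > 4 and fields[4] else [],
            })
    if turns:                                     # file may not end with a blank line
        sessions.append(turns)
    return sessions

sessions = load_sessions("dataset/MultiWOZ.txt")  # path is illustrative
print(len(sessions), "sessions,", sum(len(s) for s in sessions), "utterances")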

Since the original ReDial dataset does not provide actions, we use the action annotations provided by IARD and include them in ReDial-action.txt.

The JDDC dataset provides the action of each user utterance, comprising 234 categories. We compress these into 12 categories using a manually defined mapping (see JDDC-ActionList.txt).
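As an illustration of applying this compression, the sketch below maps a fine-grained JDDC action to its coarse category. The assumed layout of JDDC-ActionList.txt (one tab-separated fine-grained/coarse pair per line) is hypothetical; consult the file itself for the actual format.

def load_action_map(path="JDDC-ActionList.txt"):
    """Load the 234-to-12 action mapping (file layout assumed, see above)."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                mapping[parts[0]] = parts[1]      # fine-grained -> coarse
    return mapping

action_map = load_action_map()
# "some_fine_action" is a placeholder, not a real JDDC label
print(action_map.get("some_fine_action", "UNKNOWN"))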

Data Statistics

The USS dataset is based on five benchmark task-oriented dialogue datasets: JDDC, Schema Guided Dialogue (SGD), MultiWOZ 2.1, Recommendation Dialogues (ReDial), and Coached Conversational Preference Elicitation (CCPE).

Domain        JDDC     SGD      MultiWOZ  ReDial   CCPE
Language      Chinese  English  English   English  English
#Dialogues    3,300    1,000    1,000     1,000    500
Avg# Turns    32.3     26.7     23.1      22.5     24.9
#Utterances   54,517   13,833   12,553    11,806   6,860
Rating 1      120      5        12        20       10
Rating 2      4,820    769      725       720      1,472
Rating 3      45,005   11,515   11,141    9,623    5,315
Rating 4      4,151    1,494    669       1,490    59
Rating 5      421      50       6         34       4
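A distribution like the per-rating counts above can be recomputed from the TXT files. The sketch below reuses the load_sessions parser from the Data section and aggregates repeated annotations by majority vote; that aggregation rule is an assumption, and the counts in the table may have been derived differently.

from collections import Counter

def rating_distribution(sessions):
    """Count USER utterances per (majority-voted) satisfaction rating."""
    counts = Counter()
    for turns in sessions:
        for turn in turns:
            if turn["role"] == "USER" and turn["satisfaction"]:
                majority = Counter(turn["satisfaction"]).most_common(1)[0][0]
                counts[majority] += 1
    return counts

print(rating_distribution(load_sessions("dataset/SGD.txt")))  # path illustrative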

Baselines

The code for baseline reproduction can be found within /baselines.

Performance for user satisfaction prediction: bold face indicates the best result in terms of the corresponding metric, and underline indicates results comparable to the best one.

Performance for user action prediction: bold face indicates the best result in terms of the corresponding metric, and underline indicates results comparable to the best one.

Cite

@inproceedings{Sun:2021:SUS,
  author =    {Sun, Weiwei and Zhang, Shuo and Balog, Krisztian and Ren, Zhaochun and Ren, Pengjie and Chen, Zhumin and de Rijke, Maarten},
  title =     {Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems},
  booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series =    {SIGIR '21},
  year =      {2021},
  publisher = {ACM}
}

Contact

If you have any questions, please contact [email protected].
