This repository contains the code and data for SPOTTER, a framework for investigating convention formation in Human-Robot Interaction, which is described in this paper.
The SPOTTER game is a two-person reference game. It consists of six rounds in which the goal is to locate the position of characters in a visual scene. The visual scene for each player contains the same characters, but they are in a different order. Players must communicate to find the position of each character in the other player's picture.
The game is designed to support Human-Robot Interaction. However, it can also be used to investigate Human-Human Interaction.
HTML source code for the SPOTTER game can be found in the folder `SPOTTER-game`.
There are two versions of the game:
- `version_1` contains the original version, which was used in a pilot experiment. This version uses cartoon-like figures, and players only had to select whether a character was in the same or a different position.
- `version_2` contains the latest, updated version. The cartoon-like faces have been replaced by more realistic faces, and players now have to select the exact position of a character in the other player's picture.
If you wish to use the framework for your experiments, we recommend using the latest version (`version_2`).
The game starts from `start_adult.html`. `start_child.html` was used for an additional, simpler version of the game that could be played by children. From the start page, the HTML links to each subsequent round.
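Because each round is a plain HTML page, one straightforward way to try the game is to serve a version folder with a local web server. The sketch below is a minimal example assuming Python is available; the directory path is an assumption about the repository layout.

```python
# Minimal sketch: serve the SPOTTER game pages locally so a browser can load them.
# The directory path is an assumption about this repository's layout.
import http.server
import socketserver
import os

os.chdir("SPOTTER-game/version_2")  # assumed path to the latest game version

with socketserver.TCPServer(("", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
    # Open http://localhost:8000/start_adult.html in a browser to start the game.
    httpd.serve_forever()
```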
The folder `dataset` contains the data gathered during two Human-Robot Interaction pilot experiments, in which robot behaviour was 'faked' using the Wizard-of-Oz approach. The language used in the dataset is Dutch. The data in `version_1` and `version_2` were collected with the gameplay code from the folders of the same name in `SPOTTER-game`.
The dataset contains interactions from 21 participants:
- 7 participants for `version_1`
- 14 participants for `version_2`
Data for each participant is contained in a separate .tsv file, with one utterance per line. Utterances have been annotated with the following features (a minimal loading example follows the list):
- Start: The start time of an utterance in seconds
- End: The end time of an utterance in seconds
- Text: The text in the utterance
- Speaker: The source of the utterance, either Human or Robot
- Mention: The part of the utterance which contains the description of a character
- Character: The gold annotation for the referent of the mention
- Round: The round of the game. Any utterances that are not part of a round (i.e. before or in between rounds) are annotated as '0'
- Transaction Unit: A unit of the interaction which contains the utterances and turns needed to resolve the mention for one referent and identify them in the picture
- Transaction Unit Relation: The relation between subsequent utterances within the same Transaction Unit. For a full list of relations, we refer to Appendix C of our paper.
- Dialog Act (DA): An automatically extracted Dialog Act for the utterance
- Dialog Act Confidence (DA_conf): The confidence score for the automatically extracted Dialog Act
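The sketch below shows one way to load and inspect a participant file with pandas. The file name and the exact column labels (e.g. `Round`, `Transaction Unit`) are assumptions based on the feature list above, not verified against the files.

```python
# A minimal sketch for loading one participant's annotated .tsv file.
# File name and column labels are assumptions based on the feature list above.
import pandas as pd

df = pd.read_csv("dataset/version_2/participant_01.tsv", sep="\t")  # hypothetical file name

# Drop utterances outside the six game rounds (Round '0' marks talk before
# or between rounds, per the annotation scheme). Cast to str in case the
# column is parsed as integers.
in_round = df[df["Round"].astype(str) != "0"]

# Group utterances by Transaction Unit to see how each referent was resolved.
for tu, turns in in_round.groupby("Transaction Unit"):
    print(f"Transaction Unit {tu}: {len(turns)} utterances")
```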
See the table below for quantitative details about the dataset:
If you use our framework or data, please cite our paper:
```
@inproceedings{kruijt-etal-2024-spotter-framework,
    title = "{SPOTTER}: A Framework for Investigating Convention Formation in a Visually Grounded Human-Robot Reference Task",
    author = "Kruijt, Jaap and van Minkelen, Peggy and Donatelli, Lucia and Vossen, Piek T.J.M. and Konijn, Elly and Baier, Thomas",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1322",
    pages = "15202--15215"
}
```