This is the repository to accompany the paper:
Brian McMahan and Matthew Stone. 2020. Analyzing Speaker Strategy in
Referential Communication. In Proceedings of the SIGDIAL 2020 Conference.
We analyze a corpus of referential communication through the lens of quantitative models of speaker reasoning. Different models place different emphases on linguistic reasoning and collaborative reasoning. This leads models to make different assessments of the risks and rewards of using specific utterances in specific contexts. By fitting a latent variable model to the corpus, we can exhibit utterances that give systematic evidence of the diverse kinds of reasoning speakers employ, and build integrated models that recognize not only speaker reference but also speaker reasoning.
- data/
- filteredCorpus.csv
- cic_vectorized.csv
- A parsed & vectorized form of the CIC dataset
- Made with magis_sigdial2020.datasets.color_reference_2017.vectorized.make_or_load_cic())
- xkcd/
- lab/
- Description
- The
lab
directory is where model training and analyses are housed
- The
- README.md
- details around the reproducibility are here
- run_all.sh
- A script which will run most of the steps for reproducibility
- XKCD_model/
- Description
- The XKCD model is a reference model common to all speaker models in the paper
- There are 3 steps to training the model:
- training an uncalibrated version
- making the calibrations
- training a calibrated version
- configs
- The YAML configuration files for the 3 steps (prefixe E001, E002, and E003)
- logs
- The output of the 3 steps is stored here
- scripts
- the python files for executing the 3 steps are here:
- train_uncalibrated_model.py
- make_calibrations.py
- train_calibrated_model.py
- See
lab/README.md
for how they should be run
- the python files for executing the 3 steps are here:
- Description
- analyses/
- Description
- The reference model is used in 3 kinds of speaker models
- The speaker models are evaluated on the Colors in Context dataset
- The analysis numbers in the paper come from the notebooks in this directory
- configs
- The YAML configuration file for running the evaluation
- notebooks
- Two analyses:
- A001: compare speaker models independently
- The numbers for Section 4.2
- reports perplexity on CIC splits and the statistics around that
- A002: compare speaker models as a mixture
- The numbers for Section 4.3
- fits a mixture model to speaker model predictions
- reports winning model, matcher successes, and statistics around that
- A001: compare speaker models independently
- Two analyses:
- scripts
- evaluate_models_on_cic.py
- The python script for loading the speaker models, evaluating on CIC, and storing the result
- evaluate_models_on_cic.py
- src
- cic_results_lib.py
- Results loading and statistical tests
- emlib.py
- EM routine for the mixture analysis
- cic_results_lib.py
- Description
- Description
- src/
- sigdial_20202/
- contains the model, model training, and dataset code
- pyromancy/
- used for ML logging
- requirements.txt
- setup.py
- sigdial_20202/
- tests
See lab/README.md
and lab/run_all.sh
This repo has the data in it. As it stands, the total repo size is around 200MB.