A fundamental problem in research into language and cultural change is the difficulty of distinguishing processes of stochastic drift (also known as neutral evolution) from processes that are subject to certain selection pressures. In this article, we describe a new technique based on Deep Neural Networks, in which we reformulate the detection of evolutionary forces in cultural change as a binary classification task. Using Residual Networks for time series trained on artificially generated samples of cultural change, we demonstrate that this technique is able to efficiently, accurately and consistently learn which aspects of the time series are distinctive for drift and selection. We compare the model with a recently proposed statistical test, the Frequency Increment Test, and show that the neural time series classification system provides a possible solution to some of the key problems of this test.
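The two technical ingredients named above, synthetic time series of drift and selection and the Frequency Increment Test (FIT), can be illustrated with a short standard-library Python sketch. It assumes a simple haploid Wright-Fisher model with a constant selection coefficient `s` as the simulator; the paper's own simulation setup may differ. It also computes only the FIT t statistic on the rescaled increments, leaving out the p-value and the normality check that a full test includes. The function names `wright_fisher`, `make_dataset` and `fit_statistic` are hypothetical and are not taken from the nnfit repository.

```python
import math
import random
import statistics


def wright_fisher(n_gen, pop_size, p0, s=0.0, seed=None):
    """Frequency trajectory of one variant in a haploid Wright-Fisher model.

    Hypothetical sketch (assumed model, not the nnfit simulator):
    s == 0 gives neutral drift; s > 0 gives the variant a selective advantage.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(n_gen - 1):
        # Deterministic selection step: reweight the variant by fitness 1 + s.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Stochastic drift step: binomial sampling of the next generation.
        count = sum(rng.random() < p_sel for _ in range(pop_size))
        p = count / pop_size
        trajectory.append(p)
    return trajectory


def make_dataset(n_per_class, n_gen=50, pop_size=1000, p0=0.5, s=0.05, seed=0):
    """Labelled samples for binary classification: 0 = drift, 1 = selection."""
    rng = random.Random(seed)
    samples = []
    for label, coeff in ((0, 0.0), (1, s)):
        for _ in range(n_per_class):
            traj = wright_fisher(n_gen, pop_size, p0, coeff, seed=rng.randrange(2**32))
            samples.append((traj, label))
    rng.shuffle(samples)
    return samples


def fit_statistic(freqs, times=None):
    """One-sample t statistic of the Frequency Increment Test.

    Rescaled increments
        Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1}))
    have mean zero under neutral drift; the t statistic measures how far
    their sample mean departs from zero (p-value step omitted here).
    """
    times = list(range(len(freqs))) if times is None else list(times)
    increments = []
    for i in range(1, len(freqs)):
        v_prev, v_curr = freqs[i - 1], freqs[i]
        if not 0 < v_prev < 1:
            raise ValueError("FIT requires frequencies strictly between 0 and 1")
        dt = times[i] - times[i - 1]
        increments.append((v_curr - v_prev) / math.sqrt(2 * v_prev * (1 - v_prev) * dt))
    mean = statistics.mean(increments)
    sd = statistics.stdev(increments)
    return mean / (sd / math.sqrt(len(increments)))


if __name__ == "__main__":
    drift = wright_fisher(50, 10_000, 0.5, s=0.0, seed=1)
    selected = wright_fisher(50, 10_000, 0.5, s=0.05, seed=1)
    print("drift t:", round(fit_statistic(drift), 2))
    print("selection t:", round(fit_statistic(selected), 2))
```

Each `(trajectory, label)` pair from `make_dataset` corresponds to one training instance of the binary classification task; a neural time series classifier would be trained on many such pairs. For a p-value on the same increments, `scipy.stats.ttest_1samp(increments, 0.0)` can replace the hand-computed statistic.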
DOI: https://doi.org/10.1017/ehs.2020.52
See the supplementary materials for a brief tutorial describing how to train your own models.
Code to reconstruct the past-tense data set can be obtained from https://github.com/mnewberry/ldrift. To run the past-tense analysis in notebooks/past-tense.ipynb, save the frequency list under data/coha-past-tense.txt.
All code is implemented in Python 3.7. A detailed list of the requirements for running the code can be found in the requirements.txt file. This repository may be updated; to reproduce the analyses reported in the paper, please download the submission release: https://github.com/fbkarsdorp/nnfit/releases/tag/v1.0
To train your own models, run src/train.py and follow the instructions therein.
This work is licensed under a Creative Commons Attribution 4.0 International License.