This repository provides the Python and MATLAB scripts accompanying our paper accepted at Interspeech 2025:
Chang, A., Li, Y., Roman, I.R., Poeppel, D. (2025) Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds. Proc. Interspeech 2025, 216-220, doi: 10.21437/Interspeech.2025-1021
This project introduces Spectrotemporal Modulation (STM), a signal-processing feature representation inspired by neurophysiological encoding in the human auditory cortex. It provides an efficient and interpretable framework for classifying diverse audio types, including speech, music, and environmental sounds.
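As a rough illustration of the idea (not the paper's exact pipeline), an STM-style representation can be sketched as the 2D Fourier transform of a log spectrogram, yielding power over temporal and spectral modulation rates. All parameter names and values below are illustrative assumptions:

```python
import numpy as np

def stm_features(signal, n_fft=512, hop=128):
    """Sketch of a spectrotemporal modulation representation:
    the 2D FFT magnitude of a log spectrogram. Illustrative only;
    parameters and scaling are assumptions, not the paper's settings."""
    # Framed, windowed FFT as a simple magnitude spectrogram
    win = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * win for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)
    log_spec = np.log(spec + 1e-8)
    # 2D FFT: one axis captures spectral modulation (across frequency),
    # the other temporal modulation (across time)
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))
    return mps

# Example: 1 s of noise at 16 kHz
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
feat = stm_features(x)
```

The resulting modulation power spectrum is a compact, interpretable 2D map whose axes correspond to temporal and spectral modulation rates.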
The results presented in our paper are fully reproducible using the provided scripts:
- Scripts are numbered sequentially to reflect the execution order.
- Python environments and dependencies are specified in the ./conda_env directory.
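For example, the environment can typically be recreated with conda; the file and environment names below are hypothetical, so check ./conda_env for the actual names:

```shell
# Hypothetical file name; see ./conda_env for the actual .yml file
conda env create -f ./conda_env/environment.yml
# Hypothetical environment name; use the name defined in the .yml file
conda activate stm
```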
Note: Due to file size constraints and copyright considerations, some audio data and output directories are excluded from this repository.
@inproceedings{chang25b_interspeech,
  title     = {{Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds}},
  author    = {Chang, Andrew and Li, Yike and Roman, Iran R. and Poeppel, David},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {216--220},
  doi       = {10.21437/Interspeech.2025-1021},
  issn      = {2958-1796},
}