This repository provides the Python and MATLAB scripts accompanying our paper accepted at Interspeech 2025:
Chang, A., Li, Y., Roman, I.R., Poeppel, D. (2025) Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds. Proc. Interspeech 2025, 216-220, doi: 10.21437/Interspeech.2025-1021
This project introduces Spectrotemporal Modulation (STM), a signal-processing feature representation inspired by neurophysiological encoding in the human auditory cortex. It provides an efficient and interpretable framework for classifying diverse audio types, including speech, music, and environmental sounds.
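As a rough illustration of the idea (not the paper's exact pipeline), an STM-style representation can be sketched as the 2D Fourier transform of a log spectrogram, yielding power over temporal and spectral modulation rates. All parameter names and values below are illustrative assumptions:

```python
import numpy as np

def stm_features(signal, n_fft=512, hop=128):
    """Sketch of a spectrotemporal modulation representation:
    the 2D FFT magnitude of a log spectrogram. Illustrative only;
    parameters and scaling are assumptions, not the paper's settings."""
    # Framed, windowed FFT as a simple magnitude spectrogram
    win = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * win for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)
    log_spec = np.log(spec + 1e-8)
    # 2D FFT: one axis captures spectral modulation (across frequency),
    # the other temporal modulation (across time)
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))
    return mps

# Example: 1 s of noise at 16 kHz
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
feat = stm_features(x)
```

The resulting modulation power spectrum is a compact, interpretable 2D map whose axes correspond to temporal and spectral modulation rates.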
The results presented in our paper are fully reproducible using the provided scripts:
- Scripts are numbered sequentially to reflect the execution order.
- Python environments and dependencies are specified in the ./conda_env directory.
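For example, the environment can typically be recreated with conda; the file and environment names below are hypothetical, so check ./conda_env for the actual names:

```shell
# Hypothetical file name; see ./conda_env for the actual .yml file
conda env create -f ./conda_env/environment.yml
# Hypothetical environment name; use the name defined in the .yml file
conda activate stm
```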
Note: Due to file size constraints and copyright considerations, some audio data and output directories are excluded from this repository.
@inproceedings{chang25b_interspeech,
  title     = {{Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds}},
  author    = {Chang, Andrew and Li, Yike and Roman, Iran R. and Poeppel, David},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {216--220},
  doi       = {10.21437/Interspeech.2025-1021},
  issn      = {2958-1796},
}