This study converts piano recordings to mel spectrograms and classifies them with state-of-the-art pre-trained neural network backbones from computer vision. In comparative experiments, SqueezeNet achieves the best classification accuracy, 92.37%.

pianos

Classify piano sound quality with fine-tuned, pre-trained CNN models.
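
For readers unfamiliar with mel spectrograms, the sketch below illustrates only the mel frequency scale that such a spectrogram uses on its vertical axis. It relies on the common HTK-style formula; the function names, the number of bands, and the frequency range are assumptions chosen for illustration and are not taken from this repository's preprocessing code.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Return n_mels centre frequencies (Hz) spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # Hypothetical settings: 8 bands between 0 Hz and 8 kHz.
    for center in mel_band_centers(8, 0.0, 8000.0):
        print(f"{center:8.1f} Hz")
```

The bands are packed densely at low frequencies and spread out at high frequencies, which roughly mirrors how pitch is perceived. In practice an audio library would compute the full spectrogram; this snippet only shows the frequency warping.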

Requirements

conda create -n py311 python=3.11 -y
conda activate py311
pip install -r requirements.txt

Usage

Maintenance

git clone git@github.com:monetjoe/pianos.git
cd pianos

Train

Pass a backbone name after --model to start training (squeezenet1_1 is used here as an example):

python train.py --model squeezenet1_1 --fullfinetune True --wce True

--fullfinetune True fine-tunes the whole network; False trains only the classifier head (linear probing)
--wce True enables focal loss
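
As background on the loss option, here is a minimal pure-Python sketch of the standard focal loss for a single sample. It is not the code in train.py: the function names and the `gamma` and `alpha` defaults are assumptions for illustration, and the repository's implementation may differ (for example, by using per-class weights).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. gamma and alpha
    are illustrative defaults, not values taken from this repository.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    # Class 0 is predicted confidently (easy); class 2 is not (hard).
    print("easy sample:", focal_loss(logits, target=0))
    print("hard sample:", focal_loss(logits, target=2))
    # With gamma=0 the expression reduces to plain cross-entropy.
    print("cross-entropy:", focal_loss(logits, target=0, gamma=0.0))
```

The `(1 - p_t)^gamma` factor shrinks the loss of samples the model already classifies confidently, so training concentrates on hard or under-represented classes.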

Supported backbones

Mirror 1 Mirror 2

Plot results

After training finishes, run the command below to plot the latest results:

python plot.py

Results

A demo result of SqueezeNet fine-tuning:

Result plots: loss curve, training and validation accuracy, confusion matrix.

Cite

@inproceedings{zhou2023holistic,
  title        = {A Holistic Evaluation of Piano Sound Quality},
  author       = {Monan Zhou and Shangda Wu and Shaohua Ji and Zijin Li and Wei Li},
  booktitle    = {National Conference on Sound and Music Technology},
  pages        = {3-17},
  year         = {2023},
  organization = {Springer}
}
