Official PyTorch implementation of the paper:

- Shih-Lun Wu and Yi-Hsuan Yang  
  **Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach**  
  Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2023

Paper | Audio demo (Google Drive) | Model weights
- Python 3.8 and CUDA 10.2 recommended
- Install dependencies
  ```bash
  pip install -r requirements.txt
  pip install git+https://github.com/cifkao/fast-transformers.git@39e726864d1a279c9719d33a95868a4ea2fb5ac5
  ```
- Download trained models from the Hugging Face Hub (make sure you're in the repository root directory)
  ```bash
  git clone https://huggingface.co/slseanwu/compose-and-embellish-pop1k7
  ```
- Stage 1: generate lead sheets (i.e., melody + chord progression)
  ```bash
  python3 stage01_compose/inference.py \
    stage01_compose/config/pop1k7_finetune.yaml \
    generation/stage01 \
    20
  ```
  You'll have 20 lead sheets under `generation/stage01` after this step.
- Stage 2: generate full performances conditioned on Stage 1 lead sheets
  ```bash
  python3 stage02_embellish/inference.py \
    stage02_embellish/config/pop1k7_default.yaml \
    generation/stage01 \
    generation/stage02
  ```
  The `samp_**_2stage_samp**.mid` files under `generation/stage02` are the final results.
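Since `generation/stage02` also holds intermediate files, it can be handy to pick out just the final results programmatically. A minimal sketch (`final_results` is our own helper name, and the glob pattern simply mirrors the `samp_**_2stage_samp**.mid` naming above):

```python
from pathlib import Path

def final_results(out_dir="generation/stage02"):
    """Collect the final two-stage MIDI files produced by Stage 2.

    Assumes the samp_*_2stage_samp*.mid naming convention; files
    with other names (e.g. intermediate lead sheets) are skipped.
    """
    return sorted(Path(out_dir).glob("samp_*_2stage_samp*.mid"))

if __name__ == "__main__":
    for midi in final_results():
        print(midi)
```

This returns `pathlib.Path` objects, so the list can be fed directly to whatever MIDI player or synthesizer you use for listening.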
- Stage 1: lead sheet (i.e., "Compose") model
  ```bash
  python3 stage01_compose/train.py stage01_compose/config/pop1k7_finetune.yaml
  ```
- Stage 2: performance (i.e., "Embellish") model
  ```bash
  python3 stage02_embellish/train.py stage02_embellish/config/pop1k7_default.yaml
  ```
Note that these two commands may be run in parallel.
If you'd like to experiment with your own datasets, we suggest that you
- read our dataloaders (stage 1, stage 2) and the `.pkl` files of our processed datasets (stage 1, stage 2) to understand what the models receive as inputs
- refer to the CP Transformer repo for a general guide on converting audio/MIDI files to event-based representations
- use the musical structure analyzer to get the required structure markings for our stage 1 models.
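To compare your own data against ours, a quick first step is to print a shallow summary of one of the released `.pkl` files. The sketch below is only illustrative: `peek` is a hypothetical helper, and the actual structure of the pickled objects is whatever our dataloaders define.

```python
import pickle

def peek(pkl_path, max_items=5):
    """Print a shallow summary of a pickled dataset file.

    Only reports top-level types/keys -- the authoritative spec of
    what the models consume is the repo's dataloader code.
    """
    with open(pkl_path, "rb") as f:
        obj = pickle.load(f)
    print(type(obj).__name__)
    if isinstance(obj, dict):
        for key in list(obj)[:max_items]:
            print(" ", key, "->", type(obj[key]).__name__)
    elif isinstance(obj, (list, tuple)):
        print("  length:", len(obj))
        for item in obj[:max_items]:
            print("  ", type(item).__name__)
    return obj
```

Running `peek("path/to/dataset.pkl")` on one of our files and on your own converted data side by side makes mismatches in the top-level layout easy to spot.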
We would like to thank the following people for their open-source implementations that paved the way for our work:
- Performer (fast-transformers): Angelos Katharopoulos (@angeloskath) and Ondřej Cífka (@cifkao)
- Transformer w/ relative positional encoding: Zhilin Yang (@kimiyoung)
- Musical structure analysis: Shuqi Dai (@Dsqvival)
- LakhMIDI melody identification: Thomas Melistas (@gulnazaki)
- Skyline melody extraction: Wen-Yi Hsiao (@wayne391) and Yi-Hui Chou (@sophia1488)
If this repo helps with your research, please consider citing:
```BibTeX
@inproceedings{wu2023compembellish,
  title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach},
  author={Wu, Shih-Lun and Yang, Yi-Hsuan},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  url={https://arxiv.org/pdf/2209.08212.pdf}
}
```