The derived dataset using the default settings is available here.
-
Download the Lakh MIDI Dataset (LMD) with the following script.
./scripts/download_lmd.sh
(Or, download it manually here.)
-
Set the variables LMD_ROOT and LPD_ROOT in run.sh and the variables in config.py to proper values.
-
Derive all subsets and versions of LPD, along with matched_ids.txt and cleansed_ids.txt, with the following script.
./scripts/derive_lpd.sh
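Before running the derivation, it can help to confirm that the configured paths are usable. The following is a minimal sketch, not part of the repository: check_dir_var is a hypothetical helper, and it assumes the value chosen for LMD_ROOT in run.sh is also available in the current shell.

```shell
# Hypothetical helper (not part of the repository): verify that a variable
# such as LMD_ROOT is non-empty and names an existing directory.
check_dir_var() {
    name="$1"
    value="$2"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name is not a directory: $value"
        return 1
    fi
    echo "$name OK: $value"
}
```

For example, check_dir_var LMD_ROOT "$LMD_ROOT" && ./scripts/derive_lpd.sh would start the derivation only when the LMD path exists.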
The derived labels can be found at data/labels.tar.gz.
-
Download the labels with the following script.
./scripts/download_labels.sh
-
Derive the labels with the following script.
./scripts/derive_labels.sh
-
Install GNU Parallel to run the synthesizer in parallel mode.
-
Synthesize audio files from multitrack pianorolls with the following script.
./scripts/batch_synthesize.sh ./data/lpd/lpd/lpd_cleansed/ ./data/synthesized/lpd_cleansed 20
(The above command will synthesize all the multitrack pianorolls in the LPD-cleansed subset with 20 parallel jobs.)
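Since the synthesizer relies on GNU Parallel for parallel mode, a quick check before launching a long batch job may save time. The sketch below is an illustration only: is_gnu_parallel is a hypothetical helper, and it assumes that GNU Parallel identifies itself on the first line of its --version output, which distinguishes it from other tools that install a command named parallel.

```shell
# Hypothetical helper: succeed only if the named command (default: parallel)
# exists and reports itself as GNU Parallel in its --version output.
is_gnu_parallel() {
    cmd="${1:-parallel}"
    command -v "$cmd" >/dev/null 2>&1 || return 1
    "$cmd" --version 2>/dev/null </dev/null | head -n 1 | grep -q 'GNU parallel'
}

if is_gnu_parallel; then
    echo "GNU Parallel found"
else
    echo "GNU Parallel not found; install it before running batch_synthesize.sh"
fi
```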