The official implementation of Variable-Length Piano Infilling (VLI). (paper: Variable-Length Music Score Infilling via XLNet and Musically Specialized Positional Encoding)
VLI is a new Transformer-based model for music score infilling, i.e., generating a polyphonic music sequence that fills the gap between given past and future contexts. Our model can infill a variable number of notes for different time spans.
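To make the task setup concrete, here is a minimal sketch of how one token sequence can be split into past context, a variable-length target gap, and future context. It is an illustration only, not the data pipeline used in this repo: the function name `split_for_infilling`, the `min_gap` parameter, and the integer stand-in tokens are all assumptions made for the example.

```python
import random
from typing import List, Tuple


def split_for_infilling(
    tokens: List[int], rng: random.Random, min_gap: int = 1
) -> Tuple[List[int], List[int], List[int]]:
    """Split a token sequence into (past, target, future).

    Hypothetical helper for illustration; not part of this repo.
    """
    if len(tokens) < min_gap + 2:
        raise ValueError("sequence too short to leave context on both sides")
    # Pick a variable gap length, leaving at least one token on each side.
    gap_len = rng.randint(min_gap, len(tokens) - 2)
    # Pick where the gap starts so that the future context is non-empty.
    start = rng.randint(1, len(tokens) - gap_len - 1)
    past = tokens[:start]
    target = tokens[start:start + gap_len]
    future = tokens[start + gap_len:]
    return past, target, future


rng = random.Random(0)
tokens = list(range(16))  # stand-in for an encoded music score
past, target, future = split_for_infilling(tokens, rng)
print(len(past), len(target), len(future))
```

An infilling model is given `past` and `future` and asked to generate `target`; because the gap length is sampled, the number of tokens to fill varies from example to example.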
- Clone and install the modified Huggingface Transformer package.
git clone /~
cd Transformer
pip install -e .
cd ..
- Clone this repo and install the required packages.
git clone /~
cd variable-length-piano-infilling
pip install -r requirement.txt
- Download and unzip the Pop1K7 dataset. (Download link: here).
# Prepare data
python \
--midi-folder datasets/midi/midi_synchronized/ \
--save-folder ./
# Train the model
python --train
# Test the trained model
The code to run the baselines in our paper is in the baselines folder.
We implement ILM and FELIX following their respective papers (ILM and FELIX), building on the Transformer-XL and BERT implementations in Huggingface Transformer.
They can be trained and tested with the same commands as our model above.
# cd baselines/ILM or cd baselines/FELIX
# Train the model
python --train \
--dict-file ../../dictionary.pickle \
--data-file ../../worded_data.pickle
# Test the trained model
python \
--dict-file ../../dictionary.pickle \
--data-file ../../worded_data.pickle
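Before training, it can help to check that the two pickle files referenced above exist and load. The sketch below only reports each object's type and length; it makes no assumption about their internal structure, which this README does not describe. The helpers `load_pickle` and `describe` are hypothetical names, and the file locations assume you run the script from the repo root, where the data preparation step was told to save its output.

```python
import pickle
from pathlib import Path
from typing import Any


def load_pickle(path: str) -> Any:
    """Load a pickle file and return the stored object."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def describe(obj: Any) -> str:
    """Return a one-line summary: type name, plus length if the object has one."""
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"{type(obj).__name__} (len={size})"


if __name__ == "__main__":
    # Assumed locations: the current directory, matching --save-folder ./
    for name in ("dictionary.pickle", "worded_data.pickle"):
        if Path(name).exists():
            print(name, "->", describe(load_pickle(name)))
        else:
            print(name, "not found; run the data preparation step first")
```

Only unpickle files you generated yourself, since `pickle.load` can execute arbitrary code from an untrusted file.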
A demonstration page of the generated music can be found here.
The training NLL-loss curves of our model and the baseline models.
The objective metrics evaluated on the music pieces generated by VLI (ours), ILM, FELIX, and the real music.
Results of the user study: mean opinion scores on a 1–5 scale for M (melodic fluency), R (rhythmic fluency), and I (impression), and the percentage of votes for F (favorite), from all the participants ('all') or only the music professionals ('pro').
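For readers unfamiliar with these two measures, the sketch below shows how a mean opinion score and a favorite-vote percentage are computed. The ratings, votes, and the names `model_a` and `model_b` are made up for illustration; they are not data from our user study and do not reproduce the reported numbers.

```python
from collections import Counter
from statistics import mean

# Made-up 1-5 ratings from four hypothetical listeners (NOT study data).
ratings = {
    "model_a": [4, 5, 3, 4],
    "model_b": [3, 3, 4, 2],
}
# Each hypothetical listener votes for one favorite.
favorites = ["model_a", "model_b", "model_a", "model_a"]

# Mean opinion score: the average rating per model.
mos = {name: mean(scores) for name, scores in ratings.items()}

# Favorite percentage: the share of all votes each model received.
votes = Counter(favorites)
favorite_pct = {name: 100 * votes[name] / len(favorites) for name in ratings}

print(mos)
print(favorite_pct)
```

Restricting both computations to a subset of listeners (for example, only professionals) gives the per-group figures.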