Blizzard 2013 Extension - Experiment Repository

The goal of this repository is to share the toolkits, scripts, and configurations used to extend the Blizzard Challenge 2013 task EH2 by adding modern neural voices.

1 Overview of this repository

This repository is organized around the following structure:

.
├── bc_extension_cpu.yaml
├── bc_extension_gpu.yaml
├── README.org
├── references.bib
├── evaluation
│   ├── run.sh
│   └── scripts
├── helpers
│   ├── backup_models.sh
│   └── render.sh
├── prepare_data
│   ├── 01_extract_acoustic.sh
│   ├── 02_extract_training_labels.sh
│   ├── 03_extract_synthesis_labels.sh
│   ├── configurations
│   └── scripts
├── src
│   ├── ph_list
│   ├── test
│   └── train
└── toolkits
    ├── fastpitch
    ├── marytts
    ├── parallel_wavegan
    ├── tacotron
    └── wavenet

Here is a description of the key files and directories:

  • bc_extension_gpu.yaml: conda environment configuration assuming GPUs are accessible
  • bc_extension_cpu.yaml: conda environment configuration when no GPUs are available
  • README.org: this file
  • references.bib: the BibTeX file containing all the references used in this repository
  • prepare_data: the directory providing the scripts and configurations needed to prepare the data (i.e., extract features, prompts, …) for training and synthesis
  • helpers: the directory containing additional scripts used to back up the models, run the synthesis, …
  • evaluation: the directory containing the resources to analyze the subjective evaluation results and compute some additional objective evaluation metrics
  • toolkits: the directory containing the toolkits necessary to conduct the experiments (see below for how they are used)
  • src: the directory containing the data needed to run the experiments

2 Reproducing the experiments

2.1 Java pre-requisites (Label generation)

Following the shutdown of Bintray, the configuration of MaryTTS has been updated.

  • Install sdkman (see https://sdkman.io/install/) and Maven (use your operating system's package manager)
  • Activate the environment
sdk env activate
  • Install JTok
(cd toolkits/marytts/jtok; mvn install)
  • Install MaryTTS
(cd toolkits/marytts/marytts; ./gradlew publishToMavenLocal)

MaryTTS should now be ready to use!
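To sanity-check the installation, you can verify that the artifacts were published to your local Maven repository. This is a minimal sketch; the group paths below are assumptions based on the upstream projects and may differ from what your build actually publishes:

# Both installs publish to the local Maven repository (~/.m2/repository).
# The group paths are assumptions; adjust them if your build logs show
# different coordinates.
ls ~/.m2/repository/de/dfki/mary   # MaryTTS artifacts
ls ~/.m2/repository/de/dfki/lt     # JTok artifacts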

2.2 Pre-requisites (Training and synthesis)

This repository relies on Java and Gradle to extract the labels, and on Python for the rest:

  • The code has been tested using Java 11 only (this is a strict requirement). You can install it using sdkman (see the previous section)
  • Gradle uses wrappers, so no Gradle dependencies have to be explicitly installed
  • For Python, it is easiest to create a conda environment:
    • bc_extension_gpu.yaml: defines the environment for use on GPU (recommended)
    • bc_extension_cpu.yaml: defines the environment for use on CPU (for testing the synthesis)
  • Additional Python packages need to be installed in the environment (so after activating it!); a setup sketch follows this list:
    • dllogger: /~https://github.com/NVIDIA/dllogger
    • apex: /~https://github.com/NVIDIA/apex
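Here is a minimal setup sketch. The environment name passed to conda activate is an assumption derived from the yaml filename; check the name: field of the file you use. The apex command below performs a plain pip install; see the apex README for the CUDA-extension build flags:

# Create and activate the conda environment (swap in the CPU yaml if needed).
conda env create -f bc_extension_gpu.yaml
conda activate bc_extension_gpu   # name assumed; check the "name:" field of the yaml

# Install the additional NVIDIA packages inside the activated environment.
pip install 'git+/~https://github.com/NVIDIA/dllogger#egg=dllogger'
git clone /~https://github.com/NVIDIA/apex
(cd apex && pip install -v --no-cache-dir ./)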

2.3 Data preparation

2.3.1 For training

Simply go to the directory prepare_data and run the following commands:

# To extract the mel spectrograms
bash ./01_extract_acoustic.sh

# To get the labels, the prompts, the F0 (FastPitch), the duration (FastPitch) and the attention guides (Tacotron)
bash ./02_extract_training_labels.sh

The results will be available in the directory output/training.
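As a quick sanity check, you can list the output directory; the expected subdirectory names below are inferred from the linking commands in the training sections further on:

# Run from the prepare_data directory. Expected subdirectories (inferred
# from the training sections below): fastpitch  tacotron  wg  wn
ls output/training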

2.3.2 For synthesis

Simply go to the directory prepare_data and run the following command:

bash ./03_extract_synthesis_labels.sh

The results will be available in the directory output/synthesis.

2.4 Training

For this whole part, we assume that the conda environment is activated! We also assume that the data preparation was run (if not, go to the previous section!).

2.4.1 WaveNet

For WaveNet, the training happens in the directory toolkits/wavenet/egs/bc_2013. The first thing to do is to link the dataset to what was extracted during the data preparation:

ln -s $PWD/../../../../prepare_data/output/training/wn $PWD/dump

Then you can start the training as follows:

bash run.sh

2.4.2 Parallel WaveGAN

For Parallel WaveGAN, the training happens in the directory toolkits/parallel_wavegan/egs/bc_2013/voc1. The first thing to do is to link the dataset to what was extracted during the data preparation:

ln -s $PWD/../../../../../prepare_data/output/training/wg $PWD/dump

Then you can start the training as follows:

bash run.sh
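If the recipe follows the upstream Kaldi-style staging (an assumption; check the header of run.sh for the exact stage numbering), you can restrict execution to specific stages, which is handy for resuming:

# Run only the training stage (stage numbers are an assumption taken from
# the upstream ParallelWaveGAN recipes; verify them in run.sh).
bash run.sh --stage 2 --stop_stage 2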

2.4.3 FastPitch

For FastPitch, the training happens in the directory toolkits/fastpitch. The first thing to do is to link the dataset to what was extracted during the data preparation:

mkdir bc_2013
ln -s $PWD/bc_2013/../../../prepare_data/output/training/fastpitch $PWD/bc_2013/dataset

Then you can start the training as follows:

NUM_GPUS=1 BS=16 PH_DICT=bc_2013/dataset/ph_list bash scripts/train.sh

Here is a description of the variables used (an example invocation follows this list):

  • NUM_GPUS: the number of GPUs used for the training
  • BS: the batch size
  • PH_DICT: the path to the list of phonemes used in the corpus (if not defined, it defaults to RADIO_ARPABET & ARCTIC)
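For instance, to train on four GPUs with a larger batch size (the values are illustrative only and should be tuned to your hardware):

# Same script, scaled up; only NUM_GPUS and BS change.
NUM_GPUS=4 BS=32 PH_DICT=bc_2013/dataset/ph_list bash scripts/train.sh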

2.4.4 Tacotron

For Tacotron, the training happens in the directory toolkits/tacotron. The first thing to do is to link the dataset to what was extracted during the data preparation:

mkdir bc_2013
ln -s $PWD/bc_2013/../../../prepare_data/output/training/tacotron $PWD/bc_2013/data

Then you can start the training as follows:

python train_pag.py -d bc_2013/data/ph_list

2.4.5 When this is over!

The last step is to back up the model files in a layout compatible with the synthesis script. To do so, run the following command:

bash helpers/backup_models.sh models

With this command, the models will be backed up in the directory models. Change the argument if you want to use a different backup directory.
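For instance, to back up into a custom location (the path below is hypothetical):

# Any writable directory works as the backup target.
bash helpers/backup_models.sh /data/backups/bc_2013_models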

2.5 Synthesis

Run the rendering helper from the repository root. The EXPES variable selects which systems to synthesize (fp = FastPitch, tac = Tacotron, wg = Parallel WaveGAN, wn = WaveNet):

EXPES="fp tac wg wn" bash helpers/render.sh
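You can also render a subset of the systems by listing fewer identifiers (assuming EXPES accepts any subset of the four):

# Render only the FastPitch voice.
EXPES="fp" bash helpers/render.sh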

2.6 Parsing the evaluation results

Simply go to the directory evaluation and run:

bash run.sh

The results will be available in the directory output.

3 Resources

The models obtained for the experiments are available at https://www.cstr.ed.ac.uk/projects/blizzard/ under the section "models". To access them, you need to obtain a license for the English audiobook data for the Blizzard Challenge 2013 (https://www.cstr.ed.ac.uk/projects/blizzard/2013/lessac_blizzard2013/) and then use the same credentials.

The samples and subjective evaluation results are available at https://data.cstr.ed.ac.uk/blizzard/wavs_and_scores/2013-EH2-EXT.tar.gz

4 References

4.1 Citing this repository and the resulting experiments

@article{LeMaguer2024,
    title        = {The limits of the Mean Opinion Score for speech synthesis evaluation},
    author       = {Sébastien {Le Maguer} and Simon King and Naomi Harte},
    year         = 2024,
    journal      = {Computer Speech \& Language},
    volume       = 84,
    pages        = 101577,
    doi          = {10.1016/j.csl.2023.101577},
    issn         = {0885-2308},
    url          = {https://www.sciencedirect.com/science/article/pii/S0885230823000967},
}

4.2 Architectures & toolkits used in this repository

Only the citation keys are given here to save space; please refer to the BibTeX file references.bib for the full entries.

| Architecture     | Reference            | Implementation                                                                                 |
|------------------+----------------------+------------------------------------------------------------------------------------------------|
| Tacotron         | [cite:@Wang2017]     | /~https://github.com/cassiavb/Tacotron/commit/946408f8cd7b5fe9c53931c631267ba2a723910d           |
| FastPitch        | [cite:@Lancucki2021] | /~https://github.com/NVIDIA/DeepLearningExamples/commit/6a642837c471c596aab7edf204384f66e9483ab2 |
| WaveNet          | [cite:@Oord2016]     | /~https://github.com/r9y9/wavenet_vocoder/commit/a35fff76ea3687b05e1a10023cad3f7f64fa25a3        |
| Parallel WaveGAN | [cite:@Yamamoto2020] | /~https://github.com/kan-bayashi/ParallelWaveGAN/commit/6d4411b65f9487de5ec49dabf029dc107f23192d |

4.3 Additional tools/softwares

Only the citation keys are given here to save space; please refer to the BibTeX file references.bib for the full entries.

| Software      | Reference           | Implementation                                                                                      |
|---------------+---------------------+------------------------------------------------------------------------------------------------------|
| MaryTTS       | [cite:@Steiner2018] | /~https://github.com/marytts/marytts                                                                  |
| JTok          |                     | /~https://github.com/DFKI-MLT/JTok                                                                    |
| Pyworld/World | [cite:@Morise2016]  | /~https://github.com/mmorise/World, /~https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder |
| FlexEval      | [cite:@Fayet2020]   | https://gitlab.inria.fr/expression/tools/FlexEval                                                     |
