The goal of this repository is to share the toolkits, scripts and configurations used to extend the Blizzard 2013-EH2 by adding modern neural voices.
This repository is organized around the following structure:
.
├── bc_extension_cpu.yaml
├── bc_extension_gpu.yaml
├── README.org
├── references.bib
├── evaluation
│   ├── run.sh
│   └── scripts
├── helpers
│   ├── backup_models.sh
│   └── render.sh
├── prepare_data
│   ├── 01_extract_acoustic.sh
│   ├── 02_extract_training_labels.sh
│   ├── 03_extract_synthesis_labels.sh
│   ├── configurations
│   └── scripts
├── src
│   ├── ph_list
│   ├── test
│   └── train
└── toolkits
    ├── fastpitch
    ├── marytts
    ├── parallel_wavegan
    ├── tacotron
    └── wavenet
Here is the description of the key files/directories:
- bc_extension_gpu.yaml :: conda environment configuration assuming GPUs are accessible
- bc_extension_cpu.yaml :: conda environment configuration when no GPUs are available
- README.org :: this file
- references.bib :: the bibtex file containing all the references used in this repository
- prepare_data :: the directory providing the scripts and configurations needed to prepare the data (i.e. extract features, prompts, …) to run the training and the synthesis
- helpers :: the directory containing additional scripts used to back up the models, carry out the synthesis, …
- evaluation :: the directory containing the resources to analyze the subjective evaluation results and compute some additional objective evaluation metrics
- toolkits :: the directory containing the toolkits necessary to conduct the experiments (see below for how they are used)
- src :: the directory containing the data needed to run the experiments
Following the shutdown of Bintray, the configuration of MaryTTS has been updated.
- install sdkman (see https://sdkman.io/install/) and maven (google to find this for your operating system)
- Activate the environment (see the note after these steps)
sdk env activate
- Install JTok
(cd toolkits/marytts/jtok; mvn install)
- Install MaryTTS
(cd toolkits/marytts/marytts; ./gradlew publishToMavenLocal)
MaryTTS should now be ready to use!
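For reference, sdk env reads a .sdkmanrc file at the root of the repository. The snippet below is a purely hypothetical example pinning java 11; the exact distribution identifier is an assumption, so use the file shipped with the repository if one is provided:
# .sdkmanrc (hypothetical content, for illustration only)
java=11.0.21-tem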
This repository relies on java and gradle to extract the labels, as well as python:
- The code has been tested using java 11 (this is a strict requirement). You can install it using sdkman.
- gradle uses wrappers, so no dependencies have to be installed explicitly.
- For python, it is easier to create a conda environment (see the sketch after this list):
  - bc_extension_gpu.yaml :: defines the environment for use on GPU (recommended)
  - bc_extension_cpu.yaml :: defines the environment for use on CPU (for testing the synthesis)
- Additional python packages which need to be installed in the environment (so after activating it!):
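As for creating the environment itself, a minimal sketch is given below; the environment name is whatever is declared inside the YAML file, so bc_extension is only an assumption here:
# Create the environment from the GPU configuration (use bc_extension_cpu.yaml on a CPU-only machine)
conda env create -f bc_extension_gpu.yaml
# Activate it ("bc_extension" is an assumed name; use the name declared in the YAML file)
conda activate bc_extension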
Simply go to the directory prepare_data and run the following commands:
# To extract the mel spectrograms
bash ./01_extract_acoustic.sh
# To get the labels, the prompts, the F0 (FastPitch), the duration (FastPitch) and the attention guides (Tacotron)
bash ./02_extract_training_labels.sh
The results will be available in the directory output/training.
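A quick sanity check (still from prepare_data) is to make sure the per-toolkit subsets used by the training recipes below are present:
# The training recipes below expect the subdirectories fastpitch, tacotron, wg and wn
ls output/training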
Simply go to the directory prepare_data and run the following command:
bash ./03_extract_synthesis_labels.sh
The results will be available in the directory output/synthesis.
For all of this part, we assume that the conda environment is activated! We also assume that the data preparation has been run (if not, go to the previous section!).
For WaveNet, the training happens in the directory toolkits/wavenet/egs/bc_2013.
The first thing to do is to link the dataset to what has been extracted during the data preparation:
ln -s $PWD/../../../../prepare_data/output/training/wn $PWD/dump
Then you can start the training as follows:
bash run.sh
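Before starting the training, it can be useful to check that the dump link actually resolves to the extracted WaveNet features:
# Optional sanity check: dump should point at prepare_data/output/training/wn
readlink -f dump
ls dump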
For Parallel WaveGAN, the training happens in the directory toolkits/parallel_wavegan/egs/bc_2013/voc1.
The first thing to do is to link the dataset to what has been extracted during the data preparation:
ln -s $PWD/../../../../../prepare_data/output/training/wg $PWD/dump
Then you can start the training as follows:
bash run.sh
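The recipe follows the layout of the upstream Parallel WaveGAN egs. Assuming the upstream run.sh options are unchanged, a single stage can be (re-)run in isolation, for example:
# Assumption: the upstream --stage/--stop_stage options are still available;
# this would run only the network training stage
bash run.sh --stage 2 --stop_stage 2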
For FastPitch, the training happens in the directory toolkits/fastpitch.
The first thing to do is to link the dataset to what has been extracted during the data preparation:
mkdir bc_2013
ln -s $PWD/bc_2013/../../../prepare_data/output/training/fastpitch $PWD/bc_2013/dataset
Then you can start the training as follows:
NUM_GPUS=1 BS=16 PH_DICT=bc_2013/dataset/ph_list bash scripts/train.sh
Here is a description of the variables used (see also the example right after this list):
- NUM_GPUS :: the number of GPUs used for the training
- BS :: the batch size
- PH_DICT :: the path to the list of phonemes used in the corpus (if not defined, it will default to RADIO_ARPABET & ARCTIC)
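For instance, to train on more GPUs with a larger batch size (the values below are purely illustrative):
# Illustrative values only: 4 GPUs and a batch size of 32, with the same phoneme list as above
NUM_GPUS=4 BS=32 PH_DICT=bc_2013/dataset/ph_list bash scripts/train.sh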
For Tacotron, the training happens in the directory toolkits/tacotron.
The first thing to do is to link the dataset to what has been extracted during the data preparation:
mkdir bc_2013
ln -s $PWD/bc_2013/../../../prepare_data/output/training/tacotron $PWD/bc_2013/data
Then you can start the training as follows:
python train_pag.py -d bc_2013/data/ph_list
The last step is to back up the files so that they are compatible with the synthesis script. To do so, run the following command:
bash helpers/backup_models.sh models
With this command, the models will be backed up in the directory models. Change the argument if you want to use a different backup directory.
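For example, to back the models up somewhere else (the directory name below is arbitrary):
# Back the models up in my_backup instead of models
bash helpers/backup_models.sh my_backup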
The synthesis itself is then carried out with the render.sh helper:
EXPES="fp tac wg wn" bash helpers/render.sh
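Assuming render.sh treats EXPES as a space-separated list of experiments to render (presumably fp = FastPitch, tac = Tacotron, wg = Parallel WaveGAN, wn = WaveNet), a subset of the models can be rendered as well:
# Assumption: EXPES is a space-separated list of experiments; here only FastPitch and WaveNet are rendered
EXPES="fp wn" bash helpers/render.sh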
Simply go to the directory evaluation and run:
bash run.sh
The results will be available in the directory output.
The models obtained for the experiments are available at https://www.cstr.ed.ac.uk/projects/blizzard/ under the section "models" (to access these models, you need to obtain a license for the [[https://www.cstr.ed.ac.uk/projects/blizzard/2013/lessac_blizzard2013/][English audiobook data for the Blizzard Challenge 2013]] and then use the same credentials).
The samples and the subjective evaluation results are available at https://data.cstr.ed.ac.uk/blizzard/wavs_and_scores/2013-EH2-EXT.tar.gz
@article{LeMaguer2024,
  title = {The limits of the Mean Opinion Score for speech synthesis evaluation},
  author = {Sébastien {Le Maguer} and Simon King and Naomi Harte},
  year = 2024,
  journal = {Computer Speech \& Language},
  volume = 84,
  pages = 101577,
  doi = {10.1016/j.csl.2023.101577},
  issn = {0885-2308},
  url = {https://www.sciencedirect.com/science/article/pii/S0885230823000967},
}
The citation keys are given to avoid wasting too much space. Please refer to the bibtex file references.bib to access the full entry.
| Architecture | Reference | Implementation |
|---|---|---|
| Tacotron | [cite:@Wang2017] | /~https://github.com/cassiavb/Tacotron/commit/946408f8cd7b5fe9c53931c631267ba2a723910d |
| FastPitch | [cite:@Lancucki2021] | /~https://github.com/NVIDIA/DeepLearningExamples/commit/6a642837c471c596aab7edf204384f66e9483ab2 |
| WaveNet | [cite:@Oord2016] | /~https://github.com/r9y9/wavenet_vocoder/commit/a35fff76ea3687b05e1a10023cad3f7f64fa25a3 |
| Parallel WaveGAN | [cite:@Yamamoto2020] | /~https://github.com/kan-bayashi/ParallelWaveGAN/commit/6d4411b65f9487de5ec49dabf029dc107f23192d |
The citation keys are given to avoid wasting too much space. Please refer to the bibtex file references.bib to access the full entry.
| Software | Reference | Implementation |
|---|---|---|
| MaryTTS | [cite:@Steiner2018] | /~https://github.com/marytts/marytts |
| JTok | | /~https://github.com/DFKI-MLT/JTok |
| Pyworld/World | [cite:@Morise2016] | /~https://github.com/mmorise/World, /~https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder |
| FlexEval | [cite:@Fayet2020] | https://gitlab.inria.fr/expression/tools/FlexEval |