Openspeech's Hydra configuration

Jun 19, 2021

This page describes how openspeech uses Hydra to manage configuration.

What is Hydra?

Hydra is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.


Openspeech aims to provide as many options as possible. However, too many options can confuse users. To address this problem, we needed a hierarchical configuration management toolkit, and Hydra was the best choice for us in that respect. We referred to the structure of Fairseq, which applied Hydra successfully. Thanks to the Fairseq team.

We thought a lot about whether to use YAML files or @dataclass. After much consideration, we decided to use @dataclass, which makes each module's configuration easy to understand. Each @dataclass holds the default values for its module and is stored in the same Python file as that module.

Additionally, Hydra has a rich and growing library of plugins that provide functionality such as hyperparameter sweeping (including Bayesian optimization through the Ax library), job launching across various platforms, and more.

Creating or migrating components

In general, each new (or updated) component should provide a companion dataclass. These dataclasses are typically located in the same file as the component and are passed as arguments to the register_*() functions. They are decorated with the @dataclass decorator and typically inherit from OpenspeechDataclass.


from dataclasses import dataclass, field
from openspeech.dataclass.configurations import OpenspeechDataclass


@dataclass
class ConformerLSTMConfigs(OpenspeechDataclass):
    model_name: str = field(
        default="conformer_lstm", metadata={"help": "Model name"}
    )
    encoder_dim: int = field(
        default=256, metadata={"help": "Dimension of encoder."}
    )
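
As a rough illustration of why it is convenient to keep defaults and help text in the dataclass, the sketch below lists each field's name, default value, and help string using only the standard dataclasses module. It uses a plain stand-in class (ExampleConformerConfigs, a hypothetical name) instead of inheriting from OpenspeechDataclass so that it runs without Openspeech installed, and describe_fields is an illustrative helper, not part of Openspeech.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExampleConformerConfigs:  # hypothetical stand-in for ConformerLSTMConfigs
    model_name: str = field(
        default="conformer_lstm", metadata={"help": "Model name"}
    )
    encoder_dim: int = field(
        default=256, metadata={"help": "Dimension of encoder."}
    )


def describe_fields(config_cls):
    """Return (name, default, help) for every field of a config dataclass."""
    return [
        (f.name, f.default, f.metadata.get("help", ""))
        for f in fields(config_cls)
    ]


for name, default, help_text in describe_fields(ExampleConformerConfigs):
    print(f"{name} = {default!r}  # {help_text}")
```

Because the defaults live on the class itself, calling ExampleConformerConfigs() with no arguments already yields a usable configuration.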

@register_*() function

Inspired by Fairseq, we make active use of @register_*() functions. A @register_*() function automatically registers a class together with its associated dataclass, which makes adding new modules straightforward. Below are examples of how register_*() functions and dataclasses are used in Openspeech.

Model example:

@dataclass
class TransformerConfigs(ModelConfigs):
    model_name: str = field(
        default="transformer", metadata={"help": "Model name"}
    )
    extractor: str = field(
        default="vgg", metadata={"help": "The CNN feature extractor."}
    )


@register_model('transformer', dataclass=TransformerConfigs)
class SpeechTransformerModel(OpenspeechEncoderDecoderModel):
    def build_model(self):
        ...  # model construction omitted in this excerpt

Dataset example:

@dataclass
class MelSpectrogramConfigs(AudioConfigs):
    name: str = field(
        default="melspectrogram", metadata={"help": "Name of dataset."}
    )
    num_mels: int = field(
        default=80, metadata={"help": "The number of mfc coefficients to retain."}
    )


@register_dataset("melspectrogram", dataclass=MelSpectrogramConfigs)
class MelSpectrogramDataset(AudioDataset):
    def __init__(self):
        ...  # dataset setup omitted in this excerpt
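
To make the registration mechanism more concrete, here is a minimal sketch of how a register_*() decorator could be written. It is not Openspeech's actual implementation: the two registry dictionaries, the duplicate-name check, and the ToyConfigs and ToyModel classes are all assumptions made for illustration. Only the call shape, a name plus a dataclass keyword argument, follows the examples above.

```python
from dataclasses import dataclass, field

# Hypothetical registries: name -> model class, and name -> config dataclass.
MODEL_REGISTRY = {}
MODEL_DATACLASS_REGISTRY = {}


def register_model(name, dataclass=None):
    """Illustrative decorator factory; the keyword mirrors the examples above."""

    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"Cannot register duplicate model ({name})")
        MODEL_REGISTRY[name] = cls
        if dataclass is not None:
            MODEL_DATACLASS_REGISTRY[name] = dataclass
        return cls  # the class itself is returned unchanged

    return decorator


@dataclass
class ToyConfigs:  # hypothetical config dataclass
    model_name: str = field(default="toy", metadata={"help": "Model name"})
    hidden_dim: int = field(default=8, metadata={"help": "Hidden size."})


@register_model("toy", dataclass=ToyConfigs)
class ToyModel:  # hypothetical model class
    def __init__(self, configs):
        self.configs = configs


# Look up both the class and its default configuration by name.
configs = MODEL_DATACLASS_REGISTRY["toy"]()
model = MODEL_REGISTRY["toy"](configs)
print(type(model).__name__, model.configs.hidden_dim)
```

With this pattern, adding a new module only requires decorating its class; code that builds models by name does not need to change.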

Openspeech's configuration structure

Below are the configuration dataclasses you can select in Openspeech, grouped by component.

  - audio: 
    - fbank
    - melspectrogram
    - mfcc
    - spectrogram
  - common: 
    - kspon
    - libri
    - aishell
  - criterion: 
    - cross_entropy
    - ctc
    - joint_ctc_cross_entropy
    - label_smoothed_cross_entropy
    - transducer
  - lr_scheduler: 
    - reduce_lr_on_plateau
    - transformer
    - tri_stage
    - warmup_reduce_lr_on_plateau
    - warmup
  - model: 
    - conformer_encoder_only
    - conformer_lstm
    - conformer_transducer
    - deepspeech2
    - jasper
    - listen_attend_spell
    - rnn_transducer
    - transformer
    - transformer_transducer
  - trainer: 
    - cpu
    - gpu
    - tpu
    - cpu-fp64
    - gpu-fp16
    - tpu-fp16
  - vocab: 
    - aishell_character
    - kspon_character
    - kspon_subword
    - kspon_grapheme
    - libri_character
    - libri_subword

Training with

On startup, Hydra will create a configuration object that contains a hierarchy of all the necessary dataclasses populated with their default values in the code.

Some of the most common use cases are shown below:

1. Override default values through the command line:

$ python ./openspeech_cli/ \
    common=libri \
    common.dataset_path=$DATASET_PATH \
    common.dataset_download=True \
    common.manifest_file_path=$MANIFEST_FILE_PATH \
    vocab=libri_subword \
    vocab.vocab_size=10000 \
    model=conformer_lstm \
    model.encoder_dim=320 \
    audio=mfcc \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu-fp16

Note that, in addition to setting explicit values for parameters such as common.dataset_path, selections such as common=libri tell Hydra to overlay the configuration defined in the corresponding dataclass. If you do not need to change any architecture parameters, simply specify model=conformer_lstm and the dataclass defaults are used.
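
The interplay between group selections and dotted overrides can be sketched with plain dataclasses and dictionaries. This is a simplified model, not Hydra's composition logic: the GROUPS table and the compose function are hypothetical, the stand-in classes do not inherit from OpenspeechDataclass, and the default vocab_size of 5000 is an assumed value chosen only for the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class ConformerLSTMConfigs:  # stand-in with the two fields shown earlier
    model_name: str = "conformer_lstm"
    encoder_dim: int = 256


@dataclass
class LibriSubwordConfigs:  # hypothetical stand-in
    vocab_size: int = 5000  # assumed default, for illustration only


# group name -> option name -> dataclass (hypothetical lookup table)
GROUPS = {
    "model": {"conformer_lstm": ConformerLSTMConfigs},
    "vocab": {"libri_subword": LibriSubwordConfigs},
}


def compose(args):
    """Build a nested config from 'group=option' and 'group.key=value' args."""
    config = {}
    # First pass: each group selection pulls in its dataclass defaults.
    for arg in args:
        key, value = arg.split("=", 1)
        if "." not in key:
            config[key] = asdict(GROUPS[key][value]())
    # Second pass: dotted keys replace individual defaults.
    for arg in args:
        key, value = arg.split("=", 1)
        if "." in key:
            group, name = key.split(".", 1)
            current = config[group][name]
            # Naive cast to the default's type; fine for int and str here.
            config[group][name] = type(current)(value)
    return config


config = compose([
    "model=conformer_lstm",
    "model.encoder_dim=320",
    "vocab=libri_subword",
    "vocab.vocab_size=10000",
])
print(config["model"]["encoder_dim"])  # 320
```

Passing only model=conformer_lstm leaves encoder_dim at its dataclass default of 256.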

2. Add new configuration through the command line:

$ python ./openspeech_cli/ \
    common=libri \
    vocab=libri_subword \
    model=conformer_lstm \
    audio=mfcc \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu-fp16 \
    +trainer.is_gpu=True \
    +trainer.is_tpu=False \
    criterion=ctc
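
In Hydra, a plain key=value override changes a key that already exists in the composed configuration, while a leading + adds a key that is not there yet, which is why trainer.is_gpu and trainer.is_tpu carry a + in the command above. The sketch below mimics that rule on a nested dictionary. It is a simplified model rather than Hydra's implementation: apply_override is a hypothetical helper, the trainer.name key is made up for the example, and values are kept as strings.

```python
def apply_override(config, arg):
    """Apply one 'a.b=value' or '+a.b=value' argument to a nested dict."""
    key, value = arg.split("=", 1)
    append = key.startswith("+")
    if append:
        key = key[1:]
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node[part]
    if append and leaf in node:
        raise KeyError(f"'{leaf}' already exists; drop the '+' to override it")
    if not append and leaf not in node:
        raise KeyError(f"'{leaf}' is not in the config; use '+' to add it")
    node[leaf] = value  # values stay strings in this sketch
    return config


config = {"trainer": {"name": "gpu-fp16"}}      # hypothetical existing key
apply_override(config, "+trainer.is_gpu=True")  # new key: needs '+'
apply_override(config, "trainer.name=gpu")      # existing key: no '+'
print(config)
```

Rejecting an override of a missing key catches typos in parameter names early instead of silently adding an unused entry.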

More details on using Hydra can be found on the Hydra website. If you have any questions, feel free to send me an email or create an issue.