DNNSeg implements deep neural sequence models for unsupervised speech processing and for testing hypotheses about human language acquisition from raw speech. DNNSeg is an elaboration of the model described in Elsner & Shain (2017; see implementation), and variants have been used to study the acquisition of phonological categories and features from speech (Shain & Elsner, 2019; Shain & Elsner, 2020). In its full form, DNNSeg infers hierarchically organized segment boundaries and category labels through end-to-end optimization of cognitively inspired proxy objectives for compression (Baddeley et al., 1998) and predictive coding (Singer et al., 2018), using a special type of segmental recurrent unit (Chung et al., 2017). DNNSeg is thus based on the hypothesis that linguistic representations (e.g., phonemes, words, and possibly constituents) make the speech signal both easier to remember and easier to predict than non-linguistic ones, and it exploits this signal to extract linguistic generalizations from speech without supervision.
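The following is a heavily simplified NumPy sketch of how these two proxy objectives interact with a candidate segmentation. It is purely illustrative and is not DNNSeg's implementation: a fixed mean-pooling "encoder" and an identity "predictor" stand in for the trainable segmental recurrent networks that DNNSeg actually learns.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 40))  # 12 frames x 40 acoustic features
boundaries = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1])  # candidate segmentation

# Split the frame sequence into segments at the hypothesized boundaries.
ends = np.flatnonzero(boundaries) + 1
segments = np.split(frames, ends)[:len(ends)]

# Compression proxy: summarize each segment with a fixed-size code (here, the
# segment mean) and measure how much is lost reconstructing the frames from it.
codes = np.stack([seg.mean(axis=0) for seg in segments])
reconstruction_loss = sum(((seg - code) ** 2).mean()
                          for seg, code in zip(segments, codes))

# Prediction proxy: measure how well each segment code anticipates the next
# one (here with a trivial identity "predictor"; DNNSeg learns this mapping).
prediction_loss = ((codes[1:] - codes[:-1]) ** 2).mean()

# A good segmentation should make both terms small at once.
total_loss = reconstruction_loss + prediction_loss
print(f"reconstruction={reconstruction_loss:.3f}  prediction={prediction_loss:.3f}")
```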
This repository is under active development, and reproducibility of previously published results is not guaranteed from the master branch.
For this reason, repository states associated with previous results are saved in Git branches.
To reproduce those results, check out the relevant branch and follow the instructions in its README.
Current reproduction branches are:

- `NAACL19`
- `CoNLL20`
Thus, to reproduce results from CoNLL20 (Shain & Elsner, 2020), for example, run `git checkout CoNLL20` from the repository root and follow the instructions in that branch's README file.
Published results depend on both (1) datasets and (2) models as defined in experiment-specific configuration files. We do not distribute data with this repository.
Install DNNSeg by cloning this repository. To install Python dependencies, install Anaconda, then run the following command from the repository root to create a new conda environment:

```bash
conda env create -f conda_dnnseg.yml
```
In addition, for models using cochleagram-based acoustic representations, you will need to install the pycochleagram library. First activate the `dnnseg` environment:

```bash
conda activate dnnseg
```

Then run the following in the repository root:
git clone /~https://github.com/mcdermottLab/pycochleagram.git;
cd pycochleagram;
python setup.py install
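To verify the installation, you can try importing the package from within the `dnnseg` environment (this assumes the package's import name is `pycochleagram`, matching the repository above):

```python
# Sanity check: should print pycochleagram's install location without error.
# Assumes the package imports under the name `pycochleagram`.
import pycochleagram
print(pycochleagram.__file__)
```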
The `dnnseg` environment needs to be activated (as above) before running DNNSeg.
Running DNNSeg on the Zerospeech 2015 challenge data requires four external resources:
- The Zerospeech metadata
- The Zerospeech track 2 repository
- The Buckeye Speech Corpus
- The Xitsonga portion of the NCHLT corpus
Once these resources have been acquired, preprocess them by running the following from the DNNSeg repository root:

```bash
python -m dnnseg.datasets.zerospeech.build <PATH-TO-ZS-METADATA> <PATH-TO-ZS-TRACK2> -b <PATH-TO-BSC> -x <PATH-TO-NCHLT> -o <PATH-TO-OUTPUT-DIR>
```
Model data and hyperparameters are defined in `*.ini` config files. For an example config file, see `dnnseg_model.ini` in the repository root. For a full description of all settings that can be controlled with the config file, see the DNNSeg initialization params by running:

```bash
python3 -m dnnseg.bin.help
```
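To give a sense of the format, here is a minimal hypothetical excerpt of such a config file. The section and setting names below are illustrative assumptions, not DNNSeg's actual parameter names; consult `dnnseg_model.ini` and the output of `dnnseg.bin.help` for the real ones.

```ini
# Hypothetical sketch: section and key names here are illustrative assumptions,
# not DNNSeg's actual settings (see dnnseg_model.ini and dnnseg.bin.help).
[data]
# Path to the preprocessed data produced by dnnseg.datasets.zerospeech.build
train_data_dir = <PATH-TO-OUTPUT-DIR>

[model]
# Number of layers in the segmental encoder
n_layers = 2
```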
Once you have defined an `*.ini` config file, fit the model by running the following from the repository root:

```bash
python3 -m dnnseg.bin.train <PATH>.ini
```
- Baddeley, Alan; Gathercole, Susan; and Papagno, Costanza (1998). The phonological loop as a language learning device. Psychological Review.
- Chung, Junyoung; Ahn, Sungjin; and Bengio, Yoshua (2017). Hierarchical multiscale recurrent neural networks. ICLR17.
- Elsner, Micha and Shain, Cory (2017). Speech segmentation with a neural encoder model of working memory. EMNLP17.
- Shain, Cory and Elsner, Micha (2019). Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders. NAACL19.
- Shain, Cory and Elsner, Micha (2020). Acquiring language from speech by learning to remember and predict. CoNLL20.
- Singer, Yosef; Teramoto, Yayoi; Willmore, Ben D B; Schnupp, Jan W H; King, Andrew J; and Harper, Nicol S (2018). Sensory cortex is optimized for prediction of future input. eLife.