A Machine Learning Pipeline for Laughter Detection on the ICSI Corpus

This repo is based on the laughter detection model by Gillick et al. and retrains it on the ICSI Meeting Corpus.

The data pipeline uses Lhotse, a new Python library for speech and audio data preparation.

This repository consists of three main parts:

  1. Evaluation Pipeline
  2. Data Pipeline
  3. Training Code

The following list outlines which parts of the repository belong to each of them and classifies the parts/files as one of three types:

  1. from scratch: entirely written by myself
  2. adapted: code taken from Gillick et al. and adapted
  3. unmodified: code taken from Gillick et al. without modification

  • Evaluation Pipeline (from scratch):

    • analysis
      • transcript_parsing/parse.py + preprocess.py: parsing and preprocessing the ICSI transcripts
      • analyse.py: main function that parses and evaluates the predictions from .TextGrid files output by the model
      • output_processing: scripts for creating .wav files of the laughter occurrences in order to evaluate them manually
    • visualise.py: functions for visualising model performance (incl. precision-recall curve and confusion matrix)
  • Data Pipeline (from scratch) - also see diagram:

    • compute_features.py: computes features representing the whole ICSI corpus and specific subsets of it
    • create_data_df.py: creates a dataframe representing the training, development and test sets
  • Training Code:

    • models.py (unmodified): defines the model architecture
    • train.py (adapted): main training code
    • segment_laughter.py + laugh_segmenter.py (adapted): inference code to run laughter detection on audio files
    • datasets.py + load_data.py (from scratch): the new LAD (Laugh Activity Detection) Dataset + new inference Dataset and the code for their creation (see the sketch after this list)
  • Misc:

    • Demo.ipynb (from scratch): demonstration of using Lhotse to compute features from a dataframe defining laughter and non-laughter segments
    • config.py (adapted): configurations for different parts of the pipeline
    • results.zip (N/A): contains the model predictions from experiments presented in my thesis
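
To make the role of the LAD Dataset more concrete, here is a minimal, hypothetical sketch of such a dataset class. It is not the actual API from datasets.py: the class name, the feature_loader callable and the "laugh" label value are illustrative assumptions; only the 'type' column comes from the segment dataframes described under Other documentation below.

    import torch
    from torch.utils.data import Dataset

    class LadDatasetSketch(Dataset):
        """Hypothetical stand-in for the LAD dataset defined in datasets.py.

        Maps one row of a segment dataframe (one audio segment) to a
        (features, label) pair for binary laughter detection.
        """

        def __init__(self, df, feature_loader):
            self.df = df.reset_index(drop=True)
            # feature_loader is an assumed callable that returns a
            # (time, feature_dim) array for a given segment row, e.g. by
            # reading precomputed Lhotse features from disk.
            self.feature_loader = feature_loader

        def __len__(self):
            return len(self.df)

        def __getitem__(self, idx):
            row = self.df.iloc[idx]
            feats = self.feature_loader(row)
            # 'type' is a column of the segment dataframes; the exact label
            # value used for laughter segments is an assumption here.
            label = 1.0 if row["type"] == "laugh" else 0.0
            return torch.as_tensor(feats, dtype=torch.float32), torch.tensor(label)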

Diagram of the Data Pipeline

[Image: Data Pipeline diagram]

Getting started

Steps to set up the environment from scratch so that training and evaluation can be run:

  1. Clone this repo

  2. cd into the repo

  3. Create a Python env and install all packages listed below. Put them in a requirements.txt file and run pip install -r requirements.txt

  4. We use Lhotse's recipe for the ICSI corpus to download the corpus' audio and transcripts (see the first sketch after this list)

    • run the Python script get_icsi_data.py
      • this will take a while to complete - it downloads all audio and transcripts for the ICSI corpus
      • after completion
        • you should have a data/icsi/speech folder with all the audio files grouped by meeting
        • you should have a data/icsi/transcripts folder with all the .mrt transcripts
  5. Now create a .env file by copying the .sample.env file to a .env file

  • you can configure the folders in it to match your desired folder structure
  6. Now run compute_features.py once to compute the features for the whole corpus (see the second sketch after this list)
  • the first time this will parse the transcripts and create indices with laughter and non-laughter segments (see the Other documentation section below)
    • this will take a while (e.g. it took one hour for me)
      • after the initial creation, the indices are cached and loaded from disk
  • this is done by the compute_features_per_split() method in the main() function
  • you can comment out the call to compute_features_for_cuts() in the main() function if you just want to create the features for the whole corpus for now
  7. Then run create_data_df.py to create a set of training samples
  8. Then run compute_features.py again to create the cutset
  • this is done by the compute_features_for_cuts() function in the main() function
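
For step 4, here is a minimal sketch of what get_icsi_data.py plausibly wraps. download_icsi is part of Lhotse's recipes module, but the keyword arguments shown are assumptions that may differ between Lhotse versions; the paths simply mirror the folder layout described above.

    from pathlib import Path
    from lhotse.recipes import download_icsi

    # Download the ICSI audio and .mrt transcripts into the folder layout
    # described in step 4. Argument names may vary across Lhotse versions -
    # check lhotse.recipes.icsi in your installed version.
    download_icsi(
        audio_dir=Path("data/icsi/speech"),
        transcripts_dir=Path("data/icsi/transcripts"),
    )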
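
For steps 6-8, here is a hypothetical outline of the call order in compute_features.py's main(). The two function names are taken from this README; the import path, the include_cuts flag and the overall shape are assumptions, not the file's actual contents.

    # Hypothetical outline - see the caveats in the paragraph above.
    from compute_features import (
        compute_features_for_cuts,
        compute_features_per_split,
    )

    def main(include_cuts: bool = False) -> None:
        # Step 6: features for the whole corpus; the first run also parses the
        # transcripts and caches the laughter/non-laughter indices.
        compute_features_per_split()
        # Step 8: features for the cutset produced by create_data_df.py.
        # Keep this disabled until create_data_df.py has been run (step 7).
        if include_cuts:
            compute_features_for_cuts()

    if __name__ == "__main__":
        main()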

Other documentation

analysis folder:

  • parse.py:

    • functions for creating dataframes, each containing all audio segments of a certain type (e.g. laughter, speech, etc.) - one segment per row. All these "segment dataframes" share the same columns

      • Columns: ['meeting_id', 'part_id', 'chan_id', 'start', 'end', 'length', 'type', 'laugh_type']
    • Additionally, parse.py creates one other dataframe, called info_df, which contains general information about each meeting.

      • Columns of info_df: ['meeting_id', 'part_id', 'chan_id', 'length', 'path']
  • preprocess.py: functions for creating all the indices. An index in this context is a nested mapping. Each index maps a participant in a certain meeting to all transcribed "audio segments" of a certain type (e.g. laughter, speech, etc.) recorded by this participant's microphone. The "audio segments" are taken from the dataframes created in parse.py. Each segment is turned into an "openclosed" interval, a datatype provided by the Portion library (https://github.com/AlexandreDecan/portion). These intervals are joined into a single disjunction. Portion allows normal interval operations on such disjunctions, which simplifies future logic, e.g. detecting whether predictions overlap with transcribed events.

All indices follow the same structure. They are defined as a Python dictionary of the following shape:

  {
      meeting_id: {
          tot_len: INT,
          tot_events: INT,
          part_id: P.openclosed(start, end) | P.openclosed(start, end),
          part_id: P.openclosed(start, end) | P.openclosed(start, end),
          ...
      },
      meeting_id: {
          ...
      },
      ...
  }
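
As a toy illustration of this structure and of the interval operations it enables, here is a short sketch using Portion. The meeting ID, participant ID and times are made up, and the comments on tot_len/tot_events reflect my reading of the schema above:

    import portion as P

    # Toy laughter index for a single meeting, following the shape above
    # (IDs and times are illustrative, not real corpus values).
    laugh_index = {
        "Bmr021": {
            "tot_len": 3.7,    # assumed: total length of these events in seconds
            "tot_events": 2,   # assumed: number of transcribed events
            "me011": P.openclosed(12.0, 14.0) | P.openclosed(80.5, 82.2),
        }
    }

    # The disjunction supports normal interval operations, e.g. checking
    # whether a model prediction overlaps any transcribed laugh:
    prediction = P.openclosed(13.5, 15.0)
    print(laugh_index["Bmr021"]["me011"].overlaps(prediction))  # True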