[MXNET-1210 ] Gluon Audio #13241

gaurav-gireesh · 2018-11-13T01:13:54Z

Description

Phase 1

As a user, I would like to have an out of the box feature of audio data loader and some popular audio transforms in MXNet, that would allow me :

to be able to load audio (only .wav files supported currently) files and make a Gluon AudioDataset (NDArrays),
apply some popular audio transforms on the audio data( example scaling, MEL, MFCC etc.),
load the Dataset using Gluon's DataLoader, train a neural network ( Ex: MLP) with this transformed audio dataset,
perform a simple audio data related task such as sounds classification - 1 audio clip with 1 label( Multiclass sound classification problem).
have an end to end example for a sample Audio multi class classification task (Urban Sounds Classification)

Note: The plan is to have a working piece of this model with an example into the contrib package of MXNet before it is agreed upon to move this implementation to the gluon.data module.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness.
Code is well-documented
For user-facing API changes, API doc string has been updated.
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Example contributed

Changes

AudioFolderDataset(gluon.contrib.data.audio.datasets),
audio transforms (gluon.contrib.data.audio.transforms)

Comments

This PR is a WIP. I will be adding tests for the modules added along with the documentation and an example(Notebook or a python script)
This feature allows to take first step in audio data related tasks and needs to be extended for more generic as well as advanced use cases - (Reason for contributing in the gluon.contrib package)
Opened this PR for early feedback regarding:
- librosa - which is a dependency for this for audio load and feature extraction like mfcc, scale, mel etc.. This may have to be listed as a dependency in the CI script, or the tests will fail. Suggestions about where to include this or a work around is appreciated.
- Design here- Feedback is required - Dev list discussion here

gaurav-gireesh · 2018-11-13T01:25:52Z

@mxnet-label-bot add [pr-work-in-progress]

zachgk · 2018-11-13T03:16:18Z

@mxnet-label-bot add [Python, Gluon, Feature]

gaurav-gireesh · 2018-11-13T17:07:02Z

@marcoabreu Would appreciate your feedback regarding a way to add dependencies of librosa and sklearn libraries for this PR Marco. I use these libraries to read/load audio files and sklearn's label encoding. The CI is failing because these dependencies are not satisfied. I have not written test cases still the imports are failing. Thanks!

marcoabreu · 2018-11-13T17:29:46Z

Hi, you probably want to have a look at /~https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_python.sh

But you got a good point here. MXNet should not fail if we don't actually require these dependencies for the workload we have. Thus, I'd suggest that you first try to determine why we get these errors in the first place.

The last thing we want is to add a hard dependency (in this case on librosa and sklearn) although people might don't even want to use it.

…form

roywei · 2018-11-13T17:49:59Z

@gaurav-gireesh It's probably due to some code like `import gluon.dataset.*" automatically imported your audio dataset which requires librosa and sklearn. We should make sure if users don't need to use audio dataset, they don't need to install librosa and sklearn to build and use mxnet normally.

roywei

Thanks for your contribution! Great work!
I would like to remove the dependency on sklearn, we don't need to depend on it for a small feature like label encoder.
Please also add unit tests and examples/tutorials when you have them!

roywei · 2018-11-13T18:25:56Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+                    label_tmp.append(line.split(",")[1].strip())
+            data_tmp = data_tmp[1:]
+            label_tmp = label_tmp[1:]
+            le = sklearn.preprocessing.LabelEncoder()


why not follow ImageFolderDataset to generate synset? why is sklearn LabelEncoder a must here?

Yes, I am working to find a workaround here. We needed label encoding here because we read from a csv file that contains filename(location) and the corresponding label. It is different from the use case in ImageFolderDataset that requires each class(or label) to be a folder( with images inside them), and hence Folder can be encountered once and put in the synsets(no repetition there). However, in the csv file, a label can occur multiple times. Looking into this for a better way to get around this.
Thanks!

Addressed this.

roywei · 2018-11-13T18:26:54Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+        if not librosa:
+            raise RuntimeError("Librosa dependency is not installed! Install that and retry")
+
+        audio_tmp = np.mean(librosa.feature.mfcc(y=x.asnumpy(), sr=22050, n_mfcc=40).T, axis=0)


can we parameterize sr and n_mfcc instead of hard coding values?

Yes, addressed this. Thanks

roywei · 2018-11-13T18:29:26Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+
+    """
+
+    def __init__(self, **kwargs):


what kwargs are available? as a new user, how do I know what can be passed? can we parameterize and document them well so users don't have to go over librosa documents?

Added a description in the docstring about the parameters that could be passed to the function.

…esent

gaurav-gireesh · 2018-11-15T18:16:29Z

@safrooze @sandeep-krishnamurthy Could you help with the review of this Pull request?
Would appreciate your comments and suggestions.
Here is the design proposal: https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio

szha · 2018-11-15T18:30:48Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+    root : str
+        Path to root directory.
+    transform : callable, default None
+        A function that takes data and label and transforms them


Gluon is not going to support this argument anymore.

Okay. I will remove the argument. I am using transform_first() function instead.

@szha - Can you please add more details? Why dataset should not support transform and what is the alternative?

szha · 2018-11-15T18:34:41Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+
+class Loader(Block):
+    """
+        This transform opens a filepath and converts that into an NDArray using librosa to load


This is not how we usually define transform functions in Gluon.

Removed the Loader transform.

…nsforms addressing PR comments, removed transform argument from datasets - AudioFolderDataset

vandanavk

Please also execute pylint and rst-lint on all these files and fix the errors reported

vandanavk · 2018-11-15T21:49:29Z

example/gluon/urban_sounds/urban_sounds.py

+    **https://mxnet.incubator.apache.org/install/ **
+    4. Librosa is installed. To install, follow the instructions here:
+     **https://librosa.github.io/librosa/install.html**
+


Could you move these instructions to a README file?

Made a README for the example. Thanks!

vandanavk · 2018-11-15T22:06:36Z

example/gluon/urban_sounds/urban_sounds.py

+except ImportError as er:
+    warnings.warn("Argument parsing module could not be imported and hence \
+    no arguments passed to the script can actually be parsed.")
+


A more concise warning would be good. Maybe it should fail on ImportError?
Also, what happens to argparse.ArgumentParser below when this import fails?

If this fails, I pass some default arguments. There is a check which returns if directories are empty.

vandanavk · 2018-11-15T22:10:00Z

example/gluon/urban_sounds/urban_sounds.py

+        if args.epochs:
+            epochs = args.epochs
+        else:
+            epochs = 35


default number of epochs is 35 here but but 30 in train() above. this should be same?

Yes. Thanks.

vandanavk · 2018-11-15T22:15:52Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+    file_format: str, default '.wav'
+        The format of the audio files(.wav, .mp3)
+    skip_rows: int, default 0
+        While reading from csv file, how many rows to skip at the start of the file to aboid reading in header


typo - 'avoid'

Addressed this.

vandanavk · 2018-11-15T22:16:52Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+        """
+        if not self._has_csv:
+            self.synsets = []
+            self.items = []


these 2 lines can be common as they appear in both the if and else part

vandanavk · 2018-11-15T23:07:02Z

example/gluon/urban_sounds/urban_sounds.py

+       The directory that contains the audio files on which predictions are to be made
+    """
+    if not librosa:
+        warnings.warn("Librosa dependency not installed! Cnnot load the audio to make predictions. Exitting.")


Corrected the typos.

vandanavk · 2018-11-15T23:09:39Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+            x = x.asnumpy()
+        specs = librosa.feature.melspectrogram(x, **self.kwargs)
+        return nd.array(specs)
+


new line at the end

Yes. Making this consistent with the other scripts.

…c to write synset to a text file

gaurav-gireesh · 2018-11-16T17:38:16Z

@mxnet-label-bot remove [pr-work-in-progress]

gaurav-gireesh · 2018-11-16T17:39:19Z

@mxnet-label-bot add [pr-awaiting-review]

sandeep-krishnamurthy · 2018-11-16T19:17:02Z

example/gluon/urban_sounds/README.md

+To be able to run this example:
+
+1. Download the dataset(train.zip, test.zip) required for this example from the location:
+**https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU**


Why this drive folder? Who owns this?

It is from an urban sounds challenge/contest hosted by https://datahack.analyticsvidhya.com/contest/practice-problem-urban-sound-classification/
It was a free registration contest. The data is hosted on an open Google drive link.
If needed these could be hosted on a public s3?
or
can we add steps as this:
/~https://github.com/apache/incubator-mxnet/blob/master/example/gluon/house_prices/kaggle_k_fold_cross_validation.py

sandeep-krishnamurthy · 2018-11-16T19:17:59Z

example/gluon/urban_sounds/README.md

+3. Apache MXNet is installed on the machine. For instructions, go to the link: **https://mxnet.incubator.apache.org/install/**
+
+4. Librosa is installed. To install, follow the instructions here:
+ **https://librosa.github.io/librosa/install.html**


May be we can provide the pip command and avoid user going through another page.

pip install librosa
For more details refer here - https://librosa.github.io/librosa/install.html

sandeep-krishnamurthy · 2018-11-16T19:18:58Z

example/gluon/urban_sounds/README.md

@@ -0,0 +1,16 @@
+# Urban Sounds classification in MXNet
+
+Urban Sounds Dataset:


Will be good to add what this dataset is and what is the problem being solved.

sandeep-krishnamurthy · 2018-11-16T19:19:26Z

example/gluon/urban_sounds/model.py

+# under the License.
+
+"""
+    This module builds a model an MLP woth a configurable output layer( number of units in the last layer)


nit: woth -> with

Why should it be configurable? Different number of classes. can you make it more clear.

sandeep-krishnamurthy · 2018-11-16T19:21:25Z

example/gluon/urban_sounds/model.py

+    with net.name_scope():
+        net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes)
+        net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer
+    net.add(gluon.nn.Dense(num_labels))


why out of name_scope ?

sandeep-krishnamurthy · 2018-11-16T20:12:09Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+            return nd.array(X1), label
+
+        else:
+            warnings.warn(" Dependency librosa is not installed! \


these checks are already done in init?

Since, user does not mandatorily need to use the audio module, these import failures cannot be raised as an exception. In init(), it issues a warning if librosa not found and returns. However, if get_item() is called, it will not be able to librosa load the audio to numpy arrays. These do not fail but issues a warning to the user, that librosa is needed for those functionalities. The module import of librosa is to be guarded as librosa is not supposed to be a hard dependency for MXNet.
What do you recommend? Should creation of AudioFolderDataset() error out if librosa not installed?

sandeep-krishnamurthy · 2018-11-16T20:15:26Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+        super(MFCC, self).__init__()
+
+    def forward(self, x):
+        if not librosa:


this check is already done at the top

Yes. Not all the transforms require librosa like Scaling and Pad/Trim. so, testing the availability of librosa in the methods that actually use it and warning the user that this particular transform cannot be applied. What do you recommend?

sandeep-krishnamurthy · 2018-11-16T20:16:29Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+        elif isinstance(x, nd.NDArray):
+            y = x.asnumpy()
+        else:
+            warnings.warn("Input object is not numpy or NDArray... Cannot apply the transform: MFCC!")


Messaging can be better

sandeep-krishnamurthy · 2018-11-16T20:22:13Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+from ..... import ndarray as nd
+from ....block import Block
+
+class MFCC(Block):


All these transforms are too specific to librosa. It takes librosa parameters as kwargs. What if we change backend from librosa to other library or implement them as mxnet operator?

Can we conceptualize these transforms and define the standard parameters?

Yes. It makes sense to do what you have suggested. Some other frameworks pass along all the keywords arguments so this approach was chosen.

sandeep-krishnamurthy · 2018-11-16T20:24:16Z

tests/python/unittest/test_contrib_gluon_data_audio.py

+from common import with_seed
+
+
+@with_seed()


Can you please add tests for the new audio folder dataset?

vandanavk · 2018-11-16T21:30:43Z

example/gluon/urban_sounds/predict.py

+
+def predict(pred_dir='./Test'):
+    """
+        The function is used to run predictions on the audio files in the directory `pred_directory`


Please check this line for rst_lint errors

vandanavk · 2018-11-16T21:34:26Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+
+    def _list_audio_files(self, root, skip_rows=0):
+        """
+            Populates synsets - a map of index to label for the data items.


Please check with rst_lint for errors here

vandanavk · 2018-11-16T21:36:24Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+        filename = self.items[idx][0]
+        label = self.items[idx][1]
+
+        if librosa is not None:


just if librosa ?

vandanavk · 2018-11-16T21:38:09Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+            Retrieve the item (data, label) stored at idx in items
+        """
+        filename = self.items[idx][0]
+        label = self.items[idx][1]


filename, label = self.items[idx] ?

vandanavk · 2018-11-16T21:38:39Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+
+    def __len__(self):
+        """
+            Retrieves the number of items in the dataset


check rst_lint here

vandanavk · 2018-11-16T21:40:07Z

python/mxnet/gluon/contrib/data/audio/datasets.py

+        Parameters
+        ----------
+        fn : callable
+            A transformer function that takes the first elemtn of a sample


vandanavk · 2018-11-16T22:25:49Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+class MFCC(Block):
+    """
+        Extracts Mel frequency cepstrum coefficients from the audio data file
+        More details : https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html


Please check rst-lint here

vandanavk · 2018-11-16T22:27:33Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+            x = x.asnumpy()
+        specs = librosa.feature.melspectrogram(x, **self.kwargs)
+        return nd.array(specs)
+


nit: newline at the end

vandanavk · 2018-11-16T22:27:55Z

tests/python/unittest/test_contrib_gluon_data_audio.py

+@with_seed()
+def test_pad_trim():
+    """
+        Function to test Pad/Trim Audio transform


Please check rst-lint here

vandanavk · 2018-11-16T22:28:47Z

tests/python/unittest/test_contrib_gluon_data_audio.py

+@with_seed()
+def test_scale():
+    """
+        Function to test scaling of the audio transform


Please execute pylint and rst-lint on all these files.

vandanavk · 2018-11-16T23:01:53Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+    n_fft: int, default 2048
+        length of the Fast fourier transform window
+    n_mels: int,
+        number of mel bands to generate


is there a default value?

vandanavk · 2018-11-16T23:02:52Z

python/mxnet/gluon/contrib/data/audio/transforms.py

+class MEL(Block):
+    """Create MEL Spectrograms from a raw audio signal. Relatively pretty slow.
+
+    Parameters


I think rst expects class attributes to be under the "Attributes" section. "Parameters" for function params. Please double check this.

szha

My concerns about the current code are:

not enough use cases to validate the API.
API design not conforming to Gluon standards.
implementation efficiency.

For now, I'd suggest putting these code as one of the Gluon examples to receive feedbacks. Also, make sure that code that isn't used in the example doesn't get checked in.

sandeep-krishnamurthy · 2018-11-19T21:16:08Z

@szha
Can you please provide more details on the following points.

API design not conforming to Gluon standards.

Implementation efficiency.

What can be the actionable feedback we can provide here to bring it to Contrib?

szha · 2018-11-19T22:37:47Z

I chatted with @gaurav-gireesh and agreed on having these classes in the examples first. Once two more models are added we can consider making it part of API. Right now there's lack of evidence that these thin wrappers of librosa won't become bottleneck for proper audio-related applications such as ASR, and the compromise to have this in examples is a way to collect more use cases. I'd suggest having DeepSpeech2 and 3 and at least one ASR model with proper WFST models as validation.

@sandeep-krishnamurthy if you know better feel free to suggest.

szha · 2018-11-20T17:42:00Z

#13325

Added AudioFolderDataset and some popular audio transforms

18f0634

gaurav-gireesh requested a review from szha as a code owner November 13, 2018 01:13

marcoabreu added the pr-work-in-progress PR is still work in progress label Nov 13, 2018

marcoabreu added Feature Gluon Python labels Nov 13, 2018

Removed referencing in imports and added a check for ndarray in trans…

03e956f

…form

roywei suggested changes Nov 13, 2018

View reviewed changes

gaurav-gireesh added 5 commits November 13, 2018 10:33

Module import enclosed within try/catch

d278672

Reordering imports and fixing lint errors

9a49683

Reordering imports and fixing lint errors, ModuleNotFoundError not pr…

f771728

…esent

removed dependency from sklearn

73ebd25

Transforms - added Documentation of parameters

e1a9c9a

szha added Feature request and removed Feature labels Nov 14, 2018

gaurav-gireesh added 5 commits November 13, 2018 18:49

Organized imports

a40890f

Test cases for the transforms added

16bb51d

Adding librosa in the ci install script

7096331

Reverting the change to see if CI passes

8d6fd4d

Added an example on Urban Sounds dataset using AudioFolderDataset

eabb682

szha reviewed Nov 15, 2018

View reviewed changes

Adding prediction support in the example, Loader removed from the tra…

9a24af5

…nsforms addressing PR comments, removed transform argument from datasets - AudioFolderDataset

vandanavk reviewed Nov 15, 2018

View reviewed changes

gaurav-gireesh added 6 commits November 15, 2018 16:30

Modularized the example into train, model and predict. Added the logi…

4b62ee6

…c to write synset to a text file

removed unused imports

219b08e

Addressed PR Comments

e1a8037

synset.txt filename changed

21c7a4b

Parameter name updated

bbebd25

Updated README

65f3491

gaurav-gireesh changed the title ~~[MXNET-1210 ][WIP] Gluon Audio~~ [MXNET-1210 ] Gluon Audio Nov 16, 2018

marcoabreu removed the pr-work-in-progress PR is still work in progress label Nov 16, 2018

marcoabreu added the pr-awaiting-review PR is waiting for code review label Nov 16, 2018

sandeep-krishnamurthy suggested changes Nov 16, 2018

View reviewed changes

vandanavk reviewed Nov 16, 2018

View reviewed changes

szha suggested changes Nov 16, 2018

View reviewed changes

Addressed PR comments

ececde7

szha closed this Nov 20, 2018

		@@ -0,0 +1,16 @@
		# Urban Sounds classification in MXNet

		Urban Sounds Dataset:

[MXNET-1210 ] Gluon Audio #13241

[MXNET-1210 ] Gluon Audio #13241

Conversation

gaurav-gireesh commented Nov 13, 2018 • edited Loading

Description

Checklist

Essentials

Changes

Comments

gaurav-gireesh commented Nov 13, 2018

zachgk commented Nov 13, 2018

gaurav-gireesh commented Nov 13, 2018

marcoabreu commented Nov 13, 2018

roywei commented Nov 13, 2018

roywei left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gaurav-gireesh commented Nov 15, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vandanavk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gaurav-gireesh commented Nov 16, 2018

gaurav-gireesh commented Nov 16, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

szha left a comment

Choose a reason for hiding this comment

sandeep-krishnamurthy commented Nov 19, 2018

szha commented Nov 19, 2018

szha commented Nov 20, 2018

gaurav-gireesh commented Nov 13, 2018 •

edited

Loading

roywei left a comment •

edited

Loading