-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
@mxnet-label-bot add [pr-work-in-progress] |
@mxnet-label-bot add [Python, Gluon, Feature] |
@marcoabreu Would appreciate your feedback regarding a way to add dependencies of librosa and sklearn libraries for this PR Marco. I use these libraries to read/load audio files and sklearn's label encoding. The CI is failing because these dependencies are not satisfied. I have not written test cases still the imports are failing. Thanks! |
Hi, you probably want to have a look at /~https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_python.sh But you got a good point here. MXNet should not fail if we don't actually require these dependencies for the workload we have. Thus, I'd suggest that you first try to determine why we get these errors in the first place. The last thing we want is to add a hard dependency (in this case on librosa and sklearn) although people might don't even want to use it. |
@gaurav-gireesh It's probably due to some code like `import gluon.dataset.*" automatically imported your audio dataset which requires librosa and sklearn. We should make sure if users don't need to use audio dataset, they don't need to install librosa and sklearn to build and use mxnet normally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! Great work!
I would like to remove the dependency on sklearn, we don't need to depend on it for a small feature like label encoder.
Please also add unit tests and examples/tutorials when you have them!
label_tmp.append(line.split(",")[1].strip()) | ||
data_tmp = data_tmp[1:] | ||
label_tmp = label_tmp[1:] | ||
le = sklearn.preprocessing.LabelEncoder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not follow ImageFolderDataset to generate synset? why is sklearn LabelEncoder a must here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am working to find a workaround here. We needed label encoding here because we read from a csv file that contains filename(location) and the corresponding label. It is different from the use case in ImageFolderDataset that requires each class(or label) to be a folder( with images inside them), and hence Folder can be encountered once and put in the synsets(no repetition there). However, in the csv file, a label can occur multiple times. Looking into this for a better way to get around this.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this.
if not librosa: | ||
raise RuntimeError("Librosa dependency is not installed! Install that and retry") | ||
|
||
audio_tmp = np.mean(librosa.feature.mfcc(y=x.asnumpy(), sr=22050, n_mfcc=40).T, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we parameterize sr
and n_mfcc
instead of hard coding values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, addressed this. Thanks
|
||
""" | ||
|
||
def __init__(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kwargs
are available? as a new user, how do I know what can be passed? can we parameterize and document them well so users don't have to go over librosa documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a description in the docstring about the parameters that could be passed to the function.
@safrooze @sandeep-krishnamurthy Could you help with the review of this Pull request? |
root : str | ||
Path to root directory. | ||
transform : callable, default None | ||
A function that takes data and label and transforms them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gluon is not going to support this argument anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I will remove the argument. I am using transform_first() function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szha - Can you please add more details? Why dataset should not support transform and what is the alternative?
|
||
class Loader(Block): | ||
""" | ||
This transform opens a filepath and converts that into an NDArray using librosa to load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how we usually define transform functions in Gluon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the Loader transform.
…nsforms addressing PR comments, removed transform argument from datasets - AudioFolderDataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also execute pylint and rst-lint on all these files and fix the errors reported
**https://mxnet.incubator.apache.org/install/ ** | ||
4. Librosa is installed. To install, follow the instructions here: | ||
**https://librosa.github.io/librosa/install.html** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move these instructions to a README file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a README for the example. Thanks!
except ImportError as er: | ||
warnings.warn("Argument parsing module could not be imported and hence \ | ||
no arguments passed to the script can actually be parsed.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more concise warning would be good. Maybe it should fail on ImportError?
Also, what happens to argparse.ArgumentParser below when this import fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails, I pass some default arguments. There is a check which returns if directories are empty.
if args.epochs: | ||
epochs = args.epochs | ||
else: | ||
epochs = 35 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default number of epochs is 35 here but but 30 in train() above. this should be same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Thanks.
file_format: str, default '.wav' | ||
The format of the audio files(.wav, .mp3) | ||
skip_rows: int, default 0 | ||
While reading from csv file, how many rows to skip at the start of the file to aboid reading in header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - 'avoid'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this.
""" | ||
if not self._has_csv: | ||
self.synsets = [] | ||
self.items = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these 2 lines can be common as they appear in both the if and else part
The directory that contains the audio files on which predictions are to be made | ||
""" | ||
if not librosa: | ||
warnings.warn("Librosa dependency not installed! Cnnot load the audio to make predictions. Exitting.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some typos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected the typos.
x = x.asnumpy() | ||
specs = librosa.feature.melspectrogram(x, **self.kwargs) | ||
return nd.array(specs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Making this consistent with the other scripts.
@mxnet-label-bot remove [pr-work-in-progress] |
@mxnet-label-bot add [pr-awaiting-review] |
To be able to run this example: | ||
|
||
1. Download the dataset(train.zip, test.zip) required for this example from the location: | ||
**https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this drive folder? Who owns this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is from an urban sounds challenge/contest hosted by https://datahack.analyticsvidhya.com/contest/practice-problem-urban-sound-classification/
It was a free registration contest. The data is hosted on an open Google drive link.
If needed these could be hosted on a public s3?
or
can we add steps as this:
/~https://github.com/apache/incubator-mxnet/blob/master/example/gluon/house_prices/kaggle_k_fold_cross_validation.py
example/gluon/urban_sounds/README.md
Outdated
3. Apache MXNet is installed on the machine. For instructions, go to the link: **https://mxnet.incubator.apache.org/install/** | ||
|
||
4. Librosa is installed. To install, follow the instructions here: | ||
**https://librosa.github.io/librosa/install.html** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be we can provide the pip command and avoid user going through another page.
pip install librosa
For more details refer here - https://librosa.github.io/librosa/install.html
@@ -0,0 +1,16 @@ | |||
# Urban Sounds classification in MXNet | |||
|
|||
Urban Sounds Dataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be good to add what this dataset is and what is the problem being solved.
example/gluon/urban_sounds/model.py
Outdated
# under the License. | ||
|
||
""" | ||
This module builds a model an MLP woth a configurable output layer( number of units in the last layer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: woth -> with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should it be configurable? Different number of classes. can you make it more clear.
example/gluon/urban_sounds/model.py
Outdated
with net.name_scope(): | ||
net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes) | ||
net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer | ||
net.add(gluon.nn.Dense(num_labels)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why out of name_scope ?
return nd.array(X1), label | ||
|
||
else: | ||
warnings.warn(" Dependency librosa is not installed! \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these checks are already done in init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, user does not mandatorily need to use the audio module, these import failures cannot be raised as an exception. In init(), it issues a warning if librosa not found and returns. However, if get_item() is called, it will not be able to librosa load the audio to numpy arrays. These do not fail but issues a warning to the user, that librosa is needed for those functionalities. The module import of librosa is to be guarded as librosa is not supposed to be a hard dependency for MXNet.
What do you recommend? Should creation of AudioFolderDataset() error out if librosa not installed?
super(MFCC, self).__init__() | ||
|
||
def forward(self, x): | ||
if not librosa: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check is already done at the top
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Not all the transforms require librosa like Scaling and Pad/Trim. so, testing the availability of librosa in the methods that actually use it and warning the user that this particular transform cannot be applied. What do you recommend?
elif isinstance(x, nd.NDArray): | ||
y = x.asnumpy() | ||
else: | ||
warnings.warn("Input object is not numpy or NDArray... Cannot apply the transform: MFCC!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messaging can be better
from ..... import ndarray as nd | ||
from ....block import Block | ||
|
||
class MFCC(Block): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these transforms are too specific to librosa. It takes librosa parameters as kwargs. What if we change backend from librosa to other library or implement them as mxnet operator?
Can we conceptualize these transforms and define the standard parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It makes sense to do what you have suggested. Some other frameworks pass along all the keywords arguments so this approach was chosen.
from common import with_seed | ||
|
||
|
||
@with_seed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add tests for the new audio folder dataset?
|
||
def predict(pred_dir='./Test'): | ||
""" | ||
The function is used to run predictions on the audio files in the directory `pred_directory` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check this line for rst_lint errors
|
||
def _list_audio_files(self, root, skip_rows=0): | ||
""" | ||
Populates synsets - a map of index to label for the data items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check with rst_lint for errors here
filename = self.items[idx][0] | ||
label = self.items[idx][1] | ||
|
||
if librosa is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just if librosa
?
Retrieve the item (data, label) stored at idx in items | ||
""" | ||
filename = self.items[idx][0] | ||
label = self.items[idx][1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filename, label = self.items[idx]
?
|
||
def __len__(self): | ||
""" | ||
Retrieves the number of items in the dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check rst_lint here
Parameters | ||
---------- | ||
fn : callable | ||
A transformer function that takes the first elemtn of a sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo
class MFCC(Block): | ||
""" | ||
Extracts Mel frequency cepstrum coefficients from the audio data file | ||
More details : https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check rst-lint here
x = x.asnumpy() | ||
specs = librosa.feature.melspectrogram(x, **self.kwargs) | ||
return nd.array(specs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline at the end
@with_seed() | ||
def test_pad_trim(): | ||
""" | ||
Function to test Pad/Trim Audio transform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check rst-lint here
@with_seed() | ||
def test_scale(): | ||
""" | ||
Function to test scaling of the audio transform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please execute pylint and rst-lint on all these files.
n_fft: int, default 2048 | ||
length of the Fast fourier transform window | ||
n_mels: int, | ||
number of mel bands to generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a default value?
class MEL(Block): | ||
"""Create MEL Spectrograms from a raw audio signal. Relatively pretty slow. | ||
|
||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rst expects class attributes to be under the "Attributes" section. "Parameters" for function params. Please double check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concerns about the current code are:
- not enough use cases to validate the API.
- API design not conforming to Gluon standards.
- implementation efficiency.
For now, I'd suggest putting these code as one of the Gluon examples to receive feedbacks. Also, make sure that code that isn't used in the example doesn't get checked in.
@szha
What can be the actionable feedback we can provide here to bring it to Contrib? |
I chatted with @gaurav-gireesh and agreed on having these classes in the examples first. Once two more models are added we can consider making it part of API. Right now there's lack of evidence that these thin wrappers of librosa won't become bottleneck for proper audio-related applications such as ASR, and the compromise to have this in examples is a way to collect more use cases. I'd suggest having DeepSpeech2 and 3 and at least one ASR model with proper WFST models as validation. @sandeep-krishnamurthy if you know better feel free to suggest. |
Description
Phase 1
As a user, I would like to have an out of the box feature of audio data loader and some popular audio transforms in MXNet, that would allow me :
Note: The plan is to have a working piece of this model with an example into the contrib package of MXNet before it is agreed upon to move this implementation to the gluon.data module.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments