vocabulary should contain blank to make the decoder API easy to use. #254

chenjiasheng · 2017-09-15T03:12:17Z

models/deep_speech_2/model_utils/decoder.py

Line 87 in 8b5c739

raise ValueError("The shape of prob_seq does not match with the "

If the vocabulary doesn't contain blank, we have:

vocabulary=['a', 'b', ..., 'z', ' ', '.']
len(vocabulary) = 28
len(prob_list) = 29

Then at line 135,

new_char = vocabulary[c]

Here c ranges from 0 to len(prob_list) - 1, that is, from 0 to 28, but vocabulary only has indices 0 to 27, so vocabulary[28] raises an index out of range error.
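A minimal sketch of the mismatch described above, not the actual code in decoder.py. The helper name `char_for_class` is hypothetical, and the vocabulary is assumed to be the 28 symbols listed earlier (a to z, space, period), with the extra 29th probability entry belonging to the blank.

```python
import string

# Assumed 28-symbol vocabulary from the example above: a-z, space, period.
vocabulary = list(string.ascii_lowercase) + [' ', '.']

# Hypothetical per-step output: one probability per vocabulary symbol,
# plus one extra entry for the CTC blank, so 29 entries in total.
prob_list = [1.0 / 29] * (len(vocabulary) + 1)


def char_for_class(vocabulary, c):
    """Map a class index to its character, with no special case for blank."""
    return vocabulary[c]


# Every index below len(vocabulary) resolves to a character ...
last_char = char_for_class(vocabulary, len(vocabulary) - 1)

# ... but the last index of prob_list has no matching vocabulary entry.
try:
    char_for_class(vocabulary, len(prob_list) - 1)
except IndexError as err:
    print("lookup failed for index", len(prob_list) - 1, "->", err)
```

The lookup only fails if the blank index is actually passed to the vocabulary; a decoder that checks for the blank first never hits this path.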

To use this decoder, one has to make the vocabulary a dict indexed from 1 instead of a list indexed from 0, even though the doc says vocabulary should be a list:

    :param vocabulary: Vocabulary list.
    :type vocabulary: list

This is really confusing.

@xinghai-sun
@kuke

kuke · 2017-09-15T15:59:00Z

@chenjiasheng Thanks for your attention!

This shouldn't happen, because we in fact enforce blank_id to be equal to len(vocabulary). Also, blank_id shouldn't be passed as an argument to the decoder; we have fixed that in the C++ decoders and will fix the Python version very soon.
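To illustrate the convention described in this reply, here is a rough sketch of a greedy (best path) CTC decoder that fixes the blank index to len(vocabulary) internally, so the vocabulary can stay a plain list without a blank entry. This is an assumption-laden illustration, not the repository's decoder: the function name `greedy_decode_sketch` and the input layout (a list of per-step probability lists) are hypothetical.

```python
def greedy_decode_sketch(probs_seq, vocabulary):
    """Greedy CTC decoding with the blank fixed at index len(vocabulary).

    probs_seq: list of per-time-step probability lists, each of length
               len(vocabulary) + 1 (the last entry is the blank).
    vocabulary: plain list of characters, without a blank entry.
    """
    blank_id = len(vocabulary)  # enforced here, not passed by the caller

    for probs in probs_seq:
        if len(probs) != len(vocabulary) + 1:
            raise ValueError("each step needs len(vocabulary) + 1 probabilities")

    chars = []
    prev = None
    for probs in probs_seq:
        # Pick the most probable class at this time step.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Collapse repeated classes and skip the blank before any lookup,
        # so vocabulary[blank_id] is never evaluated.
        if best != prev and best != blank_id:
            chars.append(vocabulary[best])
        prev = best
    return ''.join(chars)


# Usage with a toy two-symbol vocabulary; index 2 is the blank.
toy_vocab = ['a', 'b']
toy_probs = [
    [0.8, 0.1, 0.1],  # 'a'
    [0.8, 0.1, 0.1],  # 'a' again, collapsed
    [0.1, 0.1, 0.8],  # blank
    [0.7, 0.2, 0.1],  # 'a'
    [0.1, 0.8, 0.1],  # 'b'
]
decoded = greedy_decode_sketch(toy_probs, toy_vocab)
```

Because the blank check comes before the lookup, the caller never has to add a blank symbol to the list or switch to a 1-indexed dict.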

chenjiasheng · 2017-09-18T02:20:00Z

@kuke OK, thanks for the reply.

chenjiasheng closed this as completed Sep 18, 2017
