Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimized decoder for the deployment of DS2 #139

Merged
merged 48 commits into from
Sep 18, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
724b0fb
add initial files for deployment
Jun 29, 2017
348d6bb
add deploy.py
Jun 29, 2017
59b4b87
Merge branch 'develop' into ctc_decoder_deploy
Jun 29, 2017
a506198
fix bugs
Jul 4, 2017
903c300
Merge branch 'develop' into ctc_decoder_deploy
Jul 5, 2017
1ca3814
code cleanup for the deployment decoder
Jul 6, 2017
34f98e0
add setup and README for deployment
Jul 6, 2017
ae05535
enable loading language model in multiple format
Jul 10, 2017
4e5b345
change probs' computation into log scale & add best path decoder
Jul 27, 2017
908932f
refine the interface of decoders in swig
Aug 3, 2017
ac3a49c
Delete swig_decoder.py
kuke Aug 3, 2017
9ff48b0
reorganize cpp files
Aug 22, 2017
32047c7
refine wrapper for swig and simplify setup
Aug 22, 2017
f41375b
add the support of parallel beam search decoding in deployment
Aug 23, 2017
a96c650
Refactor scorer and move utility functions to decoder_util.h
pkuyym Aug 23, 2017
3441148
Merge branch 'ctc_decoder_deploy' of /~https://github.com/kuke/models i…
pkuyym Aug 23, 2017
89c4a96
Make setup.py to support parallel processing.
pkuyym Aug 23, 2017
bbbc988
adapt to the last three commits
Aug 23, 2017
d68732b
convert data structure for prefix from map to trie tree
Aug 24, 2017
955d293
enable finite-state transducer in beam search decoding
Aug 29, 2017
20d13a4
streamline source code
Aug 29, 2017
09f4c6e
remove unused functions in Scorer
Aug 29, 2017
b5c4d83
add min cutoff & top n cutoff
Aug 30, 2017
202a06a
Merge branch 'develop' of /~https://github.com/PaddlePaddle/models into…
Aug 30, 2017
beb0c07
clean up code & update README for decoder in deployment
Aug 30, 2017
103a6ac
format C++ source code
Sep 6, 2017
efc5d9b
Merge branch 'develop' of /~https://github.com/PaddlePaddle/models into…
Sep 7, 2017
5a318e9
adapt to the new folder structure of DS2
Sep 8, 2017
f8c7d46
format header includes & update setup info
Sep 8, 2017
e49f505
resolve conflicts in model.py
Sep 8, 2017
d75f27d
Merge branch 'develop' of /~https://github.com/PaddlePaddle/models into…
Sep 8, 2017
41e9e59
append some comments
Sep 8, 2017
c4bc822
adapt to the new structure
Sep 13, 2017
52a862d
add __init__.py in decoders/swig
Sep 13, 2017
552dd52
move deprecated decoders
Sep 14, 2017
0bda37c
refine by following review comments
Sep 15, 2017
902c35b
append some changes
Sep 15, 2017
bb35363
Merge branch 'develop' of upstream into ctc_decoder_deploy
Sep 16, 2017
15728d0
expose param cutoff_top_n
Sep 16, 2017
e6740af
adjust scorer's init & add logging for scorer & separate long functions
Sep 17, 2017
8c5576d
format varabiables' name & add more comments
Sep 17, 2017
bcc236e
Merge branch 'develop' of upstream into ctc_decoder_deploy
Sep 18, 2017
98d35b9
adjust to pass ci
Sep 18, 2017
d7a9752
specify clang_format to ver3.9
Sep 18, 2017
cc2f91f
disable the make output of libsndfile in setup
Sep 18, 2017
f1cd672
use cd instead of pushd in setup.sh
Sep 18, 2017
cfecaa8
Merge branch 'develop' of upstream into ctc_decoder_deploy
Sep 18, 2017
9db0d25
pass unittest for deprecated decoders
Sep 18, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .clang_format.hook
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#!/usr/bin/env bash
set -e

readonly VERSION="3.8"
readonly VERSION="3.9"

version=$(clang-format -version)

Expand Down
Empty file.
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,8 @@ def ctc_greedy_decoder(probs_seq, vocabulary):
def ctc_beam_search_decoder(probs_seq,
beam_size,
vocabulary,
blank_id,
cutoff_prob=1.0,
cutoff_top_n=40,
ext_scoring_func=None,
nproc=False):
"""CTC Beam search decoder.
Expand All @@ -66,8 +66,6 @@ def ctc_beam_search_decoder(probs_seq,
:type beam_size: int
:param vocabulary: Vocabulary list.
:type vocabulary: list
:param blank_id: ID of blank.
:type blank_id: int
:param cutoff_prob: Cutoff probability in pruning,
default 1.0, no pruning.
:type cutoff_prob: float
Expand All @@ -87,9 +85,8 @@ def ctc_beam_search_decoder(probs_seq,
raise ValueError("The shape of prob_seq does not match with the "
"shape of the vocabulary.")

# blank_id check
if not blank_id < len(probs_seq[0]):
raise ValueError("blank_id shouldn't be greater than probs dimension")
# blank_id assign
blank_id = len(vocabulary)

# If the decoder called in the multiprocesses, then use the global scorer
# instantiated in ctc_beam_search_decoder_batch().
Expand All @@ -114,14 +111,15 @@ def ctc_beam_search_decoder(probs_seq,
prob_idx = list(enumerate(probs_seq[time_step]))
cutoff_len = len(prob_idx)
#If pruning is enabled
if cutoff_prob < 1.0:
if cutoff_prob < 1.0 or cutoff_top_n < cutoff_len:
prob_idx = sorted(prob_idx, key=lambda asd: asd[1], reverse=True)
cutoff_len, cum_prob = 0, 0.0
for i in xrange(len(prob_idx)):
cum_prob += prob_idx[i][1]
cutoff_len += 1
if cum_prob >= cutoff_prob:
break
cutoff_len = min(cutoff_len, cutoff_top_n)
prob_idx = prob_idx[0:cutoff_len]

for l in prefix_set_prev:
Expand Down Expand Up @@ -191,9 +189,9 @@ def ctc_beam_search_decoder(probs_seq,
def ctc_beam_search_decoder_batch(probs_split,
beam_size,
vocabulary,
blank_id,
num_processes,
cutoff_prob=1.0,
cutoff_top_n=40,
ext_scoring_func=None):
"""CTC beam search decoder using multiple processes.

Expand All @@ -204,8 +202,6 @@ def ctc_beam_search_decoder_batch(probs_split,
:type beam_size: int
:param vocabulary: Vocabulary list.
:type vocabulary: list
:param blank_id: ID of blank.
:type blank_id: int
:param num_processes: Number of parallel processes.
:type num_processes: int
:param cutoff_prob: Cutoff probability in pruning,
Expand All @@ -232,8 +228,8 @@ def ctc_beam_search_decoder_batch(probs_split,
pool = multiprocessing.Pool(processes=num_processes)
results = []
for i, probs_list in enumerate(probs_split):
args = (probs_list, beam_size, vocabulary, blank_id, cutoff_prob, None,
nproc)
args = (probs_list, beam_size, vocabulary, cutoff_prob, cutoff_top_n,
None, nproc)
results.append(pool.apply_async(ctc_beam_search_decoder, args))

pool.close()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
import numpy as np


class LmScorer(object):
class Scorer(object):
"""External scorer to evaluate a prefix or whole sentence in
beam search decoding, including the score from n-gram language
model and word count.
Expand Down
Empty file.
204 changes: 204 additions & 0 deletions deep_speech_2/decoders/swig/ctc_beam_search_decoder.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
#include "ctc_beam_search_decoder.h"

#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <map>
#include <utility>

#include "ThreadPool.h"
#include "fst/fstlib.h"

#include "decoder_utils.h"
#include "path_trie.h"

using FSTMATCH = fst::SortedMatcher<fst::StdVectorFst>;

std::vector<std::pair<double, std::string>> ctc_beam_search_decoder(
const std::vector<std::vector<double>> &probs_seq,
const std::vector<std::string> &vocabulary,
size_t beam_size,
double cutoff_prob,
size_t cutoff_top_n,
Scorer *ext_scorer) {
// dimension check
size_t num_time_steps = probs_seq.size();
for (size_t i = 0; i < num_time_steps; ++i) {
VALID_CHECK_EQ(probs_seq[i].size(),
vocabulary.size() + 1,
"The shape of probs_seq does not match with "
"the shape of the vocabulary");
}

// assign blank id
size_t blank_id = vocabulary.size();

// assign space id
auto it = std::find(vocabulary.begin(), vocabulary.end(), " ");
int space_id = it - vocabulary.begin();
// if no space in vocabulary
if ((size_t)space_id >= vocabulary.size()) {
space_id = -2;
}

// init prefixes' root
PathTrie root;
root.score = root.log_prob_b_prev = 0.0;
std::vector<PathTrie *> prefixes;
prefixes.push_back(&root);

if (ext_scorer != nullptr && !ext_scorer->is_character_based()) {
auto fst_dict = static_cast<fst::StdVectorFst *>(ext_scorer->dictionary);
fst::StdVectorFst *dict_ptr = fst_dict->Copy(true);
root.set_dictionary(dict_ptr);
auto matcher = std::make_shared<FSTMATCH>(*dict_ptr, fst::MATCH_INPUT);
root.set_matcher(matcher);
}

// prefix search over time
for (size_t time_step = 0; time_step < num_time_steps; ++time_step) {
auto &prob = probs_seq[time_step];

float min_cutoff = -NUM_FLT_INF;
bool full_beam = false;
if (ext_scorer != nullptr) {
size_t num_prefixes = std::min(prefixes.size(), beam_size);
std::sort(
prefixes.begin(), prefixes.begin() + num_prefixes, prefix_compare);
min_cutoff = prefixes[num_prefixes - 1]->score +
std::log(prob[blank_id]) - std::max(0.0, ext_scorer->beta);
full_beam = (num_prefixes == beam_size);
}

std::vector<std::pair<size_t, float>> log_prob_idx =
get_pruned_log_probs(prob, cutoff_prob, cutoff_top_n);
// loop over chars
for (size_t index = 0; index < log_prob_idx.size(); index++) {
auto c = log_prob_idx[index].first;
auto log_prob_c = log_prob_idx[index].second;

for (size_t i = 0; i < prefixes.size() && i < beam_size; ++i) {
auto prefix = prefixes[i];
if (full_beam && log_prob_c + prefix->score < min_cutoff) {
break;
}
// blank
if (c == blank_id) {
prefix->log_prob_b_cur =
log_sum_exp(prefix->log_prob_b_cur, log_prob_c + prefix->score);
continue;
}
// repeated character
if (c == prefix->character) {
prefix->log_prob_nb_cur = log_sum_exp(
prefix->log_prob_nb_cur, log_prob_c + prefix->log_prob_nb_prev);
}
// get new prefix
auto prefix_new = prefix->get_path_trie(c);

if (prefix_new != nullptr) {
float log_p = -NUM_FLT_INF;

if (c == prefix->character &&
prefix->log_prob_b_prev > -NUM_FLT_INF) {
log_p = log_prob_c + prefix->log_prob_b_prev;
} else if (c != prefix->character) {
log_p = log_prob_c + prefix->score;
}

// language model scoring
if (ext_scorer != nullptr &&
(c == space_id || ext_scorer->is_character_based())) {
PathTrie *prefix_toscore = nullptr;
// skip scoring the space
if (ext_scorer->is_character_based()) {
prefix_toscore = prefix_new;
} else {
prefix_toscore = prefix;
}

double score = 0.0;
std::vector<std::string> ngram;
ngram = ext_scorer->make_ngram(prefix_toscore);
score = ext_scorer->get_log_cond_prob(ngram) * ext_scorer->alpha;
log_p += score;
log_p += ext_scorer->beta;
}
prefix_new->log_prob_nb_cur =
log_sum_exp(prefix_new->log_prob_nb_cur, log_p);
}
} // end of loop over prefix
} // end of loop over vocabulary

prefixes.clear();
// update log probs
root.iterate_to_vec(prefixes);

// only preserve top beam_size prefixes
if (prefixes.size() >= beam_size) {
std::nth_element(prefixes.begin(),
prefixes.begin() + beam_size,
prefixes.end(),
prefix_compare);
for (size_t i = beam_size; i < prefixes.size(); ++i) {
prefixes[i]->remove();
}
}
} // end of loop over time

// compute aproximate ctc score as the return score, without affecting the
// return order of decoding result. To delete when decoder gets stable.
for (size_t i = 0; i < beam_size && i < prefixes.size(); ++i) {
double approx_ctc = prefixes[i]->score;
if (ext_scorer != nullptr) {
std::vector<int> output;
prefixes[i]->get_path_vec(output);
auto prefix_length = output.size();
auto words = ext_scorer->split_labels(output);
// remove word insert
approx_ctc = approx_ctc - prefix_length * ext_scorer->beta;
// remove language model weight:
approx_ctc -= (ext_scorer->get_sent_log_prob(words)) * ext_scorer->alpha;
}
prefixes[i]->approx_ctc = approx_ctc;
}

return get_beam_search_result(prefixes, vocabulary, beam_size);
}


std::vector<std::vector<std::pair<double, std::string>>>
ctc_beam_search_decoder_batch(
const std::vector<std::vector<std::vector<double>>> &probs_split,
const std::vector<std::string> &vocabulary,
size_t beam_size,
size_t num_processes,
double cutoff_prob,
size_t cutoff_top_n,
Scorer *ext_scorer) {
VALID_CHECK_GT(num_processes, 0, "num_processes must be nonnegative!");
// thread pool
ThreadPool pool(num_processes);
// number of samples
size_t batch_size = probs_split.size();

// enqueue the tasks of decoding
std::vector<std::future<std::vector<std::pair<double, std::string>>>> res;
for (size_t i = 0; i < batch_size; ++i) {
res.emplace_back(pool.enqueue(ctc_beam_search_decoder,
probs_split[i],
vocabulary,
beam_size,
cutoff_prob,
cutoff_top_n,
ext_scorer));
}

// get decoding results
std::vector<std::vector<std::pair<double, std::string>>> batch_results;
for (size_t i = 0; i < batch_size; ++i) {
batch_results.emplace_back(res[i].get());
}
return batch_results;
}
61 changes: 61 additions & 0 deletions deep_speech_2/decoders/swig/ctc_beam_search_decoder.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
#ifndef CTC_BEAM_SEARCH_DECODER_H_
#define CTC_BEAM_SEARCH_DECODER_H_

#include <string>
#include <utility>
#include <vector>

#include "scorer.h"

/* CTC Beam Search Decoder

* Parameters:
* probs_seq: 2-D vector that each element is a vector of probabilities
* over vocabulary of one time step.
* vocabulary: A vector of vocabulary.
* beam_size: The width of beam search.
* cutoff_prob: Cutoff probability for pruning.
* cutoff_top_n: Cutoff number for pruning.
* ext_scorer: External scorer to evaluate a prefix, which consists of
* n-gram language model scoring and word insertion term.
* Default null, decoding the input sample without scorer.
* Return:
* A vector that each element is a pair of score and decoding result,
* in desending order.
*/
std::vector<std::pair<double, std::string>> ctc_beam_search_decoder(
const std::vector<std::vector<double>> &probs_seq,
const std::vector<std::string> &vocabulary,
size_t beam_size,
double cutoff_prob = 1.0,
size_t cutoff_top_n = 40,
Scorer *ext_scorer = nullptr);

/* CTC Beam Search Decoder for batch data

* Parameters:
* probs_seq: 3-D vector that each element is a 2-D vector that can be used
* by ctc_beam_search_decoder().
* vocabulary: A vector of vocabulary.
* beam_size: The width of beam search.
* num_processes: Number of threads for beam search.
* cutoff_prob: Cutoff probability for pruning.
* cutoff_top_n: Cutoff number for pruning.
* ext_scorer: External scorer to evaluate a prefix, which consists of
* n-gram language model scoring and word insertion term.
* Default null, decoding the input sample without scorer.
* Return:
* A 2-D vector that each element is a vector of beam search decoding
* result for one audio sample.
*/
std::vector<std::vector<std::pair<double, std::string>>>
ctc_beam_search_decoder_batch(
const std::vector<std::vector<std::vector<double>>> &probs_split,
const std::vector<std::string> &vocabulary,
size_t beam_size,
size_t num_processes,
double cutoff_prob = 1.0,
size_t cutoff_top_n = 40,
Scorer *ext_scorer = nullptr);

#endif // CTC_BEAM_SEARCH_DECODER_H_
Loading