Add python wrapper for CTC greedy decoder and edit distance evaluator #7655

Merged · 13 commits · Jan 22, 2018
Changes from 3 commits
5 changes: 5 additions & 0 deletions doc/api/v2/fluid/layers.rst
@@ -500,6 +500,11 @@ swish
.. autofunction:: paddle.v2.fluid.layers.swish
:noindex:

greedy_ctc_error
----------------
.. autofunction:: paddle.v2.fluid.layers.greedy_ctc_error
:noindex:

l2_normalize
------------
.. autofunction:: paddle.v2.fluid.layers.l2_normalize
84 changes: 71 additions & 13 deletions python/paddle/v2/fluid/layers/nn.py
@@ -50,6 +50,7 @@
'sequence_last_step',
'dropout',
'split',
'greedy_ctc_error',
'l2_normalize',
'matmul',
]
@@ -1721,37 +1722,37 @@ def l2_normalize(x, axis, epsilon=1e-12, name=None):

def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
"""
Applies matrix multiplication to two tensors. Currently only rank 1 to rank
3 input tensors are supported.

The actual behavior depends on the shapes of :math:`x`, :math:`y` and the
flag values of :attr:`transpose_x`, :attr:`transpose_y`. Specifically:

- If a transpose flag is specified, the last two dimensions of the tensor
are transposed. If the tensor is rank-1 of shape :math:`[D]`, then for
:math:`x` it is treated as :math:`[1, D]` in nontransposed form and as
:math:`[D, 1]` in transposed form, whereas for :math:`y` it is the
opposite: It is treated as :math:`[D, 1]` in nontransposed form and as
:math:`[1, D]` in transposed form.

- After transpose, the two tensors are 2-D or 3-D and matrix multiplication
performs in the following way.

- If both are 2-D, they are multiplied like conventional matrices.
- If either is 3-D, it is treated as a stack of matrices residing in the
last two dimensions and a batched matrix multiply supporting broadcast
applies on the two tensors.

Also note that if the raw tensor :math:`x` or :math:`y` is rank-1 and
nontransposed, the prepended or appended dimension :math:`1` will be
removed after matrix multiplication.

Args:
x (Variable): The input variable which is a Tensor or LoDTensor.
y (Variable): The input variable which is a Tensor or LoDTensor.
transpose_x (bool): Whether to transpose :math:`x` before multiplication.
transpose_y (bool): Whether to transpose :math:`y` before multiplication.
name(str|None): A name for this layer (optional). If set None, the layer
will be named automatically.

Returns:
@@ -1788,3 +1789,60 @@ def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
attrs={'transpose_X': transpose_x,
'transpose_Y': transpose_y})
return out
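
For intuition, the shape rules described in the matmul docstring above follow the same conventions as NumPy's matmul; the small sketch below uses NumPy purely for illustration and is not part of this diff:

.. code-block:: python

    # Illustrative sketch of the matmul shape rules; NumPy follows the same
    # conventions and is used here only for demonstration.
    import numpy as np

    a = np.random.rand(3, 4)
    b = np.random.rand(4, 5)
    print(np.matmul(a, b).shape)     # (3, 5): conventional 2-D matrix product

    a3 = np.random.rand(10, 3, 4)
    b3 = np.random.rand(10, 4, 5)
    print(np.matmul(a3, b3).shape)   # (10, 3, 5): batched multiply over the stack dim

    v = np.random.rand(4)            # rank-1 x: treated as [1, 4]
    w = np.random.rand(4)            # rank-1 y: treated as [4, 1]
    print(np.matmul(v, w).shape)     # (): the prepended/appended 1s are removed

    # transpose_y=True corresponds to transposing the last two dims of y first.
    print(np.matmul(a, a.swapaxes(-1, -2)).shape)   # (3, 4) x (4, 3) -> (3, 3)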


def greedy_ctc_error(input, label, blank, normalized=False, name=None):
"""
This evaluator calculates the sequence-to-sequence edit distance.

Args:

input(Variable): (LoDTensor, default: LoDTensor<float>), the unscaled probabilities of variable-length sequences, which is a 2-D Tensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences' lengths and num_classes is the true number of classes (not including the blank label).

label(Variable): (LoDTensor, default: LoDTensor<int>), the ground truth of variable-length sequences, which is a 2-D Tensor with LoD information. It is of the shape [Lg, 1], where Lg is the sum of all labels' lengths.

blank(int): the blank label index of the Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).

normalized(bool): Indicates whether to normalize the edit distance by the length of the reference string.

Returns:
Variable: sequence-to-sequence edit distance loss in shape [batch_size, 1].
Contributor: Remove loss.

Contributor (Author): Thx. Done.


Examples:
.. code-block:: python

x = fluid.layers.data(name='x', shape=[8], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='int64')

cost = fluid.layers.greedy_ctc_error(input=x, label=y, blank=0)
"""
helper = LayerHelper("greedy_ctc_error", **locals())
# top_k op: take the best-scoring (k=1) class index at each time step
topk_out = helper.create_tmp_variable(dtype=input.dtype)
topk_indices = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="top_k",
inputs={"X": [input]},
outputs={"Out": [topk_out],
"Indices": [topk_indices]},
attrs={"k": 1})

# ctc_align op: merge repeated tokens and remove the blank label
ctc_out = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="ctc_align",
inputs={"Input": [topk_indices]},
outputs={"Output": [ctc_out]},
attrs={"merge_repeated": True,
"blank": blank})

# edit_distance op: distance between the decoded sequence and the label
edit_distance_out = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="edit_distance",
inputs={"Hyps": [ctc_out],
"Refs": [label]},
outputs={"Out": [edit_distance_out]},
attrs={"normalized": normalized})

return edit_distance_out
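
To make the composed ops above concrete: top_k with k=1 picks the best class per time step, ctc_align merges repeated tokens and strips the blank, and edit_distance measures the Levenshtein distance to the label. Below is a minimal, framework-free sketch of the same semantics for a single sequence; the helper names are illustrative and not part of this PR.

.. code-block:: python

    # Framework-free sketch of what top_k(k=1) + ctc_align + edit_distance
    # compute for one sequence; helper names are illustrative only.
    import numpy as np

    def greedy_ctc_decode(probs, blank):
        """Argmax per time step, then merge repeats and drop the blank label."""
        best_path = np.argmax(probs, axis=1)        # top_k with k=1
        decoded, prev = [], None
        for token in best_path:                     # ctc_align, merge_repeated=True
            if token != prev and token != blank:
                decoded.append(int(token))
            prev = token
        return decoded

    def edit_distance(hyp, ref, normalized=False):
        """Levenshtein distance between hypothesis and reference sequences."""
        d = np.zeros((len(hyp) + 1, len(ref) + 1))
        d[:, 0] = np.arange(len(hyp) + 1)
        d[0, :] = np.arange(len(ref) + 1)
        for i in range(1, len(hyp) + 1):
            for j in range(1, len(ref) + 1):
                cost = 0 if hyp[i - 1] == ref[j - 1] else 1
                d[i, j] = min(d[i - 1, j] + 1,          # deletion
                              d[i, j - 1] + 1,          # insertion
                              d[i - 1, j - 1] + cost)   # substitution
        dist = d[len(hyp), len(ref)]
        return dist / max(len(ref), 1) if normalized else dist

    # 5 time steps, num_classes = 3 plus the blank label at index 0.
    probs = np.array([[0.1, 0.6, 0.2, 0.1],
                      [0.1, 0.6, 0.2, 0.1],
                      [0.7, 0.1, 0.1, 0.1],
                      [0.1, 0.1, 0.7, 0.1],
                      [0.1, 0.1, 0.7, 0.1]])
    hyp = greedy_ctc_decode(probs, blank=0)   # [1, 2]
    print(edit_distance(hyp, [1, 3]))         # 1.0: one substitution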