Variable sequence length not handled correctly for BiDirectional layers #19323
Comments
Hi @zjost, currently the CPU backend does not support use_sequence_length=True. Your solution to use SequenceMask before the RNN is a workaround rather than a proper fix.
For now, my suggestion would be to either use a GPU context or run inference with a batch size of 1.
To be clear, this is a problem for us because we use SageMaker, and it makes it such that the same record gets different scores when it's invoked via the endpoint as a single record vs. in a Batch Transform job. Running Batch Transform one record at a time takes way too long, and we can't control how SageMaker splits the batches. Also, any comment on point 1, about how this only seems to work with LSTM, not RNN/GRU, even when using the correct cuDNN?
Regarding point 1, it seems that for now only LSTM supports use_sequence_length=True. I understand your concerns about the performance of using batches of 1. If you want to get correct results from bidirectional RNN layers while running bigger batches, you can create a bidirectional RNN layer yourself from two unidirectional layers and a concat afterwards. Example of such a layer:

import mxnet as mx
from mxnet import gluon

ctx = [mx.cpu()]
class CustomBidirectionalRNNLayer(gluon.nn.HybridBlock):
    def __init__(self, hidden_size):
        super(CustomBidirectionalRNNLayer, self).__init__(prefix="bidir_rnn_")
        with self.name_scope():
            self.rnn_l2r = gluon.rnn.LSTM(hidden_size=hidden_size, bidirectional=False, prefix='l2r',
                                          h2h_bias_initializer='one', i2h_bias_initializer='one')
            self.rnn_r2l = gluon.rnn.LSTM(hidden_size=hidden_size, bidirectional=False, prefix='r2l',
                                          h2h_bias_initializer='one', i2h_bias_initializer='one')

    def hybrid_forward(self, F, x, x_len):
        l2r_out = self.rnn_l2r(x)
        r2l_out = self.rnn_r2l(F.SequenceReverse(x, sequence_length=x_len, use_sequence_length=True))
        out = F.concat(l2r_out, r2l_out, dim=2)
        return out
class TestModel(gluon.nn.HybridBlock):
    def __init__(self):
        super(TestModel, self).__init__(prefix="TestModel_")
        with self.name_scope():
            self.bidir_rnn = CustomBidirectionalRNNLayer(hidden_size=1)

    def hybrid_forward(self, F, x, x_len):
        x = x.expand_dims(2)        # add a feature dimension
        x = x.transpose((1, 0, 2))  # to (max_sequence_length, batch_size, other_feature_dims)
        out = self.bidir_rnn(x, x_len)
        out = F.SequenceLast(out, sequence_length=x_len, use_sequence_length=True)
        return out
net = TestModel()
net.initialize(mx.init.Xavier(), ctx=ctx, force_reinit=True)

pad_val = 0
example_codes = [[1, 2], [1, pad_val]]
example_len = [2, 1]
x_input = mx.nd.array(example_codes).as_in_context(ctx[0])
x_len_input = mx.nd.array(example_len).as_in_context(ctx[0])

mx.random.seed(0)
# Original
out1 = net(x_input, x_len_input)
# Extra padding token appended to each sequence
x_input2 = mx.nd.array([k + [pad_val] for k in example_codes]).as_in_context(ctx[0])
out2 = net(x_input2, x_len_input)

This solution also solves point 1, because the variable-length handling is done by SequenceReverse rather than by the RNN layer itself, so the unidirectional layers can just as well be RNN or GRU instead of LSTM.
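A small check (not part of the original comment) to make the intent of the two runs explicit: with the custom layer, appending an extra padding step should leave the output unchanged.

# Expected to print 0 (or a numerically negligible value): padding beyond x_len
# no longer affects the scores of the valid time steps.
print((out1 - out2).abs().max())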
Thanks for taking the time to show an implementation of this. Do you think a warning should be added to the documentation regarding the use of use_sequence_length on CPU? Regarding #14208, I'm not sure why this fails for other RNN types, since the code changes there don't appear to be specific to LSTM.
Absolutely, there should be some note in the documentation about use_sequence_length not being supported on CPU. I will try to explain the underlying problem and why changing this is not straightforward. BTW, sorry, I've made a little mistake with my CustomBidirectionalRNNLayer: the right-to-left outputs need to be reversed back before concatenation. Corrected version:

class CustomBidirectionalRNNLayer(gluon.nn.HybridBlock):
    def __init__(self, hidden_size):
        super(CustomBidirectionalRNNLayer, self).__init__(prefix="bidir_rnn_")
        with self.name_scope():
            self.rnn_l2r = gluon.rnn.LSTM(hidden_size=hidden_size, bidirectional=False, prefix='l2r',
                                          h2h_bias_initializer='one', i2h_bias_initializer='one')
            self.rnn_r2l = gluon.rnn.LSTM(hidden_size=hidden_size, bidirectional=False, prefix='r2l',
                                          h2h_bias_initializer='one', i2h_bias_initializer='one')

    def hybrid_forward(self, F, x, x_len):
        l2r_out = self.rnn_l2r(x)
        r2l_out = self.rnn_r2l(F.SequenceReverse(x, sequence_length=x_len, use_sequence_length=True))
        # Reverse the right-to-left outputs back so both directions are aligned per time step
        out = F.concat(l2r_out, F.SequenceReverse(r2l_out, sequence_length=x_len, use_sequence_length=True), dim=2)
        return out

For now I'll prepare a PR so that the code fails explicitly when run with use_sequence_length=True on CPU.
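As a hedged aside (not from the original thread): because SequenceReverse, rather than the recurrent layer itself, handles the variable lengths here, the same pattern generalizes to other layer types. A sketch with the layer class made configurable, assuming the MXNet 1.x Gluon API:

import mxnet as mx
from mxnet import gluon

class CustomBidirectionalLayer(gluon.nn.HybridBlock):
    """Bidirectional wrapper built from two unidirectional gluon RNN layers."""
    def __init__(self, layer_cls, hidden_size):
        super(CustomBidirectionalLayer, self).__init__(prefix="bidir_")
        with self.name_scope():
            # layer_cls may be gluon.rnn.LSTM, gluon.rnn.GRU or gluon.rnn.RNN
            self.l2r = layer_cls(hidden_size=hidden_size, bidirectional=False, prefix='l2r_')
            self.r2l = layer_cls(hidden_size=hidden_size, bidirectional=False, prefix='r2l_')

    def hybrid_forward(self, F, x, x_len):
        # x: (max_sequence_length, batch_size, features)
        l2r_out = self.l2r(x)
        r2l_out = self.r2l(F.SequenceReverse(x, sequence_length=x_len, use_sequence_length=True))
        # Reverse the right-to-left outputs back before concatenating along the feature axis
        return F.concat(l2r_out,
                        F.SequenceReverse(r2l_out, sequence_length=x_len, use_sequence_length=True),
                        dim=2)

gru_bidir = CustomBidirectionalLayer(gluon.rnn.GRU, hidden_size=4)
gru_bidir.initialize(mx.init.Xavier())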
…ength=True (apache#19466): This PR addresses apache#19323. I've added an additional check for the use_sequence_length parameter when choosing which kernel to run. oneDNN does not support variable sequence length, so the code now raises an error in that case.
Description
There are a couple of different issues related to the use of use_sequence_length in the _RNNLayer.

To Reproduce
Steps to reproduce
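The reproduction snippet referenced below is not shown here; a minimal sketch of what such a script might look like, assuming the MXNet 1.x Gluon API (passing sequence_length as a keyword argument is an assumption about the call signature):

import mxnet as mx
from mxnet import gluon

def run(ctx):
    mx.random.seed(0)
    # Bidirectional LSTM with variable-length support requested
    lstm = gluon.rnn.LSTM(hidden_size=1, bidirectional=True, use_sequence_length=True)
    lstm.initialize(mx.init.Xavier(), ctx=ctx, force_reinit=True)

    pad_val = 0
    x = mx.nd.array([[1, 2], [1, pad_val]], ctx=ctx)  # batch of 2, second sequence padded
    x = x.expand_dims(2).transpose((1, 0, 2))         # (max_sequence_length, batch_size, features)
    x_len = mx.nd.array([2, 1], ctx=ctx)

    return lstm(x, sequence_length=x_len)

print(run(mx.cpu()))
print(run(mx.gpu()))  # requires a GPU build of MXNet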
Run the above code with CPU and GPU context and observe the output of the second column (i.e. from the backward LSTM cell).
What have you tried to solve it?
I've found that if you use

x = F.SequenceMask(x, sequence_length=x_len, use_sequence_length=True)

before passing x to the RNN, the outputs match. This might suggest that the CPU implementation reverses the entire padded sequence for the backward LSTM cell, rather than just reversing the first x_len elements.

Note: I suspect #14208 is relevant, given that the intended behavior works only for GPU/LSTM.
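A minimal sketch of that workaround (names are hypothetical, not from the original report): zero out the padded time steps with SequenceMask before the bidirectional layer.

import mxnet as mx
from mxnet import gluon

class MaskedBiLSTM(gluon.nn.HybridBlock):
    """Hypothetical wrapper: mask padded time steps before a bidirectional LSTM."""
    def __init__(self, hidden_size):
        super(MaskedBiLSTM, self).__init__(prefix="masked_bilstm_")
        with self.name_scope():
            self.lstm = gluon.rnn.LSTM(hidden_size=hidden_size, bidirectional=True)

    def hybrid_forward(self, F, x, x_len):
        # x: (max_sequence_length, batch_size, features); SequenceMask zeroes steps past x_len
        x = F.SequenceMask(x, sequence_length=x_len, use_sequence_length=True)
        return self.lstm(x)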
Environment