Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 #17872

Merged
merged 4 commits into from
Apr 12, 2020

Conversation

xziya
Copy link
Contributor

@xziya xziya commented Mar 19, 2020

Description

Patch for the issue #17818. The rnn operator produces zero gradients for bias when num_layers > 1. It is caused by a mistake in calculating the shift of bias pointer, where we used the size of fusion bias (i2h_bias + h2h_bias) but MXNet gives twice (i2h_bias, h2h_bias) as many as the fusion size.

Checklist

Changes

  • Use the correct shift of bias pointer.
  • Change the way of sharing the same values of parameters of the fused RNN layers and the stacked one.
  • Add check for RNN output states.

@ciyongch @pengzhao-intel @TaoLv

@xziya xziya requested a review from szha as a code owner March 19, 2020 07:41
@ciyongch
Copy link
Contributor

There's still some failure in fusedlstm tests. Please take a check.

@stu1130
Copy link
Contributor

stu1130 commented Apr 7, 2020

@zixuanweeei Thanks for your contribution, could you also cherry-pick the commit to 1.7? DJL LSTM model depends on this commit. Thanks!

@pengzhao-intel
Copy link
Contributor

@zixuanweeei Thanks for your contribution, could you also cherry-pick the commit to 1.7? DJL LSTM model depends on this commit. Thanks!

Sure, please add this requirement in 1.7 roadmap #16864

@xziya
Copy link
Contributor Author

xziya commented Apr 11, 2020

CI passed. Please take a review. Thanks. @ciyongch @TaoLv @pengzhao-intel

Besides, we will backport this patch into v1.7 branch as well @stu1130.

@pengzhao-intel pengzhao-intel merged commit 7dd7e7e into apache:master Apr 12, 2020
xziya added a commit to xziya/mxnet that referenced this pull request Apr 13, 2020
…che#17872)

* Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1

* Use nd.copy() to initialize parameters of new operator

* Add check for output states

* Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size

* Split fused rnn layer test into tests of individual mode

* Skip lstm and gru tests on CPU context without DNNL
pengzhao-intel pushed a commit that referenced this pull request Apr 15, 2020
* Support projection feature for LSTM on CPU (Only Inference) (#17702)

* Support projection feature for LSTM on CPU

* test solution for -Werror=maybe-uninitialized

* Check device type when create state

* Document the projection feature of LSTM for RNN operator

* Minor fix

* Re-run CI

* Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 (#17872)

* Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1

* Use nd.copy() to initialize parameters of new operator

* Add check for output states

* Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size

* Split fused rnn layer test into tests of individual mode

* Skip lstm and gru tests on CPU context without DNNL
stu1130 pushed a commit to stu1130/incubator-mxnet that referenced this pull request Apr 15, 2020
…18038)

* Support projection feature for LSTM on CPU (Only Inference) (apache#17702)

* Support projection feature for LSTM on CPU

* test solution for -Werror=maybe-uninitialized

* Check device type when create state

* Document the projection feature of LSTM for RNN operator

* Minor fix

* Re-run CI

* Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 (apache#17872)

* Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1

* Use nd.copy() to initialize parameters of new operator

* Add check for output states

* Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size

* Split fused rnn layer test into tests of individual mode

* Skip lstm and gru tests on CPU context without DNNL
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants