Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Refactor] TransformerDecoder #976

Merged
merged 3 commits into from
Oct 29, 2019
Merged

[Refactor] TransformerDecoder #976

merged 3 commits into from
Oct 29, 2019

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Oct 19, 2019

Description

In the current Gluon API, each HybridBlock has to serve one puropse and can only
define a single callable interface. Previous Seq2SeqDecoder interface required
each Seq2SeqDecoder Block to perform two functionalities (multi-step ahead and
single-step ahead decoding). This means neither of the two functionalities can
in practice be hybridized completely. Thus use two separate Blocks for the two
functionalities. They may share parameters.

Update the NMTModel API accordingly.

Further refactor TransformerDecoder to make it completely hybridizable.
TransformerOneStepDecoder still relies on a small hack but can be hybridized
completely when we enable numpy shape semantics.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Refactor TransformerDecoder

Comments

cc @dmlc/gluon-nlp-team

@leezu leezu requested review from szhengac and sxjscience October 19, 2019 00:56
@leezu leezu requested a review from a team as a code owner October 19, 2019 00:56
@codecov
Copy link

codecov bot commented Oct 19, 2019

Codecov Report

Merging #976 into master will increase coverage by 8.52%.
The diff coverage is 93.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
+ Coverage   81.49%   90.01%   +8.52%     
==========================================
  Files          67       67              
  Lines        6392     6368      -24     
==========================================
+ Hits         5209     5732     +523     
+ Misses       1183      636     -547
Impacted Files Coverage Δ
src/gluonnlp/model/translation.py 71.87% <100%> (+51.24%) ⬆️
src/gluonnlp/model/train/__init__.py 100% <100%> (+25%) ⬆️
src/gluonnlp/model/seq2seq_encoder_decoder.py 75.38% <83.33%> (+0.38%) ⬆️
src/gluonnlp/model/transformer.py 93.48% <93.65%> (+11.69%) ⬆️
src/gluonnlp/data/glue.py 98.63% <0%> (+1.81%) ⬆️
src/gluonnlp/model/train/embedding.py 87.17% <0%> (+2.56%) ⬆️
src/gluonnlp/embedding/token_embedding.py 92.01% <0%> (+2.83%) ⬆️
src/gluonnlp/vocab/elmo.py 96.66% <0%> (+3.33%) ⬆️
src/gluonnlp/data/word_embedding_evaluation.py 92.39% <0%> (+3.42%) ⬆️
... and 13 more

@pytest.mark.parametrize('use_residual', [False, True])
@pytest.mark.parametrize('batch_size', [4])
@pytest.mark.parametrize('src_tgt_seq_len', [(5, 10), (10, 5)])
def test_transformer_encoder_decoder(output_attention, use_residual, batch_size, src_tgt_seq_len):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes to this test. Only refactoring to use pytest.mark.parametrize

@leezu leezu force-pushed the refactortransformer branch from 2b3c537 to 8a9be69 Compare October 19, 2019 01:57
@leezu
Copy link
Contributor Author

leezu commented Oct 21, 2019

This is currently blocked by the issue that Block.load_parameters internally uses _collect_params_with_prefix and expects shared parameters to be serialized twice. Need to submit a patch to mxnet master first.

@leezu
Copy link
Contributor Author

leezu commented Oct 22, 2019

Will be unblocked once apache/mxnet#16582 is merged

@leezu leezu added the release focus Progress focus for release label Oct 25, 2019
@leezu leezu force-pushed the refactortransformer branch 3 times, most recently from 4520484 to d7f8e90 Compare October 27, 2019 18:50
@mli
Copy link
Member

mli commented Oct 28, 2019

Job PR-976/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/7/index.html

@mli
Copy link
Member

mli commented Oct 28, 2019

Job PR-976/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/8/index.html

@mli
Copy link
Member

mli commented Oct 28, 2019

Job PR-976/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/9/index.html

@szha
Copy link
Member

szha commented Oct 28, 2019

@szhengac @sxjscience ping for review on the API.

leezu added 2 commits October 28, 2019 18:32
In the current Gluon API, each HybridBlock has to serve one puropse and can only
define a single callable interface. Previous Seq2SeqDecoder interface required
each Seq2SeqDecoder Block to perform two functionalities (multi-step ahead and
single-step ahead decoding). This means neither of the two functionalities can
in practice be hybridized completely. Thus use two separate Blocks for the two
functionalities. They may share parameters.

Update the NMTModel API accordingly.

Further refactor TransformerDecoder to make it completely hybridizable.
TransformerOneStepDecoder still relies on a small hack but can be hybridized
completely when we enable numpy shape semantics.
@leezu leezu force-pushed the refactortransformer branch from 87c1566 to 46f5b64 Compare October 28, 2019 18:55
@mli
Copy link
Member

mli commented Oct 28, 2019

Job PR-976/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/10/index.html

@leezu leezu requested a review from szhengac October 28, 2019 23:44
@mli
Copy link
Member

mli commented Oct 29, 2019

Job PR-976/11 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-976/11/index.html

mem_value, mem_mask = states
if mem_mask is not None:
mem_mask = F.expand_dims(mem_mask, axis=1)
mem_mask = F.broadcast_like(mem_mask, inputs, lhs_axes=(1, ), rhs_axes=(1, ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in xingjian's slides, we can extract the code of generating mask into a fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it in a follow-up PR as it will extend the attention_cell API

@sxjscience
Copy link
Member

I suggest making the change inside a branch for EMNLP. Changing the API just for a tutorial is not the appropriate working style.

@sxjscience
Copy link
Member

sxjscience commented Oct 29, 2019

After offline discussion with the other members, I decide to approve it now. However, these are experimental APIs and we should discuss about that.

@leezu leezu merged commit 57a45aa into dmlc:master Oct 29, 2019
@leezu leezu deleted the refactortransformer branch October 29, 2019 21:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API change release focus Progress focus for release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants