Composed Sequence to Sequence Abstraction #2913
Conversation
…ple_seq2seq and CopyNet
…and methods. Refactor is_sequential flag to decodes_parallel flag.
regularizer to ComposedSeq2Seq
@brendan-ai2, can I get you to take this one?
@matt-gardner, @sai-prasanna I can take a look, but I'm out from today until next Thursday (6/13). So it would be somewhat delayed.
@matt-gardner @brendan-ai2 I thought of wrapping this up this weekend, but Thursday will be fine, I guess, since y'all are busy doing cool research (a bit jealous :p). A bit of good news: the bug in my transformer decoder is fixed. Only the review of abstractions is left, I guess.
In-place operator changes the embedding matrix during inference. Change to normal multiplication.
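For illustration, a minimal sketch of the failure mode this commit addresses; the tensors and the 0.5 scale factor are made up here, not taken from the actual code:

```python
import torch

# A toy "embedding matrix" standing in for a target embedder's weight.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
weight = embedding.weight.data  # aliases the embedding matrix storage

# In-place multiplication mutates the embedding matrix itself, so every
# later forward pass (e.g. during inference) sees corrupted weights:
#     weight *= 0.5
# Out-of-place multiplication returns a new tensor and leaves the
# original matrix untouched:
scaled = weight * 0.5
```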
@epwalsh Can you take a look at this to check whether implementations like CopyNet can be done using these abstractions without any hacks? @brendan-ai2 - Any update?
Hi @sai-prasanna, I think one issue with CopyNet-compatibility is that …
Hey @sai-prasanna, I'm giving this a closer look now. In the meantime, could you please ensure that …
@brendan-ai2 Fixed the tests. Some guidance on what to unit test would be great. If there is some other generic mechanism, maybe in …
Hi again @sai-prasanna, this is looking pretty good. I left a number of comments around clarifying the documentation (important for an abstraction, I think), but functionally I think this is close.
As for testing, could you try training (in the test) a very small autoencoder for each of the encoders/decoders and verify that they can recreate a short sequence? That would seem to be a nice sanity check.
Also, the linter on our CI is still complaining, so please be sure to get scripts/verify.py passing. If it's passing locally for you, do let me know!
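For reference, a sketch of how such a check could be scaffolded with AllenNLP's `ModelTestCase`; the fixture paths and file names are hypothetical, and the stronger assertion of decoding the training sequences back out would sit on top of this:

```python
from allennlp.common.testing import ModelTestCase


class ComposedSeq2SeqTest(ModelTestCase):
    def setUp(self):
        super().setUp()
        # Hypothetical fixtures: a tiny experiment config and a handful of
        # short copy (source == target) pairs, so the model is effectively a
        # small autoencoder that should memorise its inputs in a few epochs.
        self.set_up_model(
            self.FIXTURES_ROOT / "encoder_decoder" / "composed_seq2seq" / "experiment.json",
            self.FIXTURES_ROOT / "encoder_decoder" / "composed_seq2seq" / "copy_sequences.tsv",
        )

    def test_model_can_train_save_and_load(self):
        # Trains the tiny model end to end, saves and reloads it, and checks
        # that predictions stay consistent; recreating the input sequences
        # could be asserted on top of this.
        self.ensure_model_can_train_save_and_load(self.param_file)
```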
allennlp/modules/seq2seq_decoders/stacked_self_attention_decoder_net.py (outdated review thread, resolved)
Add additional documentation for Seq2Seq classes and clarify a few others. Make ComposedSeq2Seq share the entire embedder instead of just the weights.
Add new test cases for seq2seq_decoders.
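The distinction drawn in the embedder-sharing commit above, in toy form (plain PyTorch, not the actual `ComposedSeq2Seq` code):

```python
import torch

# Sharing the module: both sides hold a reference to the same Embedding,
# so re-initialisation, vocabulary extension, or device moves stay in sync.
shared = torch.nn.Embedding(num_embeddings=100, embedding_dim=32)
source_embedder = shared
target_embedder = shared

# Sharing only the weights: two distinct modules tied through one Parameter.
# Gradients stay tied, but structural changes to one module do not carry
# over to the other.
tied_target = torch.nn.Embedding(num_embeddings=100, embedding_dim=32)
tied_target.weight = shared.weight
```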
@brendan-ai2 Sorry that it took a while to get back to this.
@brendan-ai2 would you please take another look?
Looks good to me! (There's a minor typo in some logging that would be great to fix, but I can patch it after this is merged too if you're busy.)
@sai-prasanna , thank you so much for working on this! Especially in the face of the slow feedback cycle which I'm to blame for. Sorry again for that. I'm really looking forward to having this in master and getting some users' eyes on it! All the best.
@brendan-ai2 Thanks for the review. Getting users' eyes on it would really help us get more clarity. TBH I am currently using fairseq directly for my tasks because it supports fp16, gradient accumulation, and DDP, and its dataset reader/tensorization is fast. I couldn't battle-test these abstractions as much as I wanted to.
Merged! Thanks for the feedback about fairseq too. It's good for us to know what features people find important... |
This completes the work started by @generall in allenai#2517 to decompose seq2seq encoder-decoder models.

## Explanation

* `ComposedSeq2Seq` is the class that composes the older `Seq2SeqEncoder` and the newly defined `SeqDecoder`.
* `SeqDecoder` is an abstract class that decoder implementations extend.
* `AutoRegressiveSeqDecoder` is the default implementation of `SeqDecoder`. It composes a `DecoderNet`, which can be implemented by anything from a transformer to an LSTMCell. For implementations like the transformer that support parallel decoding (during training), there is a `decodes_parallel` flag; `AutoRegressiveSeqDecoder` uses it to decode in a single forward pass under teacher forcing.

## Questions

* Can the items currently in `SeqDecoder`, such as the embedder, be moved to the default autoregressive decoder implementation?
* Are the current tests sufficient?
* I am reusing the Harvard Transformer parts in the private API to implement `StackedSelfAttentionDecoderNet`.

## Help

* ~~Need help testing the transformer module on actual data. I am getting NaN errors during the validation phase while running beam search.~~ Fixed

Solves allenai#2097
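To make the composition concrete, here is a sketch of what the `model` section of an experiment config might look like with these pieces. The registered names (`composed_seq2seq`, `auto_regressive_seq_decoder`, `stacked_self_attention`) and the exact keys are my reading of the abstractions above, not verified against the final code, so treat them as placeholders:

```python
# Hypothetical "model" section of an experiment config wiring the pieces
# together: a standard Seq2SeqEncoder on the source side and an
# AutoRegressiveSeqDecoder wrapping a DecoderNet on the target side.
model = {
    "type": "composed_seq2seq",
    "source_text_embedder": {
        "token_embedders": {"tokens": {"type": "embedding", "embedding_dim": 128}},
    },
    "encoder": {"type": "lstm", "input_size": 128, "hidden_size": 128},
    "decoder": {
        "type": "auto_regressive_seq_decoder",
        "max_decoding_steps": 20,
        "target_embedder": {"embedding_dim": 128, "vocab_namespace": "target_tokens"},
        "decoder_net": {
            # Could equally be an LSTMCell-based net; a transformer-style net
            # can decode all time steps in parallel during teacher forcing.
            "type": "stacked_self_attention",
            "decoding_dim": 128,
            "target_embedding_dim": 128,
            "feedforward_hidden_dim": 256,
            "num_layers": 2,
            "num_attention_heads": 4,
        },
    },
}
```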
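And a schematic of how the `decodes_parallel` flag changes the teacher-forcing path; the function and the decoder-net interface here are illustrative, not the actual `AutoRegressiveSeqDecoder` code:

```python
import torch

def teacher_forcing_pass(decoder_net, state, target_embeddings, target_mask):
    """Schematic only: `decoder_net` is assumed to expose a boolean
    `decodes_parallel` attribute and a call returning (new_state, output)."""
    if decoder_net.decodes_parallel:
        # A transformer-style net applies a causal mask internally, so the
        # whole gold target prefix is decoded in a single forward pass.
        _, output = decoder_net(state, target_embeddings, target_mask)
        return output

    # A step-wise net (e.g. built on an LSTMCell) has to be unrolled,
    # feeding it the gold embedding of the previous position at each step.
    step_outputs = []
    for step in range(target_embeddings.size(1)):
        state, step_output = decoder_net(
            state, target_embeddings[:, step, :], target_mask[:, step]
        )
        step_outputs.append(step_output)
    return torch.stack(step_outputs, dim=1)
```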