
Composed Sequence to Sequence Abstraction #2913

Merged: 34 commits into allenai:master, Sep 24, 2019

Conversation

@sai-prasanna (Contributor) commented May 31, 2019

This completes the work started by @generall #2517 to decompose seq2seq encoder_decoder models.

Explanation

  • ComposedSeq2Seq is the class that composes the existing Seq2SeqEncoder with the newly defined SeqDecoder.
  • SeqDecoder is the abstract class that decoder implementations extend.
  • AutoRegressiveSeqDecoder is the default implementation of SeqDecoder. It composes a DecoderNet, which can be implemented by anything from a transformer to an LSTMCell. Implementations such as the transformer, which support parallel decoding during training, set a decodes_parallel flag; AutoRegressiveSeqDecoder uses this flag to decode the whole target sequence in a single forward pass when teacher forcing (see the sketch below).
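A minimal sketch of how the decodes_parallel flag is intended to be used, with simplified, hypothetical signatures (the real SeqDecoder/DecoderNet API differs):

```python
import torch

class AutoRegressiveSketch(torch.nn.Module):
    """Illustrative only: branches on the DecoderNet's decodes_parallel flag."""

    def __init__(self, decoder_net, target_embedder):
        super().__init__()
        self._decoder_net = decoder_net          # transformer-like or LSTMCell-like
        self._target_embedder = target_embedder  # embeds the gold target tokens

    def forward(self, encoder_out, target_tokens):
        target_embeddings = self._target_embedder(target_tokens)
        if self._decoder_net.decodes_parallel:
            # Transformer-style net: one masked forward pass over the whole
            # gold target sequence (teacher forcing in a single step).
            return self._decoder_net(encoder_out, target_embeddings)
        # Recurrent net: decode step by step, feeding the gold token from the
        # previous position (step/init_state are hypothetical helpers here).
        state = self._decoder_net.init_state(encoder_out)
        outputs = []
        for t in range(target_embeddings.size(1)):
            out, state = self._decoder_net.step(target_embeddings[:, t], state)
            outputs.append(out.unsqueeze(1))
        return torch.cat(outputs, dim=1)
```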

Questions

  • Can items such as the embedder that currently live in SeqDecoder be moved into the default autoregressive decoder implementation?
  • Are the current tests sufficient?
  • I am reusing the Harvard Transformer parts from the private API to implement StackedSelfAttentionDecoderNet.

Help

  • Need help testing the transformer module on actual data: I was getting NaN errors during the validation phase while running beam search. (Fixed.)

Solves #2097

generall and others added 19 commits February 16, 2019 13:57
* Simple Seq2Seq decoder implemented
* Seq2Seq decoder integrated into SimpleSeq2Seq model
* Decoder accepts the whole decoding step history

ToDo:

* Write tests
* Implement Attention Transformer decoder as demonstration
…and methods.

Refactor is_sequential flag to decodes_parallel flag.
@sai-prasanna (Contributor, Author)

@brendan-ai2 @matt-gardner

@matt-gardner (Contributor)

@brendan-ai2, can I get you to take this one?

@brendan-ai2 (Contributor)

@matt-gardner, @sai-prasanna I can take a look, but I'm out from today until next Thursday (6/13). So it would be somewhat delayed.

@kl2806 requested a review from brendan-ai2 on June 7, 2019.
@sai-prasanna (Contributor, Author)

@matt-gardner @brendan-ai2 I was thinking of wrapping this up this weekend, but Thursday will be fine, I guess, since y'all are busy doing cool research (a bit jealous :p).

A bit of good news: the bug in my transformer decoder is fixed.
During training, mutating the embeddings didn't increase the loss much, but NaNs popped up during inference on the validation set with beam search. It turned out the NaNs were caused by the in-place "*=" operator mutating the embeddings directly 😒. Since beam search calls the transformer decoder on the same embeddings repeatedly, the corruption compounded and the bug only surfaced there.
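As a hypothetical minimal reproduction of that failure mode (not the PR's actual code): when a tensor is reused across decoding steps, an in-place scale keeps multiplying the same memory until it overflows.

```python
import math
import torch

d_model = 512
# Stand-in for embeddings that get reused on every beam-search step.
cached_embeddings = torch.randn(1, 10, d_model)

def scale_buggy(emb):
    emb *= math.sqrt(d_model)        # in place: mutates the shared tensor
    return emb

def scale_fixed(emb):
    return emb * math.sqrt(d_model)  # out of place: input left untouched

for _ in range(50):                  # repeated calls, as in beam search
    scale_buggy(cached_embeddings)

print(torch.isinf(cached_embeddings).any())  # tensor(True): values blew up
```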

Only the review of abstractions is left, I guess.

In-place operator changes the embedding matrix during inference. Change to normal multiplication.
@sai-prasanna (Contributor, Author)

@epwalsh Can you take a look at this to check whether implementations like CopyNet can be built using these abstractions without any hacks?

@brendan-ai2 - Any update?

@epwalsh (Member) commented Jun 18, 2019

Hi @sai-prasanna, I think one issue with CopyNet compatibility is that CopyNet.forward takes a few other arguments in addition to source_tokens and target_tokens. But those just end up being passed around in a dictionary, so maybe the methods laid out in your new API could take a dictionary as input to be more flexible? I don't know, though; personally I don't think that's a great solution 🙃
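Purely as a hypothetical illustration of that idea (not the design that was merged), a dictionary-based decoder interface might look something like this, with CopyNet-style extras riding along as entries of the state dict rather than as extra positional arguments:

```python
from typing import Dict, Optional

import torch

class DictStateDecoder(torch.nn.Module):
    """Hypothetical interface: everything beyond the tokens travels in a dict."""

    def forward(self,
                state: Dict[str, torch.Tensor],
                target_tokens: Optional[Dict[str, torch.Tensor]] = None
                ) -> Dict[str, torch.Tensor]:
        # A CopyNet-style decoder could stash e.g. its source-to-target token
        # maps in `state` instead of adding new arguments to forward().
        raise NotImplementedError
```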

@brendan-ai2 (Contributor)

Hey @sai-prasanna, I'm giving this a closer look now. In the meantime, could you please ensure that scripts/verify.py passes? There are a number of linter errors that need to be handled. Thanks!

@sai-prasanna (Contributor, Author)

@brendan-ai2 Fixed the tests. Some guidance on what to unit test would be great.
Also, the way I have done weight tying is very unclean: I access the private variable containing the token embedder (I can't make it public without a big mess, since that would break deserialization of older models).

If there were some other generic mechanism, maybe in InitializerApplicator, to tie two parameters after model construction, that would be better, I guess.
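For reference, the usual weight-tying trick being described looks roughly like this in PyTorch (names here are illustrative, not the PR's actual attributes):

```python
import torch

vocab_size, d_model = 1000, 256
target_embedding = torch.nn.Embedding(vocab_size, d_model)
output_projection = torch.nn.Linear(d_model, vocab_size, bias=False)

# Tie the output projection to the target embedding: both modules now share a
# single Parameter, so gradient updates affect one matrix.
output_projection.weight = target_embedding.weight
```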

@brendan-ai2 (Contributor) left a comment

Hi again @sai-prasanna, this is looking pretty good. I left a number of comments around clarifying the documentation (important for an abstraction, I think), but functionally I think this is close.

As for testing, could you try training (in the test) a very small autoencoder for each of the encoders/decoders and verify that they can recreate a short sequence? That would seem to be a nice sanity check.
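A toy stand-in for that kind of sanity check (hypothetical and self-contained; the real test would build the actual encoders/decoders rather than this bare embedding-plus-projection pair): overfit a tiny model on one short sequence and assert it can reproduce it.

```python
import torch

def test_tiny_autoencoder_recreates_sequence():
    torch.manual_seed(0)
    vocab_size, d_model = 5, 8
    tokens = torch.tensor([1, 2, 3])              # the short "sequence" to recreate
    embed = torch.nn.Embedding(vocab_size, d_model)
    project = torch.nn.Linear(d_model, vocab_size)
    params = list(embed.parameters()) + list(project.parameters())
    optimizer = torch.optim.Adam(params, lr=0.1)
    for _ in range(200):                          # overfit on the single example
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(project(embed(tokens)), tokens)
        loss.backward()
        optimizer.step()
    prediction = project(embed(tokens)).argmax(dim=-1)
    assert prediction.tolist() == tokens.tolist()
```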

Also, the linter on our CI is still complaining, so please be sure to get scripts/verify.py passing. If it's passing locally for you, do let me know!

Add additional documentation for Seq2Seq classes and clarify a few others.

Make ComposedSeq2Seq share the entire embedder instead of just the weights.
Add new test cases for seq2seq_decoders.
@sai-prasanna (Contributor, Author)

@brendan-ai2 Sorry that it took a while to get back to this.
I have addressed the points in your review and added a few more tests.

@schmmd (Member) commented Sep 6, 2019

@brendan-ai2 would you please take another look?

@brendan-ai2 (Contributor) left a comment

Looks good to me! (There's a minor typo in some logging that would be great to fix, but I can patch it after this is merged too if you're busy.)

@sai-prasanna , thank you so much for working on this! Especially in the face of the slow feedback cycle which I'm to blame for. Sorry again for that. I'm really looking forward to having this in master and getting some users' eyes on it! All the best.

Review threads on allennlp/data/dataset_readers/seq2seq.py (outdated, resolved)
@sai-prasanna (Contributor, Author)

@brendan-ai2 Thanks for the review. Getting it in front of users' eyes would really help us get more clarity.

TBH, I am currently using fairseq directly for my tasks because it supports fp16, accumulation, and DDP, and its dataset reading/tensorization is fast. I couldn't battle-test these abstractions as much as I wanted to.

@brendan-ai2 brendan-ai2 merged commit 3b22011 into allenai:master Sep 24, 2019
@brendan-ai2 (Contributor)

Merged! Thanks for the feedback about fairseq too. It's good for us to know what features people find important...

@sai-prasanna deleted the decoder-updates branch on September 25, 2019.
reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019