Composed Sequence to Sequence Abstraction #2913
Conversation
…ple_seq2seq and CopyNet
…and methods. Refactor is_sequential flag to decodes_parallel flag.
regularizer to ComposedSeq2Seq
@brendan-ai2, can I get you to take this one?
@matt-gardner, @sai-prasanna I can take a look, but I'm out from today until next Thursday (6/13). So it would be somewhat delayed.
@matt-gardner @brendan-ai2 I thought of wrapping this up this weekend, but Thursday will be fine, I guess, since y'all are busy doing cool research (a bit jealous :p). A bit of good news: the bug in my transformer decoder is fixed. Only the review of abstractions is left, I guess.
In-place operator changes the embedding matrix during inference. Change to normal multiplication.
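For illustration, a minimal sketch of the failure mode this commit addresses; the tensors and the 0.5 scale factor are made up here, not taken from the actual code:

```python
import torch

# A toy "embedding matrix" standing in for a target embedder's weight.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
weight = embedding.weight.data  # aliases the embedding matrix storage

# In-place multiplication mutates the embedding matrix itself, so every
# later forward pass (e.g. during inference) sees corrupted weights:
#     weight *= 0.5
# Out-of-place multiplication returns a new tensor and leaves the
# original matrix untouched:
scaled = weight * 0.5
```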
@epwalsh Can you take a look at this to check whether implementations like CopyNet can be done using these abstractions without any hacks? @brendan-ai2 - Any update?
Hi @sai-prasanna, I think one issue with CopyNet-compatibility is that …
Hey @sai-prasanna, I'm giving this a closer look now. In the meantime, could you please ensure that …
@brendan-ai2 Fixed the tests. Some guidance on what to unit test would be great. If there is some other generic mechanism, maybe in …
Hi again @sai-prasanna, this is looking pretty good. I left a number of comments around clarifying the documentation (important for an abstraction, I think), but functionally I think this is close.
As for testing, could you try training (in the test) a very small autoencoder for each of the encoders/decoders and verify that they can recreate a short sequence? That would seem to be a nice sanity check.
Also, the linter on our CI is still complaining, so please be sure to get scripts/verify.py passing. If it's passing locally for you, do let me know!
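For reference, a sketch of how such a check could be scaffolded with AllenNLP's `ModelTestCase`; the fixture paths and file names are hypothetical, and the stronger assertion of decoding the training sequences back out would sit on top of this:

```python
from allennlp.common.testing import ModelTestCase


class ComposedSeq2SeqTest(ModelTestCase):
    def setUp(self):
        super().setUp()
        # Hypothetical fixtures: a tiny experiment config and a handful of
        # short copy (source == target) pairs, so the model is effectively a
        # small autoencoder that should memorise its inputs in a few epochs.
        self.set_up_model(
            self.FIXTURES_ROOT / "encoder_decoder" / "composed_seq2seq" / "experiment.json",
            self.FIXTURES_ROOT / "encoder_decoder" / "composed_seq2seq" / "copy_sequences.tsv",
        )

    def test_model_can_train_save_and_load(self):
        # Trains the tiny model end to end, saves and reloads it, and checks
        # that predictions stay consistent; recreating the input sequences
        # could be asserted on top of this.
        self.ensure_model_can_train_save_and_load(self.param_file)
```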
allennlp/modules/seq2seq_decoders/stacked_self_attention_decoder_net.py (outdated review thread, resolved)
Add additional documentation for Seq2Seq classes and clarify a few others. Make ComposedSeq2Seq share the entire embedder instead of just the weights.
Add new test cases for seq2seq_decoders.
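The distinction drawn in the embedder-sharing commit above, in toy form (plain PyTorch, not the actual `ComposedSeq2Seq` code):

```python
import torch

# Sharing the module: both sides hold a reference to the same Embedding,
# so re-initialisation, vocabulary extension, or device moves stay in sync.
shared = torch.nn.Embedding(num_embeddings=100, embedding_dim=32)
source_embedder = shared
target_embedder = shared

# Sharing only the weights: two distinct modules tied through one Parameter.
# Gradients stay tied, but structural changes to one module do not carry
# over to the other.
tied_target = torch.nn.Embedding(num_embeddings=100, embedding_dim=32)
tied_target.weight = shared.weight
```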
@brendan-ai2 Sorry that it took a while to get back to this.
@brendan-ai2 would you please take another look?
Looks good to me! (There's a minor typo in some logging that would be great to fix, but I can patch it after this is merged too if you're busy.)
@sai-prasanna , thank you so much for working on this! Especially in the face of the slow feedback cycle which I'm to blame for. Sorry again for that. I'm really looking forward to having this in master and getting some users' eyes on it! All the best.
@brendan-ai2 Thanks for the review. Getting users' eyes on it would really help us get more clarity. TBH I am currently using fairseq directly for my tasks because it supports fp16, gradient accumulation, and DDP, and its dataset reader/tensorization is fast. I couldn't battle-test these abstractions as much as I wanted to.
Merged! Thanks for the feedback about fairseq too. It's good for us to know what features people find important... |
This completes the work started by @generall in allenai#2517 to decompose seq2seq encoder-decoder models.

## Explanation

* `ComposedSeq2Seq` is the class that composes the older `Seq2SeqEncoder` and the newly defined `SeqDecoder`.
* `SeqDecoder` is an abstract class that decoder implementations extend.
* `AutoRegressiveSeqDecoder` is the default implementation of `SeqDecoder`. It composes a `DecoderNet`, which can be implemented by anything from a transformer to an LSTMCell. For implementations like the transformer that support parallel decoding (during training), there is a `decodes_parallel` flag; `AutoRegressiveSeqDecoder` uses it to decode in a single forward pass under teacher forcing.

## Questions

* Can the items currently in `SeqDecoder`, such as the embedder, be moved to the default autoregressive decoder implementation?
* Are the current tests sufficient?
* I am reusing the Harvard Transformer parts in the private API to implement `StackedSelfAttentionDecoderNet`.

## Help

* ~~Need help testing the transformer module on actual data. I am getting NaN errors during the validation phase while running beam search.~~ Fixed

Solves allenai#2097
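To make the composition concrete, here is a sketch of what the `model` section of an experiment config might look like with these pieces. The registered names (`composed_seq2seq`, `auto_regressive_seq_decoder`, `stacked_self_attention`) and the exact keys are my reading of the abstractions above, not verified against the final code, so treat them as placeholders:

```python
# Hypothetical "model" section of an experiment config wiring the pieces
# together: a standard Seq2SeqEncoder on the source side and an
# AutoRegressiveSeqDecoder wrapping a DecoderNet on the target side.
model = {
    "type": "composed_seq2seq",
    "source_text_embedder": {
        "token_embedders": {"tokens": {"type": "embedding", "embedding_dim": 128}},
    },
    "encoder": {"type": "lstm", "input_size": 128, "hidden_size": 128},
    "decoder": {
        "type": "auto_regressive_seq_decoder",
        "max_decoding_steps": 20,
        "target_embedder": {"embedding_dim": 128, "vocab_namespace": "target_tokens"},
        "decoder_net": {
            # Could equally be an LSTMCell-based net; a transformer-style net
            # can decode all time steps in parallel during teacher forcing.
            "type": "stacked_self_attention",
            "decoding_dim": 128,
            "target_embedding_dim": 128,
            "feedforward_hidden_dim": 256,
            "num_layers": 2,
            "num_attention_heads": 4,
        },
    },
}
```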
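And a schematic of how the `decodes_parallel` flag changes the teacher-forcing path; the function and the decoder-net interface here are illustrative, not the actual `AutoRegressiveSeqDecoder` code:

```python
import torch

def teacher_forcing_pass(decoder_net, state, target_embeddings, target_mask):
    """Schematic only: `decoder_net` is assumed to expose a boolean
    `decodes_parallel` attribute and a call returning (new_state, output)."""
    if decoder_net.decodes_parallel:
        # A transformer-style net applies a causal mask internally, so the
        # whole gold target prefix is decoded in a single forward pass.
        _, output = decoder_net(state, target_embeddings, target_mask)
        return output

    # A step-wise net (e.g. built on an LSTMCell) has to be unrolled,
    # feeding it the gold embedding of the previous position at each step.
    step_outputs = []
    for step in range(target_embeddings.size(1)):
        state, step_output = decoder_net(
            state, target_embeddings[:, step, :], target_mask[:, step]
        )
        step_outputs.append(step_output)
    return torch.stack(step_outputs, dim=1)
```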