Decompose LanguageModel contextualizer into forward_ and backward_ contextualizer #2438
Conversation
FYI @brendan-ai2, I was able to retrain the model at /~https://github.com/allenai/allennlp/blob/master/training_config/constituency_parser_transformer_elmo.jsonnet with these changes, so it seems backwards compatibility is maintained.
self._backward_contextualizer(embeddings, mask))
# Concatenate the backward contextual embeddings to the
# forward contextual embeddings
if (isinstance(contextual_embeddings, list) and
@brendan-ai2, I'm struggling to test this case (where `contextual_embeddings` and `backward_contextual_embeddings` are lists), since I'm not sure when they would return lists :) any pointers?
Blargh. :/ The contextualizer can be made to return all layers for embedding.
There basically isn't a real API in place right now as this is specific to the transformer contextualizer. We definitely need to figure out how this plays with having more general contextualizers... It is an important feature.
Yeah, seems hard to get the per-layer outputs of a multilayer pytorch LSTM / other contextualizers in a flexible fashion...
Off the top of my head, a solution would be to modify /~https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/seq2seq_encoder.py#L5 so that it takes a constructor arg `return_all_layers` (like with `lazy` in the `DatasetReader`) and then have the subclasses do the appropriate thing. Maybe also add `forward` as an explicit method in `Seq2SeqEncoder` in order to clearly document the behavior and the type.
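For illustration, here is a rough sketch of what that constructor-arg approach could look like; the class body below is hypothetical and is not AllenNLP's actual `Seq2SeqEncoder` implementation.

from typing import List, Union

import torch


class Seq2SeqEncoder(torch.nn.Module):
    """Sketch only: the real AllenNLP base class has more machinery."""

    def __init__(self, return_all_layers: bool = False) -> None:
        super().__init__()
        self.return_all_layers = return_all_layers

    def forward(self, inputs: torch.Tensor, mask: torch.Tensor) -> Union[torch.Tensor, List[torch.Tensor]]:
        # `inputs` has shape (batch_size, sequence_length, embedding_dim).
        # Subclasses return a single tensor of contextualized embeddings,
        # or a list of per-layer tensors when `return_all_layers` is True.
        raise NotImplementedError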
I think the main trouble is that the PyTorch LSTM doesn't return the outputs for all layers and all timesteps?
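For reference, that constraint is visible directly in the return values of the stock `torch.nn.LSTM` (shapes follow the standard PyTorch API; the sizes below are arbitrary):

import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=3, batch_first=True)
inputs = torch.randn(2, 5, 8)   # (batch, sequence, embedding)

output, (h_n, c_n) = lstm(inputs)
print(output.shape)  # torch.Size([2, 5, 16]): top layer only, every timestep
print(h_n.shape)     # torch.Size([3, 2, 16]): every layer, final timestep only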
Excellent point, @matt-gardner.
@nelson-liu, I should elaborate a bit. The pure object-oriented approach would be to have a `MultiLayerSeq2SeqEncoder` that subclasses the existing `Seq2SeqEncoder`. Annoyingly, this would break anyone who's using a vanilla `Seq2SeqEncoder` currently. Another issue is that when many of these features are independent, there is a risk of the class hierarchy becoming very deep, resulting in, say, `MultiLayerSeq2SeqEncoderWithFooAndBar`, `MultiLayerSeq2SeqEncoderWithFooAndNotBar`, etc. There are a few ways to guard against this. First, one can make judicious use of defaults -- at the risk of weakening (if not breaking) the abstraction, as Matt points out. Another option would be to use mixins (though that would also break existing users). You can also adopt a sort of manual approach, which might involve exposing `is_multi_layer` for user code to query. This is ugly, but it can be quite flexible and aid in backwards compatibility.
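A minimal sketch of that manual approach, assuming a hypothetical encoder and an `is_multi_layer` query (none of these names exist in AllenNLP):

from typing import List

import torch


class ToyMultiLayerEncoder(torch.nn.Module):
    """Hypothetical encoder used only to illustrate the `is_multi_layer` query."""

    def __init__(self, dim: int, num_layers: int) -> None:
        super().__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Linear(dim, dim) for _ in range(num_layers)])

    def is_multi_layer(self) -> bool:
        return True

    def forward(self, inputs: torch.Tensor) -> List[torch.Tensor]:
        outputs, hidden = [], inputs
        for layer in self.layers:
            hidden = torch.relu(layer(hidden))
            outputs.append(hidden)
        return outputs


encoder = ToyMultiLayerEncoder(dim=8, num_layers=2)
embeddings = torch.randn(2, 5, 8)
# User code queries the flag instead of relying on the class hierarchy.
if getattr(encoder, "is_multi_layer", lambda: False)():
    per_layer_outputs = encoder(embeddings)   # List[torch.Tensor], one per layer
else:
    top_layer_output = encoder(embeddings)    # a single torch.Tensor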
Let's figure out how/whether to obtain the extra layers and then proceed with caution as Matt suggests. :)
Sorry, I missed this mention. We don't need to handle the case where the contextualizer returns multiple layers here, because it will never do that during training.
I think the problem here really stems from the fact that we have been passing around the `return_all_layers` argument as a constructor parameter rather than as a runtime `forward` parameter (or as a separate method on a `Contextualizer` subclass of `Seq2SeqEncoder`).
Is the right API here something like this:
class Seq2SeqEncoder: ...

class Contextualizer(Seq2SeqEncoder):
    # Idea 1:
    def forward(sequence: torch.Tensor[batch, sequence, embedding],
                return_all_layers=False) -> Union[torch.Tensor[batch, sequence, embedding], List[of the same thing]]
    # this way all Contextualizers can still function as `Seq2SeqEncoders` by default,
    # which is useful for training and perhaps for downstream use, say if you wanted to
    # fine tune an encoder and didn't want to do an elmo mixture. However, all
    # `ContextualTokenEmbedders` could call `Contextualizer.forward(return_all_layers=True)`.

    # Second idea:
    def forward(sequence: torch.Tensor[batch, sequence, embedding]) -> torch.Tensor[batch, sequence, embedding]
    def get_layers(sequence: torch.Tensor) -> List[torch.Tensor[batch, sequence, embedding]]
    # A separate method which basically implements the functionality above.
    # This would get around the type problems of `forward` possibly returning lists
    # of tensors when used as a `Seq2SeqEncoder`.
This doesn't handle Nelson's case that the contextualizers might be stateful during training. I haven't thought hard about that because I'm not 100% convinced that we need to support it as part of a concrete API yet, but I'm happy to be convinced!
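To make the second idea above concrete, here is a minimal runnable sketch with a toy contextualizer; the class and its internals are illustrative only, not the proposed AllenNLP implementation.

from typing import List

import torch


class ToyContextualizer(torch.nn.Module):
    """Toy stand-in for a `Contextualizer` exposing a separate `get_layers` method."""

    def __init__(self, input_dim: int, hidden_dim: int, num_layers: int = 2) -> None:
        super().__init__()
        # One single-layer LSTM per "layer" so that every layer's output is recoverable.
        self._layers = torch.nn.ModuleList(
            [torch.nn.LSTM(input_dim if i == 0 else hidden_dim, hidden_dim, batch_first=True)
             for i in range(num_layers)]
        )

    def forward(self, sequence: torch.Tensor) -> torch.Tensor:
        # Behaves like a plain Seq2SeqEncoder: returns only the top layer.
        return self.get_layers(sequence)[-1]

    def get_layers(self, sequence: torch.Tensor) -> List[torch.Tensor]:
        # Returns every layer's output, each of shape (batch, sequence, hidden_dim),
        # which is what a ContextualTokenEmbedder would consume for ELMo-style mixing.
        outputs, hidden = [], sequence
        for lstm in self._layers:
            hidden, _ = lstm(hidden)
            outputs.append(hidden)
        return outputs


contextualizer = ToyContextualizer(input_dim=8, hidden_dim=16)
sequence = torch.randn(2, 5, 8)                    # (batch, sequence, embedding)
top_layer = contextualizer(sequence)               # torch.Tensor, as a Seq2SeqEncoder would return
all_layers = contextualizer.get_layers(sequence)   # List[torch.Tensor], one per layer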
More thoughts later, but:

> This doesn't handle Nelson's case that the contextualizers might be stateful during training. I haven't thought hard about that because I'm not 100% convinced that we need to support it as part of a concrete API yet, but I'm happy to be convinced!
How else would you do language modeling of contiguous text? At the very least, it'd be useful to add the (left-to-right) LM functionality to AllenNLP so people can easily train on contiguous-text datasets like the PTB or wikitext. I think having the bidirectional contiguous text LM would be useful as well, since I'm actually curious to run the experiment and find out whether contiguous text matters (I suspect it does, or at least doesn't hurt). Bidirectionality is definitely crucial to getting the best contextual representations.
Would #2716 help here?
Cool! To double-check: when you retrained the constituency parser, which trained LM did you use to perform the embedding? Ideally it would be the old one I linked you previously. Also, FYI, my responses will be slow/sporadic this week due to AAAI.
Yup, I used the old one.
No worries, have a good time!
@nelson-liu, just wanted to check in on this and make sure I'm not blocking you. Are you still proceeding with this approach?
Sorry it took so long to add my comments to this; I missed the first mention!
@@ -74,6 +75,23 @@ class LanguageModel(Model):
    contextualizer: ``Seq2SeqEncoder``
        Used to "contextualize" the embeddings. As described above,
        this encoder must not cheat by peeking ahead.

        .. deprecated:: 0.8.2
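For context, deprecating a constructor argument is commonly paired with a runtime warning when the old argument is still passed. A simplified sketch of that pattern (hypothetical class, not the PR's actual code):

import warnings


class LanguageModelSketch:
    """Sketch only: illustrates deprecating a constructor argument."""

    def __init__(self, contextualizer=None, forward_contextualizer=None, backward_contextualizer=None) -> None:
        if contextualizer is not None:
            warnings.warn(
                "The `contextualizer` argument is deprecated in favour of "
                "`forward_contextualizer` and `backward_contextualizer`.",
                DeprecationWarning,
            )
        # Fall back to the old argument so existing configs keep working.
        self._forward_contextualizer = forward_contextualizer or contextualizer
        self._backward_contextualizer = backward_contextualizer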
Question: To me this seems like we are ripping out quite a good API for one which is more complicated. We already have a trained `BidirectionalLanguageModel` which uses a bidirectional transformer and which is very useful to many people (e.g. GitHub issues; Swabha is using it in her research), which makes me unsure that deprecating a key component of it is a good idea. Do we know that bidirectional language modeling of contiguous text is worth it? I can't imagine a downstream task where you actually need unlimited context (and even if there was one, how that would practically work). Is it worth ripping this into a separate repo, making the changes, and double-checking that contiguous text does something useful?
Additionally, we now have a `BidirectionalLanguageModel` subclass which is subsumed by this code (I think). This is a "bad code smell" to me when we have a subclass which only differs in constructor arguments and we only have a single example of how those arguments might differ (it triggers my `MultiClassMultipleChoiceMemoryNetwork(MemoryNetwork, MultiClass, MultipleChoice)` sense from DeepQa).
Leaving the API decision to others, here's some motivation for enabling modeling of longer contexts: https://ai.googleblog.com/2019/01/transformer-xl-unleashing-potential-of.html. They basically make a stateful transformer, showing pretty big gains.
@nelson-liu, how do you want to proceed with this?
I plan on coming back to this eventually; I just got caught up with some other stuff...
@nelson-liu, is now a good time to come back to this?
Closing due to inactivity and the Age Of The Transformer, which makes this PR not super important.
As discussed in #2373, this PR adds `forward_contextualizer` and `backward_contextualizer` arguments to the `LanguageModel`, towards bidirectional language modeling of contiguous text.
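For readers skimming the diff, a minimal sketch of the shape of the change (hypothetical helper, not the exact PR code): each direction contextualizes the embeddings separately, and the backward output is concatenated onto the forward output along the feature dimension, per layer if the contextualizers return all layers.

from typing import List, Union

import torch

TensorOrLayers = Union[torch.Tensor, List[torch.Tensor]]


def combine_directions(forward_out: TensorOrLayers, backward_out: TensorOrLayers) -> TensorOrLayers:
    # Concatenate the backward contextual embeddings onto the forward ones
    # along the feature dimension, layer by layer if lists were returned.
    if isinstance(forward_out, list) and isinstance(backward_out, list):
        return [torch.cat([f, b], dim=-1) for f, b in zip(forward_out, backward_out)]
    return torch.cat([forward_out, backward_out], dim=-1)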