Fix LanguageModel API to easily handle bidirectional contextualizers. #2373
Comments
fyi @nelson-liu
Quick note: you'd definitely need two
@matt-gardner, sorry I was unclear. What I meant was two types of encoders vs. one type of encoder. We very definitely need to instantiate two different instances. :)
Updated description: "It would accept an underlying `Seq2SeqEncoder` that would be duplicated to prevent parameter sharing or perhaps two entirely separate encoders."
Given the discussion in #2414, it seems like a solution would be to have the user instantiate two forward-direction contextualizers. The LM would then use one for the forward direction and the other for the backward direction. With this implementation, we would raise an error when a bidirectional contextualizer is used.
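As a rough sketch of that idea (the class name and constructor arguments below are hypothetical, not the actual AllenNLP API), the model could take the two unidirectional encoders explicitly and refuse a bidirectional one up front:

```python
from typing import Optional

import torch
from allennlp.common.checks import ConfigurationError
from allennlp.modules import Seq2SeqEncoder


class TwoDirectionLanguageModel(torch.nn.Module):
    """Hypothetical sketch: accept two unidirectional contextualizers and run the
    second one over a time-reversed copy of the sequence."""

    def __init__(self,
                 forward_contextualizer: Seq2SeqEncoder,
                 backward_contextualizer: Optional[Seq2SeqEncoder] = None) -> None:
        super().__init__()
        for encoder in (forward_contextualizer, backward_contextualizer):
            if encoder is not None and encoder.is_bidirectional():
                # Fail loudly instead of training a silently broken model.
                raise ConfigurationError(
                    "Pass unidirectional contextualizers; bidirectionality is handled here.")
        self._forward_contextualizer = forward_contextualizer
        self._backward_contextualizer = backward_contextualizer

    def forward(self, embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        forward_states = self._forward_contextualizer(embeddings, mask)
        if self._backward_contextualizer is None:
            return forward_states
        # Run the backward encoder over the reversed sequence, then flip its output
        # back so both halves line up per position. (Padding is ignored for brevity.)
        backward_states = self._backward_contextualizer(embeddings.flip(1), mask.flip(1))
        return torch.cat([forward_states, backward_states.flip(1)], dim=-1)
```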
For the record, @nelson-liu and I chatted offline about this in some detail. His solution is definitely necessary, given that contiguous language models need to provide distinct targets for the forward and backward contextualizers. Sounds like a PR is incoming... :)
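To make the point about targets concrete: for a language model trained on contiguous text, position t's forward target is token t+1 while its backward target is token t-1, so the two contextualizers cannot share a single shifted-target tensor. A minimal illustration:

```python
import torch

# Forward and backward directions need *different* shifted copies of the tokens.
token_ids = torch.tensor([[31, 7, 99, 4, 2]])       # (batch=1, seq_len=5)

forward_targets = torch.zeros_like(token_ids)
forward_targets[:, :-1] = token_ids[:, 1:]          # position t is trained on token t+1

backward_targets = torch.zeros_like(token_ids)
backward_targets[:, 1:] = token_ids[:, :-1]         # position t is trained on token t-1
```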
@nelson-liu this is just a friendly ping to make sure you haven't forgotten about this issue 😜
I've got to admit that I've lost almost all context on this issue, and it feels like we've decided not to proceed with it. Closing this for now; I'm unsure how big of an issue this really is these days.
**Is your feature request related to a problem? Please describe.**
Currently `LanguageModel` takes a single contextualizer. All our normal `Seq2SeqEncoder`s work here in the unidirectional case. In the bidirectional case, however, our normal `Seq2SeqEncoder`s are broken when they contain more than one layer. See /~https://github.com/allenai/allennlp/blob/master/allennlp/models/language_model.py#L61. Special handling is needed, like that contained in the `BidirectionalLanguageModelTransformer`. See /~https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/bidirectional_language_model_transformer.py#L262. This is an issue because a model can be silently broken. Further, a user can't simply specify `lstm`, `gru`, etc. in their config and get a working LM in general.
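The breakage is easy to see with a stock PyTorch module (this is a standalone illustration, not AllenNLP code): in a stacked bidirectional LSTM, layer two's forward direction consumes layer one's backward states, so even the "forward" half of the output depends on future tokens.

```python
import torch

torch.manual_seed(0)

# Between layers, PyTorch feeds the concatenated (forward, backward) states of
# layer 1 into layer 2, so layer 2's "forward" half has already seen the future.
lstm = torch.nn.LSTM(input_size=8, hidden_size=8, num_layers=2,
                     bidirectional=True, batch_first=True)

inputs = torch.randn(1, 5, 8)
perturbed = inputs.clone()
perturbed[0, 4] += 1.0                       # change only the *last* timestep

out_a, _ = lstm(inputs)
out_b, _ = lstm(perturbed)

# If the forward states were causal, the forward half at timestep 0 would be
# unaffected by a change at timestep 4. With num_layers > 1 it is not:
print(torch.allclose(out_a[0, 0, :8], out_b[0, 0, :8]))   # False -> future leakage
```

With `num_layers=1` the forward half stays causal, which is why single-layer bidirectional encoders can still be handled by splitting the output into its two halves.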
**Describe the solution you'd like**
Let's provide a wrapper `Seq2SeqEncoder`, maybe a `BidirectionalLanguageModelObjectiveEncoder`, that handles the masking and direction changes transparently. It would accept an underlying `Seq2SeqEncoder` that would be duplicated to prevent parameter sharing, or perhaps two entirely separate encoders.
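One possible shape for that wrapper (the class name comes from this issue, but everything else here is an assumption rather than a committed design):

```python
import copy

import torch
from allennlp.common.checks import ConfigurationError
from allennlp.modules import Seq2SeqEncoder


class BidirectionalLanguageModelObjectiveEncoder(Seq2SeqEncoder):
    """Sketch: deep-copy a unidirectional encoder so the two directions share an
    architecture but not parameters, run the copy over the reversed sequence, and
    concatenate the results."""

    def __init__(self, encoder: Seq2SeqEncoder) -> None:
        super().__init__()
        if encoder.is_bidirectional():
            raise ConfigurationError("Wrap a unidirectional encoder; the wrapper "
                                     "provides the bidirectionality itself.")
        self._forward_encoder = encoder
        # Separate (though identically initialized) parameters, no sharing.
        self._backward_encoder = copy.deepcopy(encoder)

    def get_input_dim(self) -> int:
        return self._forward_encoder.get_input_dim()

    def get_output_dim(self) -> int:
        return 2 * self._forward_encoder.get_output_dim()

    def is_bidirectional(self) -> bool:
        return True

    def forward(self, inputs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        forward_out = self._forward_encoder(inputs, mask)
        # Reverse time for the backward pass, then flip its output back so the two
        # halves are aligned per position. (Padding handling is elided for brevity.)
        backward_out = self._backward_encoder(inputs.flip(1), mask.flip(1)).flip(1)
        return torch.cat([forward_out, backward_out], dim=-1)
```

The "two entirely separate encoders" variant would simply replace the `deepcopy` with a second constructor argument.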
**Describe alternatives you've considered**
In the short term we could at least check that users aren't using our default encoders with `bidirectional=True` and `num_layers > 1`. (Annoyingly, we'd need a class blacklist rather than checking that condition directly, since the transformer encoder does the correct thing despite satisfying the condition.)
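A sketch of that short-term guard. Here the blacklist is expressed by inspecting the wrapped PyTorch recurrent modules, which is an assumption about how the default encoders are built, and `check_contextualizer` is a hypothetical helper; `BidirectionalLanguageModelTransformer` passes because it contains no stacked bidirectional recurrent module.

```python
import torch
from allennlp.common.checks import ConfigurationError
from allennlp.modules import Seq2SeqEncoder

# Recurrent module types that are unsafe for the LM objective when stacked
# and bidirectional.
_UNSAFE_RECURRENT_TYPES = (torch.nn.LSTM, torch.nn.GRU, torch.nn.RNN)


def check_contextualizer(contextualizer: Seq2SeqEncoder) -> None:
    """Raise if the encoder wraps a stacked bidirectional recurrent module."""
    for module in contextualizer.modules():
        if (isinstance(module, _UNSAFE_RECURRENT_TYPES)
                and module.bidirectional and module.num_layers > 1):
            raise ConfigurationError(
                "Stacked bidirectional recurrent contextualizers leak future tokens "
                "under the language-modelling objective; use a unidirectional encoder "
                "or a bidirectional encoder that handles the masking itself.")
```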
**Additional context**
LSTM encoder from Calypso: /~https://github.com/allenai/calypso/blob/master/calypso/lstm_encoder.py#L22
Note that it's hardwired to be bidirectional and to have only one layer.