
Fix LanguageModel API to easily handle bidirectional contextualizers. #2373

Closed
brendan-ai2 opened this issue Jan 16, 2019 · 8 comments

@brendan-ai2
Contributor

brendan-ai2 commented Jan 16, 2019

Is your feature request related to a problem? Please describe.
Currently LanguageModel takes a single contextualizer. All of our standard Seq2SeqEncoders work in the unidirectional case, but in the bidirectional case they are broken whenever they contain more than one layer, because each layer after the first consumes both directions of the layer below and so leaks future tokens. See /~https://github.com/allenai/allennlp/blob/master/allennlp/models/language_model.py#L61. Special handling is needed, like that in the BidirectionalLanguageModelTransformer: /~https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/bidirectional_language_model_transformer.py#L262. This is a problem because a model can be silently broken, and a user can't simply specify lstm, gru, etc. in their config and get a working LM.
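
To make the failure mode concrete, here is a minimal sketch in plain PyTorch (not an AllenNLP snippet) showing the leak: with num_layers > 1, the second layer consumes the concatenated forward and backward states of the first layer, so even the "forward" half of the output at an early position already depends on later tokens.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=4, hidden_size=4, num_layers=2,
                     bidirectional=True, batch_first=True)

inputs = torch.randn(1, 6, 4)
perturbed = inputs.clone()
perturbed[0, 5] += 1.0  # change only the last token

out_orig, _ = lstm(inputs)
out_pert, _ = lstm(perturbed)

# The first 4 units are the forward direction of the top (second) layer.
# With a single layer this difference would be exactly zero; with two layers
# it is not, which is the information leak that breaks the LM objective.
print((out_orig[0, 0, :4] - out_pert[0, 0, :4]).abs().max())
```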

Describe the solution you'd like
Let's provide a wrapper Seq2SeqEncoder, maybe a BidirectionalLanguageModelObjectiveEncoder, that handles the masking and direction changes transparently. It would accept an underlying Seq2SeqEncoder that would be duplicated to prevent parameter sharing, or perhaps two entirely separate encoders.
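
A rough sketch of what that wrapper could look like (the class name and registration key are just the suggestion above, and the loop-based reversal is purely illustrative): it deep-copies the given unidirectional encoder so the backward direction gets its own parameters, runs the copy over the mask-aware reversed sequence, and concatenates the two outputs.

```python
import copy
import torch
from allennlp.modules import Seq2SeqEncoder


@Seq2SeqEncoder.register("bidirectional_lm_objective")
class BidirectionalLanguageModelObjectiveEncoder(Seq2SeqEncoder):
    """Wraps a unidirectional Seq2SeqEncoder and runs an independent copy of it
    in the backward direction, so stacked layers never mix the two directions."""

    def __init__(self, encoder: Seq2SeqEncoder) -> None:
        super().__init__()
        assert not encoder.is_bidirectional(), "the wrapped encoder must be unidirectional"
        self._forward_encoder = encoder
        # Deep copy so the backward direction has its own (untied) parameters.
        self._backward_encoder = copy.deepcopy(encoder)

    def get_input_dim(self) -> int:
        return self._forward_encoder.get_input_dim()

    def get_output_dim(self) -> int:
        return 2 * self._forward_encoder.get_output_dim()

    def is_bidirectional(self) -> bool:
        return True

    @staticmethod
    def _reverse(tensor: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Reverse each sequence only over its actual length so padding stays at the end.
        lengths = mask.long().sum(dim=1)
        reversed_tensor = tensor.clone()
        for i, length in enumerate(lengths):
            reversed_tensor[i, :length] = tensor[i, :length].flip(0)
        return reversed_tensor

    def forward(self, inputs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        forward_output = self._forward_encoder(inputs, mask)
        backward_output = self._backward_encoder(self._reverse(inputs, mask), mask)
        # Un-reverse so the backward states line up with the forward ones again.
        backward_output = self._reverse(backward_output, mask)
        return torch.cat([forward_output, backward_output], dim=-1)
```

The idea is that LanguageModel could keep treating the two halves of the output as the forward and backward representations, while the user only configures a single unidirectional encoder.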

Describe alternatives you've considered
In the short term we could at least check that users aren't using our default encoders with bidirectional=True and num_layers > 1. (Annoyingly, we'd need a class blacklist rather than checking that condition directly, since the transformer encoder does the correct thing despite satisfying the condition.)
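
Something along these lines in LanguageModel's constructor might do as the short-term guard. It's only a sketch: it special-cases PytorchSeq2SeqWrapper (the plain LSTM/GRU/RNN wrappers), and the `_module` attribute access is an implementation detail of the wrapper, hence all the hedging with getattr.

```python
from allennlp.common.checks import ConfigurationError
from allennlp.modules.seq2seq_encoders import PytorchSeq2SeqWrapper


def _check_contextualizer(contextualizer) -> None:
    # Only the plain PyTorch RNN wrappers are known to be broken here; the
    # transformer encoder satisfies the same condition but keeps the
    # directions separate, hence the class check instead of a generic one.
    if not isinstance(contextualizer, PytorchSeq2SeqWrapper):
        return
    module = getattr(contextualizer, "_module", None)
    if module is None:
        return
    if getattr(module, "bidirectional", False) and getattr(module, "num_layers", 1) > 1:
        raise ConfigurationError(
            "Multi-layer bidirectional RNN contextualizers mix the two "
            "directions between layers and silently break the LM objective. "
            "Use a single layer or an encoder that keeps the directions separate."
        )
```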

Additional context
LSTM Encoder from Calypso: /~https://github.com/allenai/calypso/blob/master/calypso/lstm_encoder.py#L22

Note that it's hardwired to be bidirectional and have only one layer.

@brendan-ai2
Contributor Author

fyi @nelson-liu

@matt-gardner
Contributor

Quick note: you'd definitely need two Seq2SeqEncoders; otherwise you'd have a single encoder with the forward and backward parameters tied, which is almost certainly not what you want.

@brendan-ai2
Contributor Author

@matt-gardner, sorry, I was unclear. What I meant was two types of encoders vs. one type of encoder. We definitely need to instantiate two separate instances. :)

@brendan-ai2
Contributor Author

Updated description: "It would accept an underlying Seq2SeqEncoder that would be duplicated to prevent parameter sharing, or perhaps two entirely separate encoders."

@nelson-liu
Contributor

nelson-liu commented Jan 24, 2019

Given the discussion in #2414, it seems like a solution would be to have the user instantiate two forward-direction contextualizers. The LM would then use one for the forward direction and the other for the backward direction. With this implementation, we would raise an error when a bidirectional contextualizer is used.
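
Sketching that interface (the parameter names here are just illustrative, not a decided API): the model would take two unidirectional contextualizers and reject bidirectional ones up front.

```python
from allennlp.common.checks import ConfigurationError
from allennlp.modules import Seq2SeqEncoder


def _validate_contextualizers(forward_contextualizer: Seq2SeqEncoder,
                              backward_contextualizer: Seq2SeqEncoder) -> None:
    # Each direction gets its own forward-only contextualizer, so anything
    # that claims to be bidirectional is a configuration error.
    for name, contextualizer in [("forward_contextualizer", forward_contextualizer),
                                 ("backward_contextualizer", backward_contextualizer)]:
        if contextualizer.is_bidirectional():
            raise ConfigurationError(
                f"{name} must be unidirectional; the model runs it over the "
                "appropriate direction itself."
            )
```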

@brendan-ai2
Contributor Author

For the record, @nelson-liu and I chatted offline about this in some detail. His solution is definitely necessary given that contiguous language models need to provide distinct targets for the forward and backward contextualizers. Sounds like a PR is incoming... :)
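
To make the "distinct targets" point concrete, a toy illustration of the shifting involved (padding id 0 is just a placeholder here): the forward contextualizer's states at position t are scored against token t+1, while the backward contextualizer's states are scored against token t-1.

```python
import torch

tokens = torch.tensor([[2, 7, 9, 4]])  # toy token ids
padding_id = 0

# Forward direction predicts the *next* token at each position...
forward_targets = torch.full_like(tokens, padding_id)
forward_targets[:, :-1] = tokens[:, 1:]    # [[7, 9, 4, 0]]

# ...while the backward direction predicts the *previous* token.
backward_targets = torch.full_like(tokens, padding_id)
backward_targets[:, 1:] = tokens[:, :-1]   # [[0, 2, 7, 9]]
```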

@github-actions

@nelson-liu this is just a friendly ping to make sure you haven't forgotten about this issue 😜

@nelson-liu
Contributor

I have to admit that I've lost almost all context on this issue, and it feels like we decided not to proceed with it. Closing this for now; I'm unsure how big of an issue this really is these days.
