Extend Bert support #829
base: dev
Conversation
) -> Optional[
    Union[
        Float[torch.Tensor, "batch pos d_vocab"],
        Float[torch.Tensor, "batch 2"],
        str,
        List[str],
    ]
]:
Is there anything that can be done to simplify this return type? This function can return five different states, and the worst part is that two of the potential states are the exact same type of data, just in different shapes. I understand that they will be used in completely different contexts, but it's for that reason that I wonder why we don't do a bit more here and create two distinct modules for the two different use cases. All of the shared functionality could be put into components used by both, and it may simply be easier to communicate the usage to end users if the modules are distinct.

The real question is: is there any use case where someone would exercise both behaviours with the same instance of the module? If there is a valid use case for that, then leaving this as-is is probably fine. However, if it's probably not going to happen, I would prefer to find a way to make these simpler and single purpose.
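For illustration, a minimal sketch of the suggested split, assuming hypothetical class names (BertCore, BertMaskedLM, BertNextSentencePrediction) that are not taken from this PR's diff; each wrapper then has a single, unambiguous return type:

import torch
from jaxtyping import Float


class BertCore(torch.nn.Module):
    """Shared embedding/encoder stack reused by both task-specific wrappers."""

    def forward(self, tokens: torch.Tensor) -> Float[torch.Tensor, "batch pos d_model"]:
        ...  # embeddings + transformer blocks


class BertMaskedLM(torch.nn.Module):
    """Single-purpose wrapper: always returns masked-LM logits."""

    def __init__(self, core: BertCore):
        super().__init__()
        self.core = core

    def forward(self, tokens: torch.Tensor) -> Float[torch.Tensor, "batch pos d_vocab"]:
        resid = self.core(tokens)
        ...  # unembed resid to vocabulary logits


class BertNextSentencePrediction(torch.nn.Module):
    """Single-purpose wrapper: always returns NSP logits of shape [batch, 2]."""

    def __init__(self, core: BertCore):
        super().__init__()
        self.core = core

    def forward(self, tokens: torch.Tensor) -> Float[torch.Tensor, "batch 2"]:
        resid = self.core(tokens)
        ...  # pooler + binary classification head

Callers would then pick the module that matches their task, and type checkers could verify each call site against a single concrete shape instead of this union.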
Description
The current implementation of BERT only implements masked language modelling and has a number of other limitations, such as only being able to take tokens as input and only supporting the model "bert-base-cased". This PR intends to enhance the BERT support of TransformerLens by addressing these limitations.

Features added include support for string (and list-of-string) inputs in addition to tokens, and support for BERT models other than "bert-base-cased".

I also extensively added to and edited the documentation and the existing BERT notebook.
There is no specific issue attached to this PR.
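As a rough illustration of what this enables (a hedged sketch: the call signatures and the choice of "bert-base-uncased" are assumptions based on TransformerLens conventions, not copied from this PR's diff):

from transformer_lens import HookedEncoder

# Load a BERT variant other than "bert-base-cased" (previously the only option).
model = HookedEncoder.from_pretrained("bert-base-uncased")

# Pass a raw string instead of a token tensor (assumed to be tokenized
# internally); the call returns masked-LM logits for the sequence.
logits = model("The [MASK] sat on the mat.")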
Checklist: