Extend Bert support #829

degenfabian · 2025-01-06T01:55:38Z

Description

The current BERT implementation supports only masked language modelling and has several other limitations: it accepts only tokens as input, and it supports only the model "bert-base-cased". This PR extends the BERT support in TransformerLens by addressing these limitations.

Features added include:

Next Sentence Prediction
Accepting strings and lists of strings as input and not only tokens
Returning human-readable predictions directly instead of only logits
Support for the BERT models "bert-base-uncased", "bert-large-uncased", "bert-large-cased"

I also added to and edited the documentation extensively and updated the existing BERT notebook.

There is no specific issue attached to this PR.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility


bryce13950 · 2025-01-18T22:40:19Z

transformer_lens/HookedEncoder.py

+    ) -> Optional[
+        Union[
+            Float[torch.Tensor, "batch pos d_vocab"],
+            Float[torch.Tensor, "batch 2"],
+            str,
+            List[str],
+        ]
+    ]:


Is there anything that can be done to simplify this return type? This function can return five different states, and the worst part is that two of them are the same type of data in different shapes. I understand that they will be used in completely different contexts, but for that reason I wonder why we don't go further and create two distinct modules for the two use cases. All of the shared functionality could be put into components that are used by both, and it may simply be easier to communicate the usage to end users if the modules are distinct. The real question is whether there is any use case where someone would use both with the same instance of the module. If there is a valid one, then leaving it as is is probably fine. If not, I would prefer to find a way to make these simpler and single-purpose.
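To make the suggestion concrete, here is a rough sketch of what "two distinct modules with shared components" could look like. It is only an illustration: every class and method name below is hypothetical and not taken from TransformerLens, and plain nested lists stand in for tensors so the snippet runs without torch. The point is that each wrapper has exactly one return shape, so neither needs a Union return type.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical stand-in: nested lists play the role of tensors.
Hidden = List[List[float]]  # shape [pos][d_model]


@dataclass
class SharedEncoder:
    """Shared component: maps token ids to per-position hidden states."""

    d_model: int = 4

    def encode(self, tokens: Sequence[int]) -> Hidden:
        # Toy deterministic "embedding", not a real transformer.
        return [[float((t + i) % 7) for i in range(self.d_model)] for t in tokens]


@dataclass
class MaskedLMHead:
    """Single-purpose wrapper: always returns [pos][d_vocab] logits."""

    encoder: SharedEncoder
    d_vocab: int = 5

    def forward(self, tokens: Sequence[int]) -> List[List[float]]:
        hidden = self.encoder.encode(tokens)
        return [[sum(h) * (v + 1) for v in range(self.d_vocab)] for h in hidden]


@dataclass
class NextSentenceHead:
    """Single-purpose wrapper: always returns exactly 2 logits."""

    encoder: SharedEncoder

    def forward(self, tokens: Sequence[int]) -> List[float]:
        # Use the first position as a stand-in for the [CLS] state.
        total = sum(self.encoder.encode(tokens)[0])
        return [total, -total]


# Both wrappers reuse the same encoder instance.
encoder = SharedEncoder()
mlm = MaskedLMHead(encoder)
nsp = NextSentenceHead(encoder)

tokens = [101, 7, 8, 102]
mlm_logits = mlm.forward(tokens)
nsp_logits = nsp.forward(tokens)
```

With this layout a caller who needs both tasks can still share weights by passing the same encoder to both wrappers, which covers the "same instance" use case without a combined return type.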

Bert masked language modelling refactor

8bb2250

degenfabian changed the base branch from main to dev January 6, 2025 23:32

degenfabian force-pushed the bert_extend_support branch from 3625234 to 7a6dec9 Compare January 7, 2025 20:55

degenfabian marked this pull request as ready for review January 7, 2025 22:02

degenfabian force-pushed the bert_extend_support branch from 7ebcc2d to c09336a Compare January 9, 2025 18:22

Fabian Degen added 10 commits January 15, 2025 04:53

Implement Next sentence prediction for BERT

fc93165

Implement tokenization

e3b63f5

Add more bert models

2383ff9

Allowing return type predictions to directly return model predictions…

6d5aabc

… instead of logits

Fix unrelated typos

42fc46d

Reflect changes in BERT notebook

97d81e6

Adjust colab_compatibility

7317362

Add test cases

9554f8b

Type hinting

887cbeb

Format

bb6ce00

degenfabian force-pushed the bert_extend_support branch from 045044e to bb6ce00 Compare January 15, 2025 03:58

Remove embeddings as input to forward function

c9ab729

bryce13950 reviewed Jan 18, 2025
