Set default_prepend_bos to False in Bloom model configuration (#806) · TransformerLensOrg/TransformerLens@b5bd89c

Set default_prepend_bos to False in Bloom model configuration (#806)

* fix prepend_bos to False by default for bloom model family

* add comment

* edit documentation

* fix wrong expected value for bloom-560m model loss

* fix expected loss value for bloom model computed with google colab

* set prepend_bos to user value, then to value in model config and then default to true

* fix format

* remove log points in test_hooked_transformer

* remove einsum in forward pass in AbstractAttention (#783)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* Colab compatibility bug fixes (#794)

* call functions on model object instead of model string in run_encoder_decoder_set

* remove generate call in run_encoder_decoder_set because HookedEncoderDecoder doesn't support generate yet

* add testing function for HookedEncoders and stop testing BERT as HookedTransformer

* clear cell output to prevent test from failing

* add comment about bert working with free version of colab

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* remove einsum usage from create_alibi_bias function in AbstractAttention (#781)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* updated token location (#797)

* remove einsum from apply_causal_mask in abstract_attention (#782)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* clarified arguments a bit for hook_points (#799)

* remove einsum in logit_attrs in ActivationCache (#788)

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum in compute_head_results in ActivationCache (#789)

* remove einsum in compute_head_results in ActivationCache

* ran format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer (#791)

* remove einsum usage in refactor_factored_attn_matrices in HookedTransformer

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum usage in _get_w_in_matrix in SVDInterpreter (#792)

* remove einsum usage in _get_w_in_matrix in SVDInterpreter

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* Remove einsum usage in forward function of BertMLMHead (#793)

* remove einsum usage in forward function of BertMLMHead

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Set default_prepend_bos to false in Bloom configuration

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

3 people authored Dec 5, 2024

1 parent f4279b5 commit b5bd89c

transformer_lens/loading_from_pretrained.py

@@ -1158,6 +1158,7 @@ def convert_hf_model_config(model_name: str, **kwargs):
                 "normalization_type": "LN",
                 "post_embedding_ln": True,
                 "positional_embedding_type": "alibi",
+                "default_prepend_bos": False,
             }
         elif architecture == "GPT2LMHeadCustomModel":
             # santacoder
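
The commit message describes a resolution order for `prepend_bos`: the user-supplied value first, then the value in the model config, then a default of true. The sketch below illustrates that order only. The helper name `resolve_prepend_bos` and the plain-dict config are hypothetical stand-ins for illustration, not the library's actual code.

```python
from typing import Optional


def resolve_prepend_bos(
    user_value: Optional[bool],
    config_default: Optional[bool],
) -> bool:
    """Hypothetical helper illustrating the precedence from the commit message."""
    # 1. An explicit value passed by the caller always wins.
    if user_value is not None:
        return user_value
    # 2. Otherwise fall back to the model config's default, if it has one.
    if config_default is not None:
        return config_default
    # 3. With neither set, default to prepending BOS.
    return True


# Assumed stand-in for the Bloom config entry added in this diff.
bloom_cfg = {"default_prepend_bos": False}

# No user value: the Bloom config default (False) applies.
bloom_result = resolve_prepend_bos(None, bloom_cfg.get("default_prepend_bos"))

# The user explicitly asks for BOS: the user value overrides the config.
override_result = resolve_prepend_bos(True, bloom_cfg.get("default_prepend_bos"))

# Neither the user nor the config specifies a value: default to True.
fallback_result = resolve_prepend_bos(None, None)
```

Checking `is not None` rather than truthiness matters here: an explicit `False` from the user or the config must be honored and not treated as "unset".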