From b5bd89c8d8ac4d05d81ee17f129ca91a4ed991bd Mon Sep 17 00:00:00 2001
From: Fabian Degen <106864199+degenfabian@users.noreply.github.com>
Date: Wed, 4 Dec 2024 17:13:43 -0800
Subject: [PATCH] Set default_prepend_bos to False in Bloom model
 configuration (#806)

* fix prepend_bos to False by default for bloom model family

* add comment

* edit documentation

* fix wrong expected value for bloom-560m model loss

* fix expected loss value for bloom model computed with google colab

* set prepend_bos to user value, then to value in model config and then default to true

* fix format

* remove log points in test_hooked_transformer

* remove einsum in forward pass in AbstractAttention (#783)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* Colab compatibility bug fixes (#794)

* call functions on model object instead of model string in run_encoder_decoder_set

* remove generate call in run_encoder_decoder_set because HookedEncoderDecoder doesn't support generate yet

* add testing function for HookedEncoders and stop testing BERT as HookedTransformer

* clear cell output to prevent test from failing

* add comment about bert working with free version of colab

---------

Co-authored-by: Fabian Degen

* remove einsum usage from create_alibi_bias function in AbstractAttention (#781)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* updated token location (#797)

* remove einsum from apply_causal_mask in abstract_attention (#782)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* clarified arguments a bit for hook_points (#799)

* remove einsum in logit_attrs in ActivationCache (#788)

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum in compute_head_results in ActivationCache (#789)

* remove einsum in compute_head_results in ActivationCache

* ran format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer (#791)

* remove einsum usage in refactor_factored_attn_matrices in HookedTransformer

* fix format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum usage in _get_w_in_matrix in SVDInterpreter (#792)

* remove einsum usage in _get_w_in_matrix in SVDInterpreter

* fix format

---------

Co-authored-by: Fabian Degen

* Remove einsum usage in forward function of BertMLMHead (#793)

* remove einsum usage in forward function of BertMLMHead

* fix format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Set default_prepend_bos to False in Bloom configuration

---------

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen
---
 transformer_lens/loading_from_pretrained.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/transformer_lens/loading_from_pretrained.py b/transformer_lens/loading_from_pretrained.py
index aa544786f..ea93a22cf 100644
--- a/transformer_lens/loading_from_pretrained.py
+++ b/transformer_lens/loading_from_pretrained.py
@@ -1158,6 +1158,7 @@ def convert_hf_model_config(model_name: str, **kwargs):
             "normalization_type": "LN",
             "post_embedding_ln": True,
             "positional_embedding_type": "alibi",
+            "default_prepend_bos": False,
         }
     elif architecture == "GPT2LMHeadCustomModel":  # santacoder
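
For reference, the user-visible effect of the one-line config change above is that Bloom-family models no longer prepend a BOS token when tokenizing unless the caller asks for one. Below is a minimal sketch of how this surfaces through the TransformerLens API; the checkpoint name and prompt are arbitrary choices for illustration, and exact token counts depend on the Bloom tokenizer.

    from transformer_lens import HookedTransformer

    # "bloom-560m" is one of the Bloom checkpoints whose config this patch touches.
    model = HookedTransformer.from_pretrained("bloom-560m")

    # After this patch, the new default is carried in the model config.
    assert model.cfg.default_prepend_bos is False

    # With prepend_bos left unspecified, to_tokens falls back to the config
    # default, so no BOS token is prepended for Bloom.
    tokens_default = model.to_tokens("Hello world")

    # Callers can still opt in explicitly on a per-call basis.
    tokens_with_bos = model.to_tokens("Hello world", prepend_bos=True)

    # The explicit-BOS version should be one token longer.
    print(tokens_default.shape, tokens_with_bos.shape)

This also explains the expected-loss fixes in the commit list: with no BOS token prepended, Bloom's per-token loss on the test prompts changes, so the reference values had to be recomputed.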