From b5bd89c8d8ac4d05d81ee17f129ca91a4ed991bd Mon Sep 17 00:00:00 2001
From: Fabian Degen <106864199+degenfabian@users.noreply.github.com>
Date: Wed, 4 Dec 2024 17:13:43 -0800
Subject: [PATCH] Set default_prepend_bos to False in Bloom model
 configuration (#806)

* fix prepend_bos to False by default for bloom model family

* add comment

* edit documentation

* fix wrong expected value for bloom-560m model loss

* fix expected loss value for bloom model computed with google colab

* set prepend_bos to user value, then to value in model config and then default to true

* fix format

* remove log points in test_hooked_transformer

* remove einsum in forward pass in AbstractAttention (#783)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* Colab compatibility bug fixes (#794)

* call functions on model object instead of model string in run_encoder_decoder_set

* remove generate call in run_encoder_decoder_set because HookedEncoderDecoder doesn't support generate yet

* add testing function for HookedEncoders and stop testing BERT as HookedTransformer

* clear cell output to prevent test from failing

* add comment about bert working with free version of colab

---------

Co-authored-by: Fabian Degen

* remove einsum usage from create_alibi_bias function in AbstractAttention (#781)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* updated token location (#797)

* remove einsum from apply_causal_mask in abstract_attention (#782)

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen

* clarified arguments a bit for hook_points (#799)

* remove einsum in logit_attrs in ActivationCache (#788)

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum in compute_head_results in ActivationCache (#789)

* remove einsum in compute_head_results in ActivationCache

* ran format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer (#791)

* remove einsum usage in refactor_factored_attn_matrices in HookedTransformer

* fix format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Remove einsum usage in _get_w_in_matrix in SVDInterpreter (#792)

* remove einsum usage in _get_w_in_matrix in SVDInterpreter

* fix format

---------

Co-authored-by: Fabian Degen

* Remove einsum usage in forward function of BertMLMHead (#793)

* remove einsum usage in forward function of BertMLMHead

* fix format

---------

Co-authored-by: Fabian Degen
Co-authored-by: Bryce Meyer

* Set default_prepend_bos to False in Bloom configuration

---------

Co-authored-by: Bryce Meyer
Co-authored-by: Fabian Degen
---
 transformer_lens/loading_from_pretrained.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/transformer_lens/loading_from_pretrained.py b/transformer_lens/loading_from_pretrained.py
index aa544786f..ea93a22cf 100644
--- a/transformer_lens/loading_from_pretrained.py
+++ b/transformer_lens/loading_from_pretrained.py
@@ -1158,6 +1158,7 @@ def convert_hf_model_config(model_name: str, **kwargs):
             "normalization_type": "LN",
             "post_embedding_ln": True,
             "positional_embedding_type": "alibi",
+            "default_prepend_bos": False,
         }
     elif architecture == "GPT2LMHeadCustomModel":  # santacoder
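
For reference, the user-visible effect of the one-line config change above is that Bloom-family models no longer prepend a BOS token when tokenizing unless the caller asks for one. Below is a minimal sketch of how this surfaces through the TransformerLens API; the checkpoint name and prompt are arbitrary choices for illustration, and exact token counts depend on the Bloom tokenizer.

    from transformer_lens import HookedTransformer

    # "bloom-560m" is one of the Bloom checkpoints whose config this patch touches.
    model = HookedTransformer.from_pretrained("bloom-560m")

    # After this patch, the new default is carried in the model config.
    assert model.cfg.default_prepend_bos is False

    # With prepend_bos left unspecified, to_tokens falls back to the config
    # default, so no BOS token is prepended for Bloom.
    tokens_default = model.to_tokens("Hello world")

    # Callers can still opt in explicitly on a per-call basis.
    tokens_with_bos = model.to_tokens("Hello world", prepend_bos=True)

    # The explicit-BOS version should be one token longer.
    print(tokens_default.shape, tokens_with_bos.shape)

This also explains the expected-loss fixes in the commit list: with no BOS token prepended, Bloom's per-token loss on the test prompts changes, so the reference values had to be recomputed.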