
Remove einsum in logit_attrs in ActivationCache #788

Merged

Conversation

degenfabian
Contributor

Description

There are known accuracy issues that are, to some extent, caused by the use of einsum across large parts of the codebase. This PR removes one of these usages in the logit_attrs function in ActivationCache.py. There is no specific issue associated with this PR, although removing many of the einsum usages could help resolve the implementation inaccuracies raised in several issues. I have not added specific test cases for this change, but I ran all the models (except those that require authorization and the paid_cpu models) in the Colab_Compatibility notebook and did not run into any compatibility issues introduced by this change. Additionally, the changed code path is exercised 8 times across the unit and acceptance tests.
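For context, the kind of change involved here is replacing a per-position dot product written as an einsum with an equivalent broadcasted multiply-and-sum. The sketch below shows that pattern in isolation; the tensor names and shapes are illustrative assumptions, not the actual signature of logit_attrs.

```python
import torch

# Illustrative shapes (assumptions, not the actual logit_attrs signature):
# residual_stack:   [num_components, batch, pos, d_model]
# logit_directions: [batch, pos, d_model]
residual_stack = torch.randn(4, 2, 5, 16)
logit_directions = torch.randn(2, 5, 16)

# einsum form: dot product over d_model for each component/batch/position
attrs_einsum = torch.einsum("cbpd,bpd->cbp", residual_stack, logit_directions)

# equivalent form without einsum: broadcast, multiply, sum over the last dim
attrs_no_einsum = (residual_stack * logit_directions).sum(dim=-1)

assert torch.allclose(attrs_einsum, attrs_no_einsum, atol=1e-6)
```

The two forms are numerically equivalent up to floating-point accumulation order, which is why a change like this can be validated by re-running existing tests and the compatibility notebook rather than by adding new correctness tests.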

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@bryce13950 bryce13950 merged commit a695f79 into TransformerLensOrg:dev Nov 26, 2024
13 checks passed
@degenfabian degenfabian deleted the remove_einsum_in_logit_attrs branch November 27, 2024 01:11
degenfabian added a commit to degenfabian/TransformerLens that referenced this pull request Dec 4, 2024
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
bryce13950 added a commit that referenced this pull request Dec 5, 2024
* fix prepend_bos to False by default for bloom model family

* add comment

* edit documentation

* fix wrong expected value for bloom-560m model loss

* fix expected loss value for bloom model computed with google colab

* set prepend_bos to user value, then to value in model config and then default to true

* fix format

* remove log points in test_hooked_transformer

* remove einsum in forward pass in AbstractAttention (#783)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* Colab compatibility bug fixes (#794)

* call functions on model object instead of model string in run_encoder_decoder_set

* remove generate call in run_encoder_decoder_set because HookedEncoderDecoder doesn't support generate yet

* add testing function for HookedEncoders and stop testing BERT as HookedTransformer

* clear cell output to prevent test from failing

* add comment about bert working with free version of colab

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* remove einsum usage from create_alibi_bias function in AbstractAttention (#781)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* updated token location (#797)

* remove einsum from apply_causal_mask in abstract_attention (#782)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* clarified arguments a bit for hook_points (#799)

* remove einsum in logit_attrs in ActivationCache (#788)

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum in compute_head_results in ActivationCache (#789)

* remove einsum in compute_head_results in ActivationCache

* ran format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer (#791)

* remove einsum usage in refactor_factored_attn_matrices in HookedTransformer

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Remove einsum usage in _get_w_in_matrix in SVDInterpreter (#792)

* remove einsum usage in _get_w_in_matrix in SVDInterpreter

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>

* Remove einsum usage in forward function of BertMLMHead (#793)

* remove einsum usage in forward function of BertMLMHead

* fix format

---------

Co-authored-by: Fabian Degen <fabian.degen@mytum.de>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* Set default_prepend_bos to false in Bloom configuration

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Fabian Degen <fabian.degen@mytum.de>