Throw error when using attn_in with grouped query attention #810

degenfabian · 2024-12-11T10:58:46Z

Description

When attn_in is used with a model that uses grouped query attention (GQA), TransformerLens crashes because use_attn_in does not account for the differing numbers of query and key/value heads. For GQA models, use_split_qkv_input should be used instead: it implements separate hooks for the query, key, and value inputs and can therefore handle a different number of heads for each. This PR adds a more meaningful error message telling the user to use use_split_qkv_input instead of use_attn_in when working with GQA models.
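
To illustrate the kind of guard described above, here is a minimal, self-contained sketch. It does not use TransformerLens itself: `SketchConfig`, `uses_gqa`, and `set_use_attn_in` are hypothetical stand-ins written for this example, and the field names `n_heads` and `n_key_value_heads` are assumptions about how a config might expose the head counts. The actual check in this PR may be written differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model config (not the TransformerLens class)."""

    n_heads: int
    # Assumption: None means standard multi-head attention (no GQA).
    n_key_value_heads: Optional[int] = None
    use_attn_in: bool = False
    use_split_qkv_input: bool = False


def uses_gqa(cfg: SketchConfig) -> bool:
    """Return True when key/value heads are fewer than (or differ from) query heads."""
    return cfg.n_key_value_heads is not None and cfg.n_key_value_heads != cfg.n_heads


def set_use_attn_in(cfg: SketchConfig, value: bool) -> None:
    """Toggle use_attn_in, refusing to enable it for a GQA config."""
    if value and uses_gqa(cfg):
        # A single attn_in hook assumes one head count for Q, K and V,
        # which does not hold under GQA, so fail early with a clear message.
        raise AssertionError(
            "use_attn_in is not supported with grouped query attention; "
            "use use_split_qkv_input instead."
        )
    cfg.use_attn_in = value
```

With this sketch, `set_use_attn_in(SketchConfig(n_heads=8, n_key_value_heads=2), True)` raises the `AssertionError`, while the same call on `SketchConfig(n_heads=8)` succeeds.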

This PR is not linked to a specific issue.

After I added a test case, it failed with a beartype error stating that rotary_base must be an integer rather than a float. I adjusted the configuration of google/gemma-2b accordingly.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Fabian Degen added 3 commits December 11, 2024 01:45

raise AssertionError when use_attn_in is used with GQA

bcaf807

add test case for raising AssertionError; make format

d2f2421

rotary_base as int for gemma model to keep test from failing due to beartype error

32862b7

degenfabian changed the title ~~Throw error when using attn in with grouped query attention~~ Throw error when using attn_in with grouped query attention Dec 11, 2024

Fabian Degen added 2 commits December 13, 2024 18:56

Test on Qwen model instead of Gemma

588b2d7

Fixed beartype error by converting rotary_base to int in Qwen config

f6ff577

bryce13950 merged commit d0d0750 into TransformerLensOrg:dev Dec 28, 2024
13 checks passed

degenfabian deleted the throw_error_when_using_attn_in_with_grouped_query_attention branch December 28, 2024 16:05
