Polishing T5 inference #1698

Open
@monatis

Description

Hi,

As reported in several issues (#1271, #1413), T5 still lacks some workflows. In particular, I'm trying to optimize T5 conditional generation. I started by porting code from BartSeq2SeqLM, but one thing that immediately caught my attention is that T5 uses its own MHA implementation, which lacks the KV cache functionality implemented in CachedMultiHeadAttention. This can be addressed in two ways:

  1. Add rel_attn_bias support to CachedMultiHeadAttention, or
  2. Add KV cache support to T5MultiHeadAttention.

I'm also planning to upstream what I come up with. The question is: which option would you prefer, and which do you think would be easier to hack? I lean toward option 2, but is there anything I'm missing?
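To frame the discussion, here is a minimal NumPy sketch of what a single decoding step looks like when a KV cache is combined with a T5-style additive relative position bias. All names (`cached_attention_step`, `rel_bias`) and shapes are hypothetical and just illustrate the mechanism, not the actual KerasNLP API:

```python
import numpy as np

def cached_attention_step(query, key, value, cache_k, cache_v, rel_bias=None):
    """One decoding step of scaled dot-product attention with a KV cache.

    query/key/value: (batch, 1, dim) projections for the new token only.
    cache_k/cache_v: (batch, t, dim) keys/values from previous steps.
    rel_bias: optional (batch, 1, t + 1) additive bias, analogous to the
    rel_attn_bias term T5 adds to attention scores (hypothetical shape).
    """
    # Append the new key/value so future steps can reuse them without
    # recomputing projections for the whole prefix.
    cache_k = np.concatenate([cache_k, key], axis=1)
    cache_v = np.concatenate([cache_v, value], axis=1)

    scores = query @ cache_k.transpose(0, 2, 1) / np.sqrt(query.shape[-1])
    if rel_bias is not None:
        # T5 adds the relative position bias before the softmax; this is
        # the term CachedMultiHeadAttention currently has no slot for.
        scores = scores + rel_bias

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ cache_v
    return out, cache_k, cache_v
```

Either option boils down to wiring both the cache concatenation and the bias addition into the same attention call; the sketch shows they are independent terms, so neither option should conflict with the other mathematically.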
