
Add option to use E5 text encoder for SDXL #108

Closed
wants to merge 26 commits

Conversation

@A-Jacobson (Contributor) commented Jan 3, 2024

Adds a simple switch to use the e5 text encoder with SDXL. This is accomplished by splicing e5 into our joint text encoder class. Currently, this approach has some limitations:

  • I wanted to avoid changing the API and building out a large registry of text encoders, so only e5-large-v2 is currently supported. (t5 and the other text encoders I tested were not supported by AutoModel/AutoTokenizer, so I opted to keep it simple for now and leave them out.)
  • Decided to stick with OpenCLIP vs. OpenAI CLIP, as the HF model supports the projection layer needed for SDXL out of the box.
  • Truncate the sequence max length to 77 (CLIP's max length) vs. 512 (e5's max length); see the sketch after this list.
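
For illustration, a minimal sketch of the splicing idea. The class and method names here are assumptions, not the PR's actual code; only the AutoModel/AutoTokenizer loading and the 77-token truncation follow the description above:

```python
import torch
from transformers import AutoModel, AutoTokenizer

class E5TextEncoder(torch.nn.Module):
    """Hypothetical stand-in for one of SDXL's CLIP text encoders."""

    def __init__(self, max_length: int = 77):
        super().__init__()
        # e5-large-v2 loads cleanly through AutoModel/AutoTokenizer, which is
        # why it is the only non-CLIP encoder supported for now.
        self.model = AutoModel.from_pretrained('intfloat/e5-large-v2')
        self.tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-large-v2')
        # Truncate to CLIP's max length (77) instead of e5's native 512 so the
        # tokenized outputs can be stacked with CLIP's in the dataloader.
        self.max_length = max_length

    def forward(self, captions: list[str]) -> torch.Tensor:
        tokens = self.tokenizer(
            captions,
            padding='max_length',
            max_length=self.max_length,
            truncation=True,
            return_tensors='pt',
        )
        out = self.model(input_ids=tokens.input_ids,
                         attention_mask=tokens.attention_mask)
        return out.last_hidden_state  # (batch, 77, 1024) for e5-large-v2
```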

To enable e5, add `use_e5: true` to both your dataset and model configs.

Edit: after feedback, `model_name` and `tokenizer_name_or_path` now need to be set to `sdxl-e5` to enable e5 training.
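
For reference, a hedged sketch of what the updated YAML might look like; the exact key names and nesting are assumed, not taken from this repo's configs:

```yaml
model:
  model_name: sdxl-e5            # assumed key layout; replaces the use_e5 flag
dataset:
  tokenizer_name_or_path: sdxl-e5
```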

@A-Jacobson changed the title from "Add option to use E5 text encoder to SDXL" to "Add option to use E5 text encoder for SDXL" on Jan 3, 2024
@A-Jacobson (Contributor, Author)

@jazcollins if you could confirm I didn't horrendously break the tokenizers or SDXL code you wrote, that would be great =)

@jazcollins (Contributor) left a comment

Overall looks good to me aside from some small suggestions to remove the `use_e5` flag!

Also - do we want to truncate the e5 tokenizer to the CLIP tokenizer's max_length? As the code is currently written - yes, we have to, because we stack the tokenized outputs in the dataloader. However, we don't have to do that, and could potentially have different-length tokenized outputs for the two text encoders, if that makes sense to do.

@A-Jacobson (Contributor, Author)

> Also - do we want to truncate the e5 tokenizer to the CLIP tokenizer's max_length? As the code is currently written - yes, we have to, because we stack the tokenized outputs in the dataloader. However, we don't have to do that, and could potentially have different-length tokenized outputs for the two text encoders, if that makes sense to do.

I believe they have to be the same length because they're concatenated on the embedding dim, not the sequence dim, later on. We can't do that concatenation unless we either pad CLIP to 512 or truncate e5 to 77. e5 uses WordPiece and CLIP uses BPE, so I THINK the number of tokens per prompt should be similar.
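
A tiny sketch of that constraint (batch size and embedding dims below are made up for illustration):

```python
import torch

clip_emb = torch.randn(2, 77, 1280)  # (batch, seq, dim): CLIP at 77 tokens
e5_emb = torch.randn(2, 77, 1024)    # e5 truncated to 77 tokens

# Concatenating on the embedding dim only works if sequence lengths match.
joint = torch.cat([clip_emb, e5_emb], dim=-1)  # ok: shape (2, 77, 2304)

e5_full = torch.randn(2, 512, 1024)  # e5 at its native 512 tokens
# torch.cat([clip_emb, e5_full], dim=-1) would raise a RuntimeError, since
# all dims except the concat dim must match; hence either pad CLIP to 512
# or truncate e5 to 77.
```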

@A-Jacobson (Contributor, Author)

Handled by #124.

@A-Jacobson closed this Mar 15, 2024