Add option to use E5 text encoder for SDXL #108
Conversation
@jazcollins if you could confirm I didn't horrendously break the tokenizers or sdxl code you wrote, that would be great =)
Overall looks good to me aside from some small suggestions to remove the `use_e5` flag!
Also - do we want to truncate the e5 tokenizer output to the CLIP tokenizer's max_length? As the code is currently written, yes, we have to, because we stack the tokenized outputs in the dataloader. However, we don't have to do that, and could potentially have different-length tokenized outputs for the two text encoders if that makes sense to do.
I believe they have to be the same length because they're concatenated on the embedding dim, not the sequence dim, later on. We can't do concatenation unless we either pad CLIP to 512 or truncate e5 to 77. e5 uses WordPiece and CLIP uses BPE, so I THINK the number of tokens per prompt should be similar.
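To make the shape constraint concrete, here is a minimal PyTorch sketch (not the repo's code; the hidden sizes are illustrative assumptions) of why concatenating on the embedding dim forces the two tokenized outputs to the same sequence length:

```python
import torch

# Minimal sketch: two text-encoder outputs concatenated on the embedding dim.
batch, seq_len = 2, 77                           # CLIP's max_length
clip_hidden = torch.randn(batch, seq_len, 1280)  # assumed CLIP hidden size
e5_hidden = torch.randn(batch, seq_len, 1024)    # e5 output truncated to 77 tokens

# Works: every dim except the last (embedding) dim matches.
joint = torch.cat([clip_hidden, e5_hidden], dim=-1)
print(joint.shape)                               # torch.Size([2, 77, 2304])

# Would fail: an untruncated e5 output (e.g. 512 tokens) can't be concatenated
# on dim=-1 unless CLIP is padded to 512 or e5 is truncated to 77.
e5_full = torch.randn(batch, 512, 1024)
# torch.cat([clip_hidden, e5_full], dim=-1)  # RuntimeError: size mismatch on dim 1
```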
handled by #124
adds a simple switch to use the e5 text encoder with SDXL. this is accomplished by splicing e5 into our joint text encoder class. Currently, this approach has some limitations:

to enable e5, add `use_e5: true` to both your dataset and model.

after feedback, `model_name` and `tokenizer_name_or_path` now need to be updated to `sdxl-e5` to enable e5 training.
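For reference, a hypothetical sketch of the config change described above, written as Python dicts purely for illustration; only `model_name`, `tokenizer_name_or_path`, `use_e5`, and the value `sdxl-e5` come from this PR, and which config file each key belongs in is an assumption:

```python
# Hypothetical config fragments (illustrative only; the actual config format
# and file layout in the repo may differ).

# Original approach in this PR: an explicit flag on both dataset and model.
model_cfg_old = {"use_e5": True}
dataset_cfg_old = {"use_e5": True}

# After review feedback: the flag is dropped and e5 is selected by name.
model_cfg = {"model_name": "sdxl-e5"}
dataset_cfg = {"tokenizer_name_or_path": "sdxl-e5"}
```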