Landan/text encoder refactor #124
Conversation
left some comments!!
thanks for refactoring!! Since we now have the MultiTokenizer, we will want to change this line as well so it accesses one of the tokenizers in the list:
text_captions = self.tokenizer.tokenizer.batch_decode(captions[:, 0, :], skip_special_tokens=True)
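A minimal sketch of the suggested fix, assuming MultiTokenizer exposes its wrapped tokenizers as a list attribute named `tokenizers` (that attribute name is an assumption, not confirmed in this thread):

```python
# Hedged sketch: decode with the first tokenizer in MultiTokenizer's list
# instead of the old single-tokenizer attribute. The `tokenizers` list
# attribute name is an assumption.
text_captions = self.tokenizer.tokenizers[0].batch_decode(
    captions[:, 0, :], skip_special_tokens=True)
```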
Oops, requested review before implementing.
Looks good! Thanks for doing this!
This PR has a few new features:

1. Refactors `SDXLTextEncoder` and `SDXLTokenizer` to be `MultiTextEncoder` and `MultiTokenizer`, respectively. These two new classes enable using arbitrary combinations of OpenAI CLIP, OpenCLIP, E5, and possibly other models with minor tweaks.
   a. CONFIG CHANGE: Since there is no longer an SDXL-specific tokenizer, a new argument `sdxl_conditioning` is added in `image_caption.py` to specify whether to generate SDXL-style conditioning in the dataset.
2. Supports using the `attention_mask` in the text encoder.

Minor:
- `qkv_clip = None`
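For illustration, here is a minimal sketch of the multi-tokenizer idea described above, assuming HuggingFace `AutoTokenizer` backends. The class body, constructor signature, and `tokenizers` attribute are assumptions for this sketch, not the PR's actual implementation:

```python
from transformers import AutoTokenizer


class MultiTokenizer:
    """Hypothetical sketch: wrap several HuggingFace tokenizers so that
    arbitrary combinations of CLIP, OpenCLIP, E5, etc. can tokenize the
    same caption. Names here are assumptions, not the PR's real code."""

    def __init__(self, tokenizer_names):
        # e.g. ["openai/clip-vit-large-patch14", "intfloat/e5-large-v2"]
        self.tokenizers = [AutoTokenizer.from_pretrained(name) for name in tokenizer_names]

    def __call__(self, text, **kwargs):
        # Return one encoding per wrapped tokenizer, in the order given.
        return [tokenizer(text, **kwargs) for tokenizer in self.tokenizers]
```

Under this sketch, the `sdxl_conditioning` dataset flag would simply control how the outputs of the wrapped tokenizers are assembled into SDXL-style conditioning, rather than being baked into a tokenizer class.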