Official implementation of "AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation" by Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, and Xudong Mao.
Recent advances in text-to-image models have enabled high-quality personalized image synthesis of user-provided concepts with flexible textual control. In this work, we analyze the limitations of two primary techniques in text-to-image personalization: Textual Inversion and DreamBooth. When integrating the learned concept into new prompts, Textual Inversion tends to overfit the concept, while DreamBooth often overlooks it. We attribute these issues to the incorrect learning of the embedding alignment for the concept. We introduce AttnDreamBooth, a novel approach that addresses these issues by separately learning the embedding alignment, the attention map, and the subject identity in different training stages. We also introduce a cross-attention map regularization term to enhance the learning of the attention map. Our method demonstrates significant improvements in identity preservation and text alignment compared to the baseline methods. Code will be made publicly available.
Our code is primarily based on Diffusers-DreamBooth and relies on the diffusers library.
To set up the environment, run the following commands:

```bash
conda env create -f environment.yaml
conda activate AttnDreamBooth
```
Initialize an Accelerate environment with:

```bash
accelerate config
```
To use the `stabilityai/stable-diffusion-2-1-base` model, you may need to log in to Hugging Face as follows:

- Use `huggingface-cli login` in the terminal.
- Enter your access token from your Hugging Face account's token settings.
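Alternatively, you can log in from Python. Below is a minimal sketch using the `huggingface_hub` package (installed alongside `diffusers`); the token string is a placeholder:

```python
# Minimal sketch: programmatic Hugging Face login (alternative to `huggingface-cli login`).
from huggingface_hub import login

login(token="hf_xxx")  # placeholder token; never commit real tokens to source control
```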
Our datasets were originally collected and are provided by Textual Inversion and DreamBooth.
We provide pretrained checkpoints for two objects. You can download the sample images and their corresponding pretrained checkpoints.
| Concepts   | Samples | Models |
| ---------- | ------- | ------ |
| child doll | images  | model  |
| grey sloth | images  | model  |
You can run the `bash_script/train_attndreambooth.sh` script to train your own model. Before executing the training command, ensure that you have configured the following parameters in `train_attndreambooth.sh`:

- Line 6: `output_dir`. The directory where the fine-tuned model will be saved.
- Line 8: `instance_dir`. The directory containing the images of the target concept.
- Line 10: `category`. The category of the target concept.
For example, to train the concept `child doll` from the Pretrained Checkpoints section, set the parameters as follows:

```bash
output_dir="./models/"
instance_dir="./dataset/child_doll"
category="doll"
```
To run the training script, use the following command:

```bash
bash bash_script/train_attndreambooth.sh
```
Notes:
- All training arguments can be found in `train_attndreambooth.sh` and are set to their defaults according to the official paper.
- Please refer to `train_attndreambooth.sh` and `train_attndreambooth.py` for more details on all parameters.
We have explored a simple yet effective strategy to reduce the training time of our method: increasing the learning rate while decreasing both the training steps and the batch size in the third training stage. This reduces the average training time from 20 minutes to 6 minutes. We observed that the fast version performs very close to the original model for short prompts, but slightly under-performs for complex prompts.
To use the fast version of AttnDreamBooth, set the stage 3 configuration in `bash_script/train_attndreambooth.sh` as follows:
```bash
unet_learning_rate="1e-5"
unet_save_step=200
unet_train_steps=200
unet_attn_mean=2
unet_attn_var=5
unet_bs=4
unet_ga=1
unet_validation_steps=100
```
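For intuition, `unet_attn_mean` and `unet_attn_var` weight the cross-attention map regularization term mentioned above. The sketch below is only an illustrative reading of such a term (an assumption on our part, not the exact implementation; see `train_attndreambooth.py` for the authoritative version): it penalizes differences in the mean and variance between the new token's cross-attention map and its category token's map.

```python
# Illustrative sketch (assumption): a mean/variance-matching penalty between two
# cross-attention maps. The real regularization lives in train_attndreambooth.py.
import torch

def attn_map_regularization(attn_new: torch.Tensor,
                            attn_category: torch.Tensor,
                            w_mean: float = 2.0,
                            w_var: float = 5.0) -> torch.Tensor:
    """Penalize mean/variance differences between the new token's cross-attention
    map and the category token's map. Both inputs have shape (heads, H, W)."""
    mean_term = (attn_new.mean() - attn_category.mean()) ** 2
    var_term = (attn_new.var() - attn_category.var()) ** 2
    return w_mean * mean_term + w_var * var_term

# Example with random maps; real maps come from the U-Net's cross-attention layers.
loss = attn_map_regularization(torch.rand(8, 16, 16), torch.rand(8, 16, 16))
```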
You can run the `bash_script/inference.sh` script to generate images. Before executing the inference command, ensure that you have configured the following parameters in `inference.sh`:

- Line 2: `learned_embedding_path`. The path to the embeddings learned in the first stage.
- Line 4: `checkpoint_path`. The path to the fine-tuned models trained in the third stage.
- Line 6: `category`. The category of the target concept.
- Line 8: `output_dir`. The directory where the generated images will be saved.
To run the inference, use the following command:

```bash
bash bash_script/inference.sh
```
Notes:
- If you did not set `--only_save_checkpoints` during the training phase, you can specify `--pretrained_model_name_or_path` as the path to the full model, and then omit `--checkpoint_path`.
- We offer learned embeddings and models for two objects here for direct experimentation.
- For convenience, you can specify the path to a text file with `--prompt_file`, where each line contains a prompt. For example:

  ```
  A photo of a {}
  A {} floats on the water
  A {} latte art
  ```

- Specify the concept using `{}`, and we will replace it with the concept's placeholder token and the specified category (see the sketch after this list).
- The resulting images will be saved in the directory `{save_dir}/{prompt}`.
- For detailed information on all parameters, please consult `inference.py` and `inference.sh`.
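As a concrete illustration of the `{}` replacement described above, here is a minimal sketch; the placeholder token string and helper function are hypothetical, and `inference.py` is the reference implementation:

```python
# Minimal sketch (assumption): how prompts from --prompt_file are expanded.
# The placeholder token "<new>" and this helper are hypothetical; see inference.py
# for how the actual placeholder token and category are inserted.
def expand_prompts(prompt_file: str, placeholder_token: str, category: str) -> list[str]:
    with open(prompt_file) as f:
        prompts = [line.strip() for line in f if line.strip()]
    return [p.format(f"{placeholder_token} {category}") for p in prompts]

print(expand_prompts("prompts.txt", "<new>", "doll"))
# e.g. "A photo of a {}" -> "A photo of a <new> doll"
```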
We use the same evaluation protocol as used in Textual Inversion.
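For reference, below is a minimal sketch of the CLIP-based image- and text-alignment scores commonly used by this protocol. It uses the `transformers` CLIP model as an assumed stand-in; it is not the exact evaluation code.

```python
# Minimal sketch (assumption): CLIP-based image and text alignment scores,
# as commonly used by the Textual Inversion evaluation protocol.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(generated: list[Image.Image], references: list[Image.Image], prompt: str):
    gen = processor(images=generated, return_tensors="pt")
    ref = processor(images=references, return_tensors="pt")
    txt = processor(text=[prompt], return_tensors="pt", padding=True)

    gen_emb = torch.nn.functional.normalize(model.get_image_features(**gen), dim=-1)
    ref_emb = torch.nn.functional.normalize(model.get_image_features(**ref), dim=-1)
    txt_emb = torch.nn.functional.normalize(model.get_text_features(**txt), dim=-1)

    image_alignment = (gen_emb @ ref_emb.T).mean().item()  # identity preservation
    text_alignment = (gen_emb @ txt_emb.T).mean().item()   # prompt fidelity
    return image_alignment, text_alignment
```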
Our code is mainly based on Diffusers-DreamBooth. A huge thank you to the authors for their valuable contributions.
```bibtex
@article{pang2024attndreambooth,
  title={AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation},
  author={Pang, Lianyu and Yin, Jian and Zhao, Baoquan and Wu, Feize and Wang, Fu Lee and Li, Qing and Mao, Xudong},
  journal={arXiv preprint arXiv:2406.05000},
  year={2024}
}
```