This repository is for the paper:
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin 1, Ning Yu 2, Chen Xing 2, Shu Zhang 2, Zeyuan Chen 2, Stefano Ermon 3, Yun Fu 1, Caiming Xiong 2, Ran Xu 2
1 Northeastern University, 2 Salesforce AI Research, 3 Stanford University
Work done when Can Qin was an intern at Salesforce AI Research.
Set up the stable-diffusion environment first (this may take a few minutes):
cd ./stable-diffusion
PIP_EXISTS_ACTION=w conda env create -f environment.yaml
conda activate gluegen
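As a quick sanity check (our suggestion, not part of the original instructions), confirm that PyTorch is importable in the new environment and sees your GPU:
# prints the PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"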
Then install the packages for AudioCLIP:
cd ./stable-diffusion/audioclip
pip install -r requirements.txt
pip install -U llvmlite==0.32.1
pip install -e .
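If the install went through, the pinned llvmlite version should be importable (a quick check we suggest; llvmlite==0.32.1 is old and may require an older Python):
python -c "import llvmlite; print(llvmlite.__version__)"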
Download the official Stable Diffusion v1 checkpoint from https://huggingface.co/runwayml/stable-diffusion-v1-5 and save it as ./checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt.
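One way to fetch it from the command line (a sketch on our part; the URL assumes the checkpoint is still hosted under the original runwayml/stable-diffusion-v1-5 repository):
mkdir -p ./checkpoints_all/checkpoint_sd_v1
cd ./checkpoints_all/checkpoint_sd_v1
# adjust the URL if runwayml/stable-diffusion-v1-5 is no longer hosted on Hugging Face
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt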
Then follow the AudioCLIP README (./stable-diffusion/audioclip/README.md) to download its checkpoint and save it as ./checkpoints_all/audioclip_checkpoint/AudioCLIP-Full-Training.pt:
mkdir ./checkpoints_all/audioclip_checkpoint
cd ./checkpoints_all/audioclip_checkpoint
wget /~https://github.com/AndreyGuzhov/AudioCLIP/releases/download/v0.1/AudioCLIP-Full-Training.pt
Then download the pretrained GlueNet checkpoints and save them to ./checkpoints_all/gluenet_checkpoint:
bash download_gluenet_checkpoints.sh
Download the audio dataset (UrbanSound8K) to ./data, so that it ends up at ./data/urbansound8k:
bash download_us8k_data.sh
Download the multilingual text dataset to ./data:
bash download_multilingual_data.sh
Multilingual Stable Diffusion Inference (every prompt below means "Impressionist painting of an afternoon garden", in Chinese, French, Spanish, Japanese, and Italian respectively):
cd stable-diffusion
python scripts/txt2img_demo_ml.py --prompt "下午的花园的印象派绘画" --plms --outdir outputs/text2img-multilingual --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt
python scripts/txt2img_demo_ml.py --prompt "Peinture impressionniste d'un jardin d'après-midi" --plms --outdir outputs/text2img-multilingual --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt
python scripts/txt2img_demo_ml.py --prompt "Pintura impresionista de un jardín de tarde" --plms --outdir outputs/text2img-multilingual --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt
python scripts/txt2img_demo_ml.py --prompt "午後の庭の印象派絵画" --plms --outdir outputs/text2img-multilingual --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt
python scripts/txt2img_demo_ml.py --prompt "Pittura impressionista di un giardino pomeridiano" --plms --outdir outputs/text2img-multilingual --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt
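Each command writes its results under outputs/text2img-multilingual; assuming the demo follows the stock Stable Diffusion txt2img layout, individual images land in a samples/ subfolder alongside a summary grid.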
Sound-to-image Stable Diffusion Inference:
cd stable-diffusion
python scripts/sound2img_gluegen.py --plms --ckpt ../checkpoints_all/checkpoint_sd_v1/v1-5-pruned-emaonly.ckpt --outdir outputs/sound2img --config configs/stable-diffusion/v1-inference-trans-audioclip.yaml --scale 7.5 --n_iter 1 --audioclip_ckpt ../checkpoints_all/audioclip_checkpoint/AudioCLIP-Full-Training.pt
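Here --scale is the classifier-free guidance scale and --n_iter the number of sampling rounds; assuming the demo follows the standard Stable Diffusion txt2img conventions, a higher scale makes samples follow the audio conditioning more closely at some cost in diversity.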
Sound-to-image GlueNet Training:
cd ./sound-gluenet
CUDA_VISIBLE_DEVICES=0 python train_gluenet_sound_text.py
Multilingual Text-to-image GlueNet Training:
cd ./multilingual-gluenet
CUDA_VISIBLE_DEVICES=0 python train_gluenet_multi.py --DATA_PATH_SRC ../data/WikiMatrix.en-zh.txt.en --DATA_PATH_TAR ../data/WikiMatrix.en-zh.txt.zh --DATA_PATH_SRC_1 ../data/laion-1M-trans-en-zh-cn-en.txt --DATA_PATH_TAR_1 ../data/laion-1M-trans-en-zh-cn-zh-cn.txt --tarLanguage Chinese
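To train GlueNet for another language pair, point the same script at the corresponding parallel corpora and change --tarLanguage. For example, a hypothetical French run (the file names below are illustrative; substitute whatever download_multilingual_data.sh actually placed under ./data):
# hypothetical file names for an English-French pair; adjust to the files you actually have
CUDA_VISIBLE_DEVICES=0 python train_gluenet_multi.py --DATA_PATH_SRC ../data/WikiMatrix.en-fr.txt.en --DATA_PATH_TAR ../data/WikiMatrix.en-fr.txt.fr --DATA_PATH_SRC_1 ../data/laion-1M-trans-en-fr-en.txt --DATA_PATH_TAR_1 ../data/laion-1M-trans-en-fr-fr.txt --tarLanguage French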
If you find this project useful for your research, please kindly cite our paper:
@article{qin2023gluegen,
title={GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation},
author={Qin, Can and Yu, Ning and Xing, Chen and Zhang, Shu and Chen, Zeyuan and Ermon, Stefano and Fu, Yun and Xiong, Caiming and Xu, Ran},
journal={arXiv preprint arXiv:2303.10056},
year={2023}
}
If you have any questions, please contact Can Qin.
Stable Diffusion /~https://github.com/CompVis/stable-diffusion
AudioCLIP /~https://github.com/AndreyGuzhov/AudioCLIP
WikiMatrix /~https://github.com/facebookresearch/LASER/tree/main/tasks/WikiMatrix