This is the official repository of "HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior".
Li-Yuan Tsao, Hao-Wei Chen, Hao-Wei Chung, Deqing Sun, Chun-Yi Lee, Kelvin C.K. Chan, Ming-Hsuan Yang
- We also release our evaluation code to facilitate fair comparisons between (real-world) image super-resolution papers.
Existing pre-trained text-to-image diffusion model-based Real-ISR methods may produce unintended results due to noisy text prompts and their lack of spatial information. In this paper, we present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for diffusion-based Real-ISR. Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed Segmentation-CLIP Map, achieving significant improvement in image quality across various Real-ISR scenarios through reduced prompt noise and enhanced spatial control.
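For intuition, the core idea can be sketched in a few lines of code. This is only an illustration under assumed names (`segmenter`, `clip_text_encoder`, and the way the map is assembled are placeholders), not the actual HoliSDiP implementation:

```python
# Illustrative sketch only -- assumed names, not the actual HoliSDiP code.
import torch

def build_semantic_guidance(lr_image, segmenter, clip_text_encoder):
    """Turn semantic segmentation of the LR input into (1) a concise text prompt
    and (2) dense spatial guidance for the diffusion prior."""
    # Per-pixel class indices, e.g. over the ADE20K label set, shape (H, W).
    seg_mask = segmenter(lr_image)

    # Semantic Label-Based Prompting: the unique class names form a short,
    # low-noise prompt instead of a full (and possibly noisy) image caption.
    present = sorted(int(i) for i in seg_mask.unique())
    prompt = ", ".join(segmenter.class_names[i] for i in present)

    # A Segmentation-CLIP-Map-style tensor: every pixel carries the CLIP text
    # embedding of its own class, shape (C, H, W). The construction used in the
    # paper may differ from this naive version.
    embeddings = {i: clip_text_encoder(segmenter.class_names[i]) for i in present}
    emb_dim = next(iter(embeddings.values())).shape[0]
    seg_clip_map = torch.zeros(emb_dim, *seg_mask.shape)
    for i in present:
        seg_clip_map[:, seg_mask == i] = embeddings[i].unsqueeze(1)

    # The prompt and the two dense maps then condition the diffusion-based SR model.
    return prompt, seg_mask, seg_clip_map
```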
Requirements
- Python 3.8.10
- CUDA 12.1
Dependencies
- Basic dependencies

```bash
# Here we use conda as an example
conda create -n HoliSDiP python=3.8
conda activate HoliSDiP
pip install torch==2.3.1 torchvision==0.18.1
pip install -r requirements.txt
```
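After installing the basic dependencies, an optional sanity check (our suggestion, not part of the official setup) can confirm the environment:

```python
# Optional environment check: verify PyTorch/torchvision versions and GPU visibility.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 2.3.1
print("torchvision:", torchvision.__version__)  # expected: 0.18.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)        # expected: 12.1 if the wheel was built for CUDA 12.1
```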
- Install detectron2 and Mask2Former (Reference: /~https://github.com/facebookresearch/Mask2Former)
```bash
# Install detectron2
git clone /~https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

# Clone and install Mask2Former
git clone /~https://github.com/facebookresearch/Mask2Former.git
cd Mask2Former
pip install git+/~https://github.com/cocodataset/panopticapi.git
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
python setup.py build install
cd ../../../../..
```
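If the build succeeded, the following optional check (run from the directory containing the `Mask2Former` checkout so that the package is importable) should pass; the import path follows the Mask2Former repository layout:

```python
# Optional check that detectron2 and the compiled deformable-attention ops are usable.
import detectron2
print("detectron2:", detectron2.__version__)

# MSDeformAttn wraps the CUDA op built by setup.py above; this import fails if the build did not succeed.
from mask2former.modeling.pixel_decoder.ops.modules import MSDeformAttn
print("MSDeformAttn imported successfully")
```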
1. Semantic Segmentation Model
- Download the Mask2Former semantic segmentation model from their Model Zoo. We use Mask2Former with the Swin-L backbone, pre-trained on the ADE20K dataset (model link).
- Put the model in `preset/models/mask2former`.
2. SD-2-base Model
- Download the SD-2-base model from HuggingFace.
- Put the folder `stable-diffusion-2-base` in `preset/models/`.
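If you prefer to fetch the model from the command line, a snippet like the one below should work. It uses `huggingface_hub` (our suggestion, not part of the official instructions); `stabilityai/stable-diffusion-2-base` is the SD-2-base repository on HuggingFace:

```python
# Optional: download SD-2-base programmatically (pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-base",
    local_dir="preset/models/stable-diffusion-2-base",
)
```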
3. Pre-trained Image Encoder
- We use the pre-trained DAPE encoder from SeeSR. Please download `DAPE.pth` from their Google Drive, and also download the RAM model from this link.
- Put both `DAPE.pth` and `ram_swin_large_14m.pth` in `preset/models`.
4. Download HoliSDiP Pre-trained Model
- Please download our pre-trained model from Google Drive.
- Put the folder `HoliSDiP` in `preset/models/`.
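Once all checkpoints are downloaded, `preset/models` should contain the entries listed below; the small script is only a convenience check we suggest (the Mask2Former checkpoint filename inside its folder depends on the file you downloaded):

```python
# Optional helper: verify the expected model layout under preset/models.
from pathlib import Path

expected = [
    "preset/models/mask2former",              # Mask2Former Swin-L (ADE20K) checkpoint folder
    "preset/models/stable-diffusion-2-base",  # SD-2-base folder
    "preset/models/DAPE.pth",                 # DAPE encoder from SeeSR
    "preset/models/ram_swin_large_14m.pth",   # RAM model
    "preset/models/HoliSDiP",                 # HoliSDiP pre-trained model folder
]

for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"[{status}] {p}")
```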
- We use the LSDIR dataset and the first 10k images from the FFHQ dataset.
- Modify the `gt_path` in `dataloaders/config.yml` to the paths of these datasets (a quick check is sketched below).
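The check below is only a suggested convenience; it assumes `dataloaders/config.yml` contains a `gt_path` field holding either a single path or a list of paths (adjust if the config is structured differently):

```python
# Optional: verify that gt_path in dataloaders/config.yml points to existing folders.
# Requires PyYAML (pip install pyyaml).
from pathlib import Path
import yaml

with open("dataloaders/config.yml") as f:
    cfg = yaml.safe_load(f)

gt_paths = cfg["gt_path"] if isinstance(cfg["gt_path"], list) else [cfg["gt_path"]]
for p in gt_paths:
    print(p, "->", "found" if Path(p).is_dir() else "NOT FOUND")
```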
- You can download the testing sets from the HuggingFace page of StableSR.
- The checkpoints will be saved at `experiments/<EXP_NAME>`, as specified by the `--output_dir` argument.
```bash
CUDA_VISIBLE_DEVICES="0,1,2,3" accelerate launch train.py --output_dir experiments/<EXP_NAME> --enable_xformers_memory_efficient_attention --train_batch_size=4 --gradient_accumulation_steps=2
```
- Set the `--image_path` argument to the path of your testing data.
- The output images will be saved at `results/<OUTPUT_DIR>/samples`, as specified by the `--output_dir` argument.
- We also provide code for saving the segmentation masks, which are stored at `results/<OUTPUT_DIR>/masks`, with `results/<OUTPUT_DIR>/masks_meta` showing the labels on the masks.
- The prompts generated by our Semantic Label-Based Prompting (SLBP) are stored at `results/<OUTPUT_DIR>/masks_meta`.
```bash
python test.py --holisdip_model_path preset/models/HoliSDiP --image_path <TEST_IMG_PATH> --output_dir results/<OUTPUT_DIR> --save_prompts
```
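After inference, the outputs can be browsed with a short script such as the one below, which simply lists the three output folders described above (replace `<OUTPUT_DIR>` with the value passed to `--output_dir`):

```python
# Optional: summarize the files produced by test.py.
from pathlib import Path

out = Path("results/<OUTPUT_DIR>")  # replace <OUTPUT_DIR> with your --output_dir value
for sub in ["samples", "masks", "masks_meta"]:
    files = sorted((out / sub).glob("*"))
    print(f"{sub}: {len(files)} files")
    for f in files[:3]:
        print("   ", f.name)
```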
If you have any questions, please feel free to send a message to lytsao@gapp.nthu.edu.tw.
If you find our work helpful for your research, we would greatly appreciate your help in sharing it with the community and citing it with the following BibTeX entry. Thank you for supporting our research.
```bibtex
@article{tsao2024holisdip,
  title={HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior},
  author={Tsao, Li-Yuan and Chen, Hao-Wei and Chung, Hao-Wei and Sun, Deqing and Lee, Chun-Yi and Chan, Kelvin CK and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2411.18662},
  year={2024}
}
```
Our project is built on SeeSR and Mask2Former, with some code borrowed from SPADE. We appreciate their amazing work, which advances this community.