This is the official repository of "HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior".
Li-Yuan Tsao, Hao-Wei Chen, Hao-Wei Chung, Deqing Sun, Chun-Yi Lee, Kelvin C.K. Chan, Ming-Hsuan Yang
- We also release our evaluation code to facilitate fair comparisons between (real-world) image super-resolution papers.
Existing pre-trained text-to-image diffusion model-based Real-ISR methods may produce unintended results due to noisy text prompts and their lack of spatial information. In this paper, we present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for diffusion-based Real-ISR. Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed Segmentation-CLIP Map, achieving significant improvement in image quality across various Real-ISR scenarios through reduced prompt noise and enhanced spatial control.
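For intuition, the core idea can be sketched in a few lines of code. This is only an illustration under assumed names (`segmenter`, `clip_text_encoder`, and the way the map is assembled are placeholders), not the actual HoliSDiP implementation:

```python
# Illustrative sketch only -- assumed names, not the actual HoliSDiP code.
import torch

def build_semantic_guidance(lr_image, segmenter, clip_text_encoder):
    """Turn semantic segmentation of the LR input into (1) a concise text prompt
    and (2) dense spatial guidance for the diffusion prior."""
    # Per-pixel class indices, e.g. over the ADE20K label set, shape (H, W).
    seg_mask = segmenter(lr_image)

    # Semantic Label-Based Prompting: the unique class names form a short,
    # low-noise prompt instead of a full (and possibly noisy) image caption.
    present = sorted(int(i) for i in seg_mask.unique())
    prompt = ", ".join(segmenter.class_names[i] for i in present)

    # A Segmentation-CLIP-Map-style tensor: every pixel carries the CLIP text
    # embedding of its own class, shape (C, H, W). The construction used in the
    # paper may differ from this naive version.
    embeddings = {i: clip_text_encoder(segmenter.class_names[i]) for i in present}
    emb_dim = next(iter(embeddings.values())).shape[0]
    seg_clip_map = torch.zeros(emb_dim, *seg_mask.shape)
    for i in present:
        seg_clip_map[:, seg_mask == i] = embeddings[i].unsqueeze(1)

    # The prompt and the two dense maps then condition the diffusion-based SR model.
    return prompt, seg_mask, seg_clip_map
```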
Requirements
- Python 3.8.10
- CUDA 12.1
Dependencies
- Basic dependencies

```bash
# Here we use conda as an example
conda create -n HoliSDiP python=3.8
conda activate HoliSDiP
pip install torch==2.3.1 torchvision==0.18.1
pip install -r requirements.txt
```
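After installing the basic dependencies, an optional sanity check (our suggestion, not part of the official setup) can confirm the environment:

```python
# Optional environment check: verify PyTorch/torchvision versions and GPU visibility.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 2.3.1
print("torchvision:", torchvision.__version__)  # expected: 0.18.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)        # expected: 12.1 if the wheel was built for CUDA 12.1
```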
- Install detectron2 and Mask2Former (Reference: /~https://github.com/facebookresearch/Mask2Former)
```bash
# Install detectron2
git clone /~https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

# Clone and install Mask2Former
git clone /~https://github.com/facebookresearch/Mask2Former.git
cd Mask2Former
pip install git+/~https://github.com/cocodataset/panopticapi.git
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
python setup.py build install
cd ../../../../..
```
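If the build succeeded, the following optional check (run from the directory containing the `Mask2Former` checkout so that the package is importable) should pass; the import path follows the Mask2Former repository layout:

```python
# Optional check that detectron2 and the compiled deformable-attention ops are usable.
import detectron2
print("detectron2:", detectron2.__version__)

# MSDeformAttn wraps the CUDA op built by setup.py above; this import fails if the build did not succeed.
from mask2former.modeling.pixel_decoder.ops.modules import MSDeformAttn
print("MSDeformAttn imported successfully")
```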
1. Semantic Segmentation Model
- Download the Mask2Former semantic segmentation model from their Model Zoo. We use Mask2Former with the Swin-L backbone, pre-trained on the ADE20K dataset (model link).
- Put the model in `preset/models/mask2former`.
2. SD-2-base Model
- Download the SD-2-base model from HuggingFace.
- Put the folder `stable-diffusion-2-base` in `preset/models/`.
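If you prefer to fetch the model from the command line, a snippet like the one below should work. It uses `huggingface_hub` (our suggestion, not part of the official instructions); `stabilityai/stable-diffusion-2-base` is the SD-2-base repository on HuggingFace:

```python
# Optional: download SD-2-base programmatically (pip install huggingface_hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-base",
    local_dir="preset/models/stable-diffusion-2-base",
)
```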
3. Pre-trained Image Encoder
- We use the pre-trained DAPE encoder from SeeSR. Please download `DAPE.pth` from their Google Drive, and also download the RAM model from this link.
- Put both `DAPE.pth` and `ram_swin_large_14m.pth` in `preset/models`.
4. Download HoliSDiP Pre-trained Model
- Please download our pre-trained model from Google Drive.
- Put the folder `HoliSDiP` in `preset/models/`.
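Once all checkpoints are downloaded, `preset/models` should contain the entries listed below; the small script is only a convenience check we suggest (the Mask2Former checkpoint filename inside its folder depends on the file you downloaded):

```python
# Optional helper: verify the expected model layout under preset/models.
from pathlib import Path

expected = [
    "preset/models/mask2former",              # Mask2Former Swin-L (ADE20K) checkpoint folder
    "preset/models/stable-diffusion-2-base",  # SD-2-base folder
    "preset/models/DAPE.pth",                 # DAPE encoder from SeeSR
    "preset/models/ram_swin_large_14m.pth",   # RAM model
    "preset/models/HoliSDiP",                 # HoliSDiP pre-trained model folder
]

for p in expected:
    status = "OK" if Path(p).exists() else "MISSING"
    print(f"[{status}] {p}")
```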
- We use the LSDIR dataset and the first 10k images from the FFHQ dataset.
- Modify the `gt_path` in `dataloaders/config.yml` to the paths of these datasets (a quick check is sketched below).
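The check below is only a suggested convenience; it assumes `dataloaders/config.yml` contains a `gt_path` field holding either a single path or a list of paths (adjust if the config is structured differently):

```python
# Optional: verify that gt_path in dataloaders/config.yml points to existing folders.
# Requires PyYAML (pip install pyyaml).
from pathlib import Path
import yaml

with open("dataloaders/config.yml") as f:
    cfg = yaml.safe_load(f)

gt_paths = cfg["gt_path"] if isinstance(cfg["gt_path"], list) else [cfg["gt_path"]]
for p in gt_paths:
    print(p, "->", "found" if Path(p).is_dir() else "NOT FOUND")
```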
- You can download the testing sets from the HuggingFace page of StableSR.
- The checkpoints will be saved at `experiments/<EXP_NAME>`, as specified by the `--output_dir` argument.
```bash
CUDA_VISIBLE_DEVICES="0,1,2,3" accelerate launch train.py --output_dir experiments/<EXP_NAME> --enable_xformers_memory_efficient_attention --train_batch_size=4 --gradient_accumulation_steps=2
```
- Set the `--image_path` argument to the path of your testing data.
- The output images will be saved at `results/<OUTPUT_DIR>/samples`, as specified by the `--output_dir` argument.
- We also provide code for saving the segmentation masks, which are stored at `results/<OUTPUT_DIR>/masks`, with `results/<OUTPUT_DIR>/masks_meta` showing the labels on the masks.
- The prompts generated by our Semantic Label-Based Prompting (SLBP) are stored at `results/<OUTPUT_DIR>/masks_meta`.
```bash
python test.py --holisdip_model_path preset/models/HoliSDiP --image_path <TEST_IMG_PATH> --output_dir results/<OUTPUT_DIR> --save_prompts
```
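After inference, the outputs can be browsed with a short script such as the one below, which simply lists the three output folders described above (replace `<OUTPUT_DIR>` with the value passed to `--output_dir`):

```python
# Optional: summarize the files produced by test.py.
from pathlib import Path

out = Path("results/<OUTPUT_DIR>")  # replace <OUTPUT_DIR> with your --output_dir value
for sub in ["samples", "masks", "masks_meta"]:
    files = sorted((out / sub).glob("*"))
    print(f"{sub}: {len(files)} files")
    for f in files[:3]:
        print("   ", f.name)
```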
If you have any questions, please feel free to send a message to lytsao@gapp.nthu.edu.tw.
If you find our work helpful for your research, we would greatly appreciate your help in sharing it with the community and citing it with the following BibTeX entry. Thank you for supporting our research.
```bibtex
@article{tsao2024holisdip,
  title={HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior},
  author={Tsao, Li-Yuan and Chen, Hao-Wei and Chung, Hao-Wei and Sun, Deqing and Lee, Chun-Yi and Chan, Kelvin CK and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2411.18662},
  year={2024}
}
```
Our project is built on SeeSR and Mask2Former, with some code borrowed from SPADE. We appreciate their amazing work, which advances this community.