StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
CVPR 2025

Mingkun Lei1    Xue Song2    Beier Zhu1, 3    Hao Wang4    Chi Zhang1✉
1AGI Lab, Westlake University,  2Fudan University,  3Nanyang Technological University 
4The Hong Kong University of Science and Technology (Guangzhou) 

News and Updates

  • [2024.12.12] 🔥🔥 We have released the code.
  • [2024.12.19] 📝📝 We have summarized recent developments in style transfer and will continue to update the list.

Abstract

Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly with overfitting to reference styles, limiting stylistic control, and misaligning with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism for better integration of style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.

Getting Started

1. Clone the code and prepare the environment

git clone /~https://github.com/Westlake-AGI-Lab/StyleStudio
cd StyleStudio

# create env using conda
conda create -n StyleStudio python=3.10
conda activate StyleStudio

# install dependencies with pip
# for Linux and Windows users
pip install -r requirements.txt

2. Run StyleStudio

Please note: our approach is fine-tuning-free by design and can be combined with different methods.

Parameter Explanation

  • adainIP enable the cross-modal AdaIN (see the sketch after this list)
  • fuSAttn hijack the Self-Attention Maps of the Teacher Model
  • fuAttn hijack the Cross-Attention Maps of the Teacher Model
  • end_fusion the denoising step at which the Teacher Model stops participating
  • prompt text prompt used to generate the image
  • style_path path to the style image or folder
  • neg_style_path path to the negative style image
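
For intuition, here is a minimal sketch of the AdaIN-style feature mixing behind adainIP: the style features are normalized, then rescaled and shifted with statistics taken from the text features, pulling the stylized output toward the text condition. The tensor shapes and the direction of the statistics swap are our assumptions, not the repo's implementation; see the paper for the exact formulation.

import torch

def cross_modal_adain(style_feats: torch.Tensor,
                      text_feats: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    # Illustrative cross-modal AdaIN: normalize the style features per
    # channel, then re-scale/shift them with text-feature statistics.
    # Assumed (hypothetical) shapes: (batch, tokens, channels).
    s_mean = style_feats.mean(dim=1, keepdim=True)
    s_std = style_feats.std(dim=1, keepdim=True)
    t_mean = text_feats.mean(dim=1, keepdim=True)
    t_std = text_feats.std(dim=1, keepdim=True)
    normalized = (style_feats - s_mean) / (s_std + eps)
    return normalized * t_std + t_mean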

Integration with CSGO

Follow CSGO to download pre-trained checkpoints.

This is an example of usage: as end_fusion increases, the stylization gradually diminishes. If num_inference_steps is set to 50, we recommend setting end_fusion between 10 and 20. In general, end_fusion should fall within the first 1/5 to 1/3 of num_inference_steps.

If layout stability is not satisfactory, consider extending the Teacher Model's involvement by increasing end_fusion.
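
To make the schedule concrete, here is a minimal sketch of the scheduling logic (variable names are illustrative, not the repo's API): the Teacher Model's attention maps are injected only while the current step is below end_fusion, anchoring the layout during the early, structure-defining steps and then letting the style develop freely.

num_inference_steps = 50
end_fusion = 20  # recommended: ~1/5 to 1/3 of num_inference_steps

for step in range(num_inference_steps):
    # The Teacher Model participates only in the early steps; afterwards
    # the student denoises on its own, so style details can still emerge.
    hijack_self_attention = step < end_fusion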

# Generate a single stylized image from a text prompt and a style image.
# --adainIP     enable Cross-Modal AdaIN
# --fuSAttn     enable the Teacher Model with Self-Attention Maps
# --end_fusion  step at which the Teacher Model stops participating
python infer_StyleStudio.py \
  --prompt "A red apple" \
  --style_path "assets/style1.jpg" \
  --adainIP \
  --fuSAttn \
  --end_fusion 20 \
  --num_inference_steps 50

# Check layout stability across different style images:
# same text prompt, a folder of style images, identical initial noise.
python infer_StyleStudio_layout_stability.py \
  --prompt "A red apple" \
  --style_path "path/to/style_images_folder" \
  --adainIP \
  --fuSAttn \
  --end_fusion 20 \
  --num_inference_steps 50
Note
  1. As shown in Figure 15 of the paper, hijacking the Cross-Attention Map in the Teacher Model does not ensure layout stability. We nevertheless provide the fuAttn interface and encourage you to experiment with it.
  2. To keep the layout stable and consistent for the same prompt across different style images, the initial noise $z_0$ must be held fixed across runs. For details, refer to infer_StyleStudio_layout_stability.py.
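
As a minimal sketch of point 2 (the seed and latent shape are illustrative; infer_StyleStudio_layout_stability.py is the authoritative reference), seeding a generator once and reusing the same initial latents for every style image keeps the starting noise identical across runs:

import torch

generator = torch.Generator(device="cpu").manual_seed(42)
# Hypothetical SDXL-sized latent for a 1024x1024 image
latents = torch.randn(1, 4, 128, 128, generator=generator)

for style_image in ["styles/a.jpg", "styles/b.jpg"]:
    # Pass the *same* latents to the pipeline for every style image,
    # e.g. pipe(prompt, latents=latents.clone(), ...)
    run_latents = latents.clone()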

Here is an example of using Style-based Classifier-Free Guidance (SCFG).

python infer_StyleStudio.py \
  --prompt "A red apple" \
  --style_path "assets/style2.jpg" \
  --neg_style_path "assets/neg_style2.jpg"
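
Conceptually, SCFG replaces the unconditional branch of standard classifier-free guidance with a prediction conditioned on the negative style image, steering generation away from the unwanted style element. The sketch below is a rough analogy with standard CFG, not the paper's exact formulation; the function name and guidance scale are illustrative.

import torch

def style_cfg(eps_style: torch.Tensor,
              eps_neg_style: torch.Tensor,
              guidance_scale: float = 7.5) -> torch.Tensor:
    # Extrapolate away from the negative-style prediction so the element
    # captured by the negative style image is selectively suppressed.
    return eps_neg_style + guidance_scale * (eps_style - eps_neg_style)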

Some recommendations for generating negative style images:

  • You can use ControlNet Canny to generate them.
  • For more realistic results, use checkpoints from Civitai or Hugging Face that are tuned for photorealistic generation; we use RealVisXL_V4.0.

For reference, example_create_neg_style.py provides an implementation for generating negative style images; a rough sketch of the idea follows.
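
This sketch illustrates the recommendations above with a diffusers ControlNet Canny pipeline on top of RealVisXL_V4.0. The model IDs, prompt, and Canny thresholds are our assumptions; example_create_neg_style.py remains the authoritative reference.

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Keep only the style image's edge layout via Canny
style = np.array(Image.open("assets/style2.jpg").convert("RGB"))
edges = cv2.Canny(style, 100, 200)
canny = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A realistic re-rendering of the same layout serves as the negative style
neg_style = pipe(
    prompt="a photo, realistic, natural lighting, high quality",
    image=canny,
    num_inference_steps=30,
).images[0]
neg_style.save("assets/neg_style2.jpg")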

Integration with InstantStyle

Follow InstantStyle to download pre-trained checkpoints.

# Flags as above: --adainIP enables Cross-Modal AdaIN, --fuSAttn enables the
# Teacher Model with Self-Attention Maps, --end_fusion sets when it stops.
python infer_InstantStyle.py \
  --prompt "A red apple" \
  --style_path "assets/style1.jpg" \
  --adainIP \
  --fuSAttn \
  --end_fusion 20 \
  --num_inference_steps 50

Integration with StyleCrafter

Follow StyleCrafter to download pre-trained checkpoints.

We encourage you to integrate the Teacher Model with StyleCrafter. This combination, as shown in our experiments, not only helps maintain layout stability but also effectively reduces content leakage.

cd stylecrafter_sdxl

python stylecrafter_teacherModel.py \
  --config config/infer/style_crafter_sdxl.yaml \
  --style_path "../assets/style1.jpg" \
  --prompt "A red apple" \
  --scale 0.5 \
  --num_samples 2 \
  --end_fusion 10 # step at which the Teacher Model stops participating

3. Demo

To launch a local Gradio demo, run:

python gradio/app.py

Related Links

Our method builds on and integrates with CSGO, InstantStyle, and StyleCrafter; see the integration sections above for checkpoints and usage.

BibTeX

If you find our repo helpful, please consider leaving a star or citing our paper :)

@misc{lei2024stylestudiotextdrivenstyletransfer,
      title={StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements}, 
      author={Mingkun Lei and Xue Song and Beier Zhu and Hao Wang and Chi Zhang},
      year={2024},
      eprint={2412.08503},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.08503}, 
}

📭 Contact

If you have any comments or questions, feel free to contact Mingkun Lei.
