Mingkun Lei1
Xue Song2
Beier Zhu1, 3
Hao Wang4
Chi Zhang1✉
1AGI Lab, Westlake University,
2Fudan University,
3Nanyang Technological University
4The Hong Kong University of Science and Technology (Guangzhou)
- [2024.12.12] 🔥🔥We release the code.
- [2024.12.19] 📝📝We have summarized the recent developments in style transfer and will continue to update the list.
Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly overfitting to reference styles, limited stylistic control, and misalignment with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism for better integration of style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during the early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.
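For intuition, the snippet below is a minimal sketch of the AdaIN operation the first strategy builds on: one feature stream is normalized per channel and then re-modulated with the statistics of another. The paper's cross-modal variant applies this idea between text and style features; the function and shapes here are illustrative, not the released implementation.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization: normalize `content_feat` per channel,
    then re-scale and shift it with the channel statistics of `style_feat`.
    Expected shapes are illustrative: (batch, channels, ...)."""
    dims = tuple(range(2, content_feat.dim()))  # spatial/token dimensions
    c_mean = content_feat.mean(dim=dims, keepdim=True)
    c_std = content_feat.std(dim=dims, keepdim=True) + eps
    s_mean = style_feat.mean(dim=dims, keepdim=True)
    s_std = style_feat.std(dim=dims, keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean
```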
git clone /~https://github.com/Westlake-AGI-Lab/StyleStudio
cd StyleStudio
# create env using conda
conda create -n StyleStudio python=3.10
conda activate StyleStudio
# install dependencies with pip
# for Linux and Windows users
pip install -r requirements.txt
Please note: Our solution is designed to be fine-tuning free and can be combined with different methods.
Key arguments:
- `adainIP`: use the cross-modal AdaIN
- `fuSAttn`: hijack the Self-Attention Map in the Teacher Model (see the sketch after this list)
- `fuAttn`: hijack the Cross-Attention Map in the Teacher Model
- `end_fusion`: define when the Teacher Model stops participating
- `prompt`: the text prompt for generating the image
- `style_path`: path to the style image or folder
- `neg_style_path`: path to the negative style image
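To make the attention-map options concrete, here is a minimal sketch of what "hijacking" means: during the first `end_fusion` denoising steps, the attention probabilities of the stylized branch are replaced with those computed by the Teacher Model. All names and shapes below are illustrative, not the released implementation.

```python
import torch
import torch.nn.functional as F

def hijacked_attention(q, k, v, teacher_attn=None):
    """Scaled dot-product attention whose probability map can be overridden.
    Before `end_fusion`, `teacher_attn` (the Teacher Model's map) is passed in
    and used in place of the map computed from q and k. Illustrative only."""
    scale = q.shape[-1] ** -0.5
    attn = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if teacher_attn is not None:
        attn = teacher_attn  # hijack: reuse the Teacher Model's attention map
    return attn @ v
```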
Follow CSGO to download pre-trained checkpoints.
This is an example of usage: as the value of `end_fusion` increases, the style gradually diminishes. If `num_inference_steps` is set to 50, we recommend setting `end_fusion` between 10 and 20. Typically, `end_fusion` should be set within the first 1/5 to 1/3 of the total `num_inference_steps`.
If you find that layout stability is not satisfactory, consider increasing the duration of the Teacher Model's involvement.
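As a quick rule of thumb (a sketch, not part of the released code), `end_fusion` can be derived from the total step count:

```python
num_inference_steps = 50
# The Teacher Model guides roughly the first 1/5 to 1/3 of denoising.
end_fusion_min = num_inference_steps // 5  # 10
end_fusion_max = num_inference_steps // 3  # 16; values up to 20 also work well at 50 steps
```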
# Generate a single stylized image
# Use a specific text prompt and style image path
# --adainIP     enable Cross-Modal AdaIN
# --fuSAttn     enable the Teacher Model with the Self-Attention Map
# --end_fusion  define when the Teacher Model stops participating
python infer_StyleStudio.py \
    --prompt "A red apple" \
    --style_path "assets/style1.jpg" \
    --adainIP \
    --fuSAttn \
    --end_fusion 20 \
    --num_inference_steps 50
# Check layout stability across different style images
# With the same text prompt and a set of style images
python infer_StyleStudio_layout_stability.py \
    --prompt "A red apple" \
    --style_path "path/to/style_images_folder" \
    --adainIP \
    --fuSAttn \
    --end_fusion 20 \
    --num_inference_steps 50
- As shown in Figure 15 of the paper, employing a Cross-Attention Map in the Teacher Model does not ensure layout stability. We have also provided the `fuAttn` interface and encourage everyone to experiment with it.
- To ensure layout stability and consistency for the same prompt under different style images, it is important to keep the initial noise $z_0$ identical across experiments. For more details on this aspect, refer to `infer_StyleStudio_layout_stability.py`.
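One way to keep $z_0$ fixed across style images is to pre-sample the latents with a seeded generator and reuse them for every run. Below is a sketch assuming a diffusers-style pipeline; the actual handling lives in `infer_StyleStudio_layout_stability.py`.

```python
import torch

# Sample the initial latent noise once with a fixed seed.
generator = torch.Generator(device="cuda").manual_seed(42)
latents = torch.randn(
    (1, 4, 128, 128),  # SDXL latent shape for 1024x1024 output
    generator=generator,
    device="cuda",
    dtype=torch.float16,
)

# Reuse the same `latents` for every style image so z_0 stays identical, e.g.:
# image = pipe(prompt="A red apple", latents=latents).images[0]
```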
This is an example of using Style-based Classifier-Free Guidance.
python infer_StyleStudio.py \
    --prompt "A red apple" \
    --style_path "assets/style2.jpg" \
    --neg_style_path "assets/neg_style2.jpg"
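For intuition, the guidance arithmetic presumably mirrors standard classifier-free guidance, steering the prediction away from the unwanted style. A hedged sketch follows; the function name, scale, and exact combination are assumptions, not the released implementation.

```python
import torch

def style_cfg(eps_style: torch.Tensor, eps_neg_style: torch.Tensor, scale: float) -> torch.Tensor:
    """Combine the noise predictions of the style-conditioned branch and the
    negative-style branch, CFG-style: push away from the negative style."""
    return eps_neg_style + scale * (eps_style - eps_neg_style)
```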
Some recommendations for generating Negative Style Images:
- You can use ControlNet Canny for generation.
- To make the generated images more realistic, you can use checkpoints from Civitai or Hugging Face that are better suited to realistic imagery. We use RealVisXL_V4.0.
To generate negative style images, we provide a reference implementation in `example_create_neg_style.py`.
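As a starting point, here is a hedged sketch of generating a negative style image with ControlNet Canny and RealVisXL_V4.0 via diffusers. The model IDs, Canny thresholds, and conditioning scale are assumptions; the reference implementation remains `example_create_neg_style.py`.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Load an SDXL Canny ControlNet and a realistic SDXL checkpoint (assumed IDs).
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Extract Canny edges from the style image to use as the structural condition.
style = np.array(Image.open("assets/style2.jpg").convert("RGB"))
edges = cv2.Canny(style, 100, 200)[:, :, None]
canny = Image.fromarray(np.concatenate([edges, edges, edges], axis=2))

# Generate a realistic image that keeps the style image's layout but drops its style.
neg_style = pipe(
    prompt="a realistic photo",
    image=canny,
    controlnet_conditioning_scale=0.5,
).images[0]
neg_style.save("assets/neg_style2.jpg")
```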
Follow InstantStyle to download pre-trained checkpoints.
python infer_InstantStyle.py \
    --prompt "A red apple" \
    --style_path "assets/style1.jpg" \
    --adainIP \
    --fuSAttn \
    --end_fusion 20 \
    --num_inference_steps 50
Follow StyleCrafter to download pre-trained checkpoints.
We encourage you to integrate the Teacher Model with StyleCrafter. This combination, as shown in our experiments, not only helps maintain layout stability but also effectively reduces content leakage.
cd stylecrafter_sdxl
python stylecrafter_teacherModel.py \
--config config/infer/style_crafter_sdxl.yaml \
--style_path "../assets/style1.jpg" \
--prompt "A red apple" \
--scale 0.5 \
--num_samples 2 \
    --end_fusion 10  # Define when the Teacher Model stops participating
To run a local demo of the project, run the following:
python gradio/app.py
- Style Transfer with Diffusion Models: A paper collection of recent style transfer methods with diffusion models.
- CSGO: Content-Style Composition in Text-to-Image Generation
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
- StyleCrafter-SDXL
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
If you find our repo helpful, please consider leaving a star or citing our paper :)
@misc{lei2024stylestudiotextdrivenstyletransfer,
title={StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements},
author={Mingkun Lei and Xue Song and Beier Zhu and Hao Wang and Chi Zhang},
year={2024},
eprint={2412.08503},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.08503},
}
If you have any comments or questions, feel free to contact Mingkun Lei.