SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

[Website] [Paper] [Nunchaku Inference System]

Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. At such an aggressive level, both weights and activations are highly sensitive to quantization, and conventional post-training quantization methods for large language models, such as smoothing, become insufficient. To overcome this limitation, we propose SVDQuant, a new 4-bit quantization paradigm. Different from smoothing, which redistributes outliers between weights and activations, our approach absorbs these outliers using a low-rank branch. We first shift the outliers from the activations into the weights, then employ a high-precision low-rank branch to take in the outliers in the weights via SVD. This process eases the quantization on both sides. However, naively running the low-rank branch independently incurs significant overhead due to the extra data movement of activations, negating the quantization speedup. To address this, we co-design an inference engine, Nunchaku, that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access. It can also seamlessly support off-the-shelf low-rank adapters (LoRAs) without requantization. Extensive experiments on SDXL, PixArt-Sigma, and FLUX.1 validate the effectiveness of SVDQuant in preserving image quality. We reduce the memory usage of the 12B FLUX.1 models by 3.6×, achieving a 3.5× speedup over the 4-bit weight-only quantized baseline on a 16GB RTX 4090 GPU, paving the way for more interactive applications on PCs.
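
To make the decomposition concrete, below is a minimal PyTorch sketch of the idea: split a weight matrix into a 16-bit low-rank branch that captures the outlier-heavy top singular components, plus a 4-bit residual. The rank, the naive per-tensor quantizer, and all function names here are illustrative assumptions, not deepcompressor's actual implementation.

import torch

def naive_int4_dequant(w: torch.Tensor) -> torch.Tensor:
    # Naive symmetric per-tensor 4-bit quantize-dequantize (illustrative;
    # the real pipeline uses finer-grained quantization and smoothing).
    scale = w.abs().max() / 7.0
    return torch.clamp(torch.round(w / scale), -8, 7) * scale

def svdquant_decompose(w: torch.Tensor, rank: int = 32):
    # Top singular components, which carry the outliers, go to a
    # high-precision low-rank branch; the residual is quantized to 4 bits.
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    l1 = u[:, :rank] * s[:rank]   # (out_features, rank)
    l2 = vh[:rank, :]             # (rank, in_features)
    residual = w - l1 @ l2
    return l1, l2, naive_int4_dequant(residual)

w = torch.randn(512, 512)
l1, l2, r_q = svdquant_decompose(w)
x = torch.randn(4, 512)
# Forward pass: 16-bit low-rank branch plus 4-bit residual branch.
y = x @ l2.T @ l1.T + x @ r_q.T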

[Teaser figure: SVDQuant overview]

Usage

We use FLUX.1-schnell as an example.

Step 1: Evaluation Baselines Preparation

To evaluate the similarity metrics, we first need reference images generated by the unquantized model. Prepare them by running the following command:

python -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml --output-dirname reference

In this command,

  • configs/model/flux.1-schnell.yaml specifies the model configurations including evaluation setups.
  • Setting the --output-dirname flag to reference automatically redirects the output directory to the ref_root specified in the evaluation configuration.

Step 2: Calibration Dataset Preparation

Before quantizing the diffusion model, we randomly sample 128 prompts from COCO Captions 2024 to generate the calibration dataset by running the following command:

python -m deepcompressor.app.diffusion.dataset.collect.calib \
    configs/model/flux.1-schnell.yaml configs/collect/qdiff.yaml

In this command,

  • configs/collect/qdiff.yaml specifies the calibration dataset configurations, including the path to the prompt YAML (i.e., --collect-prompt-path prompts/qdiff.yaml), the number of prompts to sample (i.e., --collect-num-samples 128), and the root directory of the calibration datasets (which should match the quantization configuration).
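
Conceptually, this step boils down to drawing a random subset of prompts and caching the model's activations on them. A minimal sketch of the prompt-sampling part is below; the assumption that prompts/qdiff.yaml is a flat list of prompt strings is ours, for illustration only.

import random
import yaml

# Illustrative sketch of sampling 128 calibration prompts, mirroring
# --collect-num-samples 128. The YAML layout is a hypothetical assumption.
with open("prompts/qdiff.yaml") as f:
    prompts = yaml.safe_load(f)

random.seed(0)  # fixed seed so the calibration set is reproducible
calib_prompts = random.sample(prompts, k=128)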

Step 3: Model Quantization

The following command will perform INT4 SVDQuant and evaluate the quantized model on 1024 samples from MJHQ-30K:

python -m deepcompressor.app.diffusion.ptq \
    configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml \
    --eval-benchmarks MJHQ --eval-num-samples 1024

In this command,

  • The positional arguments are configuration files, loaded in order. configs/svdquant/int4.yaml contains the quantization configurations specialized for INT4 SVDQuant. Please make sure all configuration files are under a subfolder of the working directory where you run the command. Additional configuration files can be stacked on top, e.g., configs/svdquant/fast.yaml or configs/svdquant/gptq.yaml (the latter enables GPTQ on top of SVDQuant, reported as SVDQ+GPTQ below):
    python -m deepcompressor.app.diffusion.ptq \
        configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml configs/svdquant/fast.yaml \
        --eval-benchmarks MJHQ --eval-num-samples 1024
    python -m deepcompressor.app.diffusion.ptq \
        configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml configs/svdquant/gptq.yaml \
        --eval-benchmarks MJHQ --eval-num-samples 1024
  • All configurations can be set either in the YAML files or on the command line; please refer to configs/__default__.yaml and python -m deepcompressor.app.diffusion.ptq -h for the full list of options (the sketch after this list illustrates how layered configurations compose).
  • The default evaluation datasets are 1024 samples from MJHQ and DCI.
  • If you would like to save the quantized model checkpoint, add --save-model true or --save-model /PATH/TO/CHECKPOINT/DIR to the command.
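
Because the positional configuration files are loaded in order, later files override overlapping keys in earlier ones, and command-line flags override both. The snippet below illustrates this layering with OmegaConf; deepcompressor's actual configuration machinery may differ, and the eval.num_samples key is a hypothetical stand-in.

from omegaconf import OmegaConf

# Conceptual illustration of layered configuration: later sources
# override earlier ones, and CLI overrides take final precedence.
base = OmegaConf.load("configs/model/flux.1-schnell.yaml")
quant = OmegaConf.load("configs/svdquant/int4.yaml")
cli = OmegaConf.from_dotlist(["eval.num_samples=1024"])  # hypothetical key
cfg = OmegaConf.merge(base, quant, cli)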

Deployment

If you have saved the SVDQuant W4A4 quantized model checkpoint, you can easily deploy the quantized model with the Nunchaku engine.

Please run the following command to convert the saved checkpoint to a Nunchaku-compatible checkpoint:

python -m deepcompressor.backend.nunchaku.convert \
  --quant-path /PATH/TO/CHECKPOINT/DIR \
  --output-root /PATH/TO/OUTPUT/ROOT \
  --model-name MODEL_NAME

Once you have the Nunchaku-compatible checkpoint, switch to the Nunchaku conda environment and refer to Nunchaku for further deployment on your GPU system.

If you want to integrate a LoRA, please run the following command to convert it to a Nunchaku-compatible checkpoint:

python -m deepcompressor.backend.nunchaku.convert_lora \
  --quant-path /PATH/TO/NUNCHAKU/TRANSFORMER_BLOCKS/SAFETENSORS_FILE \
  --lora-path /PATH/TO/DIFFUSERS/LORA/SAFETENSORS_FILE \
  --output-root /PATH/TO/OUTPUT/ROOT \
  --lora-name LORA_NAME
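
Why does LoRA integration not require requantization? Both the SVDQuant low-rank branch and a LoRA are low-rank adapters on the same linear layer, so a LoRA can be folded into the 16-bit branch while the 4-bit weights stay untouched. The PyTorch sketch below illustrates the underlying block-matrix identity; the shapes and the omission of LoRA scaling are simplifying assumptions, and this is not Nunchaku's actual conversion code.

import torch

# Fusing a LoRA (b @ a) into the SVDQuant low-rank branch (l1 @ l2)
# by concatenation: the 4-bit residual weights are never touched.
out_f, in_f, r_svd, r_lora = 512, 512, 32, 16
l1, l2 = torch.randn(out_f, r_svd), torch.randn(r_svd, in_f)  # SVDQuant branch
b, a = torch.randn(out_f, r_lora), torch.randn(r_lora, in_f)  # LoRA factors

l1_fused = torch.cat([l1, b], dim=1)  # (out_f, r_svd + r_lora)
l2_fused = torch.cat([l2, a], dim=0)  # (r_svd + r_lora, in_f)
# l1_fused @ l2_fused == l1 @ l2 + b @ a, so one fused low-rank GEMM
# computes the SVD correction and the LoRA update together.
assert torch.allclose(l1_fused @ l2_fused, l1 @ l2 + b @ a, atol=1e-3)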

Evaluation Results

Quality Evaluation

Below are the quality and similarity metrics evaluated on 5,000 samples from the MJHQ-30K dataset. IR denotes ImageReward. Our 4-bit results outperform other 4-bit baselines and effectively preserve the visual quality of the 16-bit models.

| Model | Precision | Method | FID ($\downarrow$) | IR ($\uparrow$) | LPIPS ($\downarrow$) | PSNR ($\uparrow$) |
| --- | --- | --- | --- | --- | --- | --- |
| FLUX.1-dev (50 Steps) | BF16 | -- | 20.3 | 0.953 | -- | -- |
| | INT W8A8 | SVDQ | 20.4 | 0.948 | 0.089 | 27.0 |
| | W4A16 | NF4 | 20.6 | 0.910 | 0.272 | 19.5 |
| | INT W4A4 | -- | 20.2 | 0.908 | 0.322 | 18.5 |
| | INT W4A4 | SVDQ | 20.1 | 0.926 | 0.256 | 20.1 |
| | INT W4A4 | SVDQ+GPTQ | 19.9 | 0.935 | 0.223 | 21.0 |
| | NVFP4 | -- | 20.3 | 0.961 | 0.345 | 16.3 |
| | NVFP4 | SVDQ | 20.7 | 0.934 | 0.222 | 21.0 |
| | NVFP4 | SVDQ+GPTQ | 20.3 | 0.942 | 0.205 | 21.5 |
| FLUX.1-schnell (4 Steps) | BF16 | -- | 19.2 | 0.938 | -- | -- |
| | INT W8A8 | SVDQ | 19.2 | 0.966 | 0.120 | 22.9 |
| | W4A16 | NF4 | 18.9 | 0.943 | 0.257 | 18.2 |
| | INT W4A4 | -- | 18.1 | 0.962 | 0.345 | 16.3 |
| | INT W4A4 | SVDQ | 18.3 | 0.957 | 0.289 | 17.6 |
| | INT W4A4 | SVDQ+GPTQ | 18.3 | 0.951 | 0.257 | 18.3 |
| | NVFP4 | -- | 19.0 | 0.952 | 0.276 | 17.6 |
| | NVFP4 | SVDQ | 19.0 | 0.976 | 0.247 | 18.4 |
| | NVFP4 | SVDQ+GPTQ | 18.9 | 0.964 | 0.229 | 19.0 |
| SANA-1.6b (20 Steps) | BF16 | -- | 20.6 | 0.952 | -- | -- |
| | INT W4A4 | -- | 20.5 | 0.894 | 0.339 | 15.3 |
| | INT W4A4 | GPTQ | 19.9 | 0.881 | 0.288 | 16.4 |
| | INT W4A4 | SVDQ | 19.9 | 0.922 | 0.234 | 17.4 |
| | INT W4A4 | SVDQ+GPTQ | 19.3 | 0.935 | 0.220 | 17.8 |
| | NVFP4 | -- | 19.7 | 0.929 | 0.236 | 17.4 |
| | NVFP4 | GPTQ | 19.7 | 0.925 | 0.202 | 18.3 |
| | NVFP4 | SVDQ | 20.2 | 0.951 | 0.190 | 18.6 |
| | NVFP4 | SVDQ+GPTQ | 20.2 | 0.941 | 0.176 | 19.0 |
| PixArt-Sigma (20 Steps) | FP16 | -- | 16.6 | 0.944 | -- | -- |
| | INT W8A8 | ViDiT-Q | 15.7 | 0.944 | 0.137 | 22.5 |
| | INT W8A8 | SVDQ | 16.3 | 0.955 | 0.109 | 23.7 |
| | INT W4A8 | ViDiT-Q | 37.3 | 0.573 | 0.611 | 12.0 |
| | INT W4A4 | SVDQ | 19.9 | 0.858 | 0.356 | 17.0 |
| | INT W4A4 | SVDQ+GPTQ | 19.2 | 0.878 | 0.323 | 17.6 |
| | NVFP4 | -- | 31.8 | 0.660 | 0.517 | 14.8 |
| | NVFP4 | GPTQ | 27.2 | 0.691 | 0.482 | 15.6 |
| | NVFP4 | SVDQ | 17.3 | 0.945 | 0.290 | 18.0 |
| | NVFP4 | SVDQ+GPTQ | 16.6 | 0.940 | 0.271 | 18.5 |
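
For reference, the PSNR column measures pixel-level similarity against the 16-bit reference images generated in Step 1. A minimal sketch of the standard computation is below, assuming both images are float tensors in [0, 1]; the exact evaluation code may differ.

import torch

def psnr(ref: torch.Tensor, img: torch.Tensor, max_val: float = 1.0) -> float:
    # Peak signal-to-noise ratio between the reference image and the
    # quantized model's image; higher means closer to the reference.
    mse = torch.mean((ref - img) ** 2)
    return float(20 * torch.log10(torch.tensor(max_val)) - 10 * torch.log10(mse))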

Reference

If you find deepcompressor useful or relevant to your research, please kindly cite our paper:

@inproceedings{li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}