
Cache applied with torch.compile #453

Merged (1 commit) on Feb 25, 2025
Conversation

@Binary2355 (Contributor) commented Feb 24, 2025

This PR:

  • Adds the performance of TeaCache and FBCache on 1xH20
  • Adds the performance of TeaCache and FBCache on 1xH20 and 4xH20 with torch.compile
  • Refactors the code to make torch.compile as efficient as possible
  • Fixes a bug: postpones the logic that updates the modulated_inputs of FBCache (equivalent to the ParaAttn behavior)

The performance table is listed below (latency in seconds):

| Method | 4xH20 (no compile) | 1xH20 (no compile) | 4xH20 (torch.compile) | 1xH20 (torch.compile) |
|---|---|---|---|---|
| Baseline | 2.02s | 6.10s | 1.81s | 5.02s |
| use_teacache | 1.60s | 4.67s | 1.50s | 3.92s |
| use_fbcache | 0.93s | 2.51s | 0.85s | 2.09s |

@Binary2355 Binary2355 force-pushed the main branch 2 times, most recently from 526c180 to a74e42a Compare February 24, 2025 15:15
@Binary2355 Binary2355 changed the title add 1xH20 performance add 1xH20 performance and 1xH20 performance, 4xH20 performance with torch.compile Feb 24, 2025
Comment on lines 74 to 75
if engine_config.runtime_config.use_torch_compile:
pipe.transformer = torch.compile(apply_cache_on_transformer(pipe.original_transformer, **cache_args))
Collaborator:
The problem you are hitting is that _convert_transformer_backbone compiles the transformer's forward first; apply_cache_on_transformer then changes the logic, so the compilation is invalidated.

Put apply_cache_on_transformer inside the _convert_transformer_backbone function instead of doing it in the example.

Contributor (Author):

I save original_transformer inside _convert_transformer_backbone before running torch.compile, so the torch.compile issued after apply_cache_on_transformer operates on the original (uncompiled) transformer.
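The order being debated above can be sketched as follows. This is a minimal illustration, not the xDiT implementation: `ToyTransformer`, `ToyPipe`, and the no-op `apply_cache_on_transformer` stand-in are hypothetical; the real helper wraps the forward with cache logic. The point is the pattern the author describes: keep a reference to the uncompiled module, apply the cache wrapper to that original, and only then call `torch.compile`, so the wrapper does not invalidate an already-compiled forward. (`backend="eager"` keeps the sketch dependency-free; the actual code would use the default backend.)

```python
import torch
import torch.nn as nn

def apply_cache_on_transformer(transformer, **cache_args):
    # Stand-in for the real xDiT helper, which wraps the forward with
    # TeaCache/FBCache logic. Here it just returns the module unchanged.
    return transformer

class ToyTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)

class ToyPipe:
    def __init__(self):
        self.transformer = ToyTransformer()
        # Save a reference to the uncompiled module *before* any
        # torch.compile call, as described in the comment above.
        self.original_transformer = self.transformer

pipe = ToyPipe()
cache_args = {}  # placeholder for use_teacache / use_fbcache options

# Wrap the original transformer, then compile the wrapped module once.
pipe.transformer = torch.compile(
    apply_cache_on_transformer(pipe.original_transformer, **cache_args),
    backend="eager",
)

out = pipe.transformer(torch.randn(2, 8))
print(tuple(out.shape))
```

Compiling after wrapping means Dynamo traces the cache-augmented forward directly, instead of tracing a forward that is later mutated (which would trigger recompilation or silently bypass the compiled graph).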

@feifeibear feifeibear changed the title add 1xH20 performance and 1xH20 performance, 4xH20 performance with torch.compile Cache applied with torch.compile Feb 25, 2025
@Binary2355 Binary2355 force-pushed the main branch 6 times, most recently from 5bf79c9 to 2c75c4e Compare February 25, 2025 05:25
- add 4xH20 performance and 1xH20 performance with torch.compile
@feifeibear feifeibear merged commit 248abec into xdit-project:main Feb 25, 2025
1 of 3 checks passed