From dc75a3ae32165b7dc530da7ac6a0e830d146395a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jiarui=20Fang=EF=BC=88=E6=96=B9=E4=BD=B3=E7=91=9E=EF=BC=89?=
Date: Thu, 23 Jan 2025 13:33:48 +0800
Subject: [PATCH] [doc] ray launch parallel inference (#442)

---
 README.md              |  8 +++++++-
 examples/ray/README.md | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 examples/ray/README.md

diff --git a/README.md b/README.md
index 3fb91c1b..19e790a9 100644
--- a/README.md
+++ b/README.md
@@ -269,7 +269,13 @@ The warmup step impacts the efficiency of PipeFusion as it cannot be executed in
 We observed that a warmup of 0 had no effect on the PixArt model.
 Users can tune this value according to their specific tasks.
 
-### 5. Launch an HTTP Service
+### 5. Launch parallel inference example with Ray
+
+We also provide a Ray example for launching parallel inference. With Ray, we can disaggregate the VAE module from the DiT backbone and allocate different GPU parallelism to each.
+
+[Launch parallel inference example with Ray](./examples/ray/README.md)
+
+### 6. Launch an HTTP Service
 
 You can also launch an HTTP service to generate images with xDiT.
 
diff --git a/examples/ray/README.md b/examples/ray/README.md
new file mode 100644
index 00000000..c918d4d3
--- /dev/null
+++ b/examples/ray/README.md
@@ -0,0 +1,22 @@
+## Running DiT Backbone and VAE Module Separately
+
+The DiT model typically consists of a DiT backbone (encoder + transformers) and a VAE module.
+The DiT backbone module has high computational requirements but stable memory usage.
+For high-resolution images, the VAE module has low computational requirements but high memory consumption, due to temporary memory spikes from its convolution operators. This often leads to OOM (Out of Memory) errors caused by the VAE module.
+
+Therefore, separating the encoder + DiT backbone from the VAE module can effectively alleviate OOM issues.
+
+We use Ray to separate the backbone from the VAE, and to allocate different GPU parallelism to the VAE and the DiT backbone.
+
+In `ray_run.sh`, we define different model configurations.
+For example, if we use 3 GPUs and want to allocate 1 GPU to the VAE and 2 GPUs to the DiT backbone, the settings in `ray_run.sh` would be:
+
+```bash
+N_GPUS=3 # world size
+PARALLEL_ARGS="--pipefusion_parallel_degree 2 --ulysses_degree 1 --ring_degree 1"
+VAE_PARALLEL_SIZE=1
+DIT_PARALLEL_SIZE=2
+```
+
+Here, `VAE_PARALLEL_SIZE` specifies the parallelism for the VAE, `DIT_PARALLEL_SIZE` defines the DiT parallelism, and `PARALLEL_ARGS` contains the parallel configuration for the DiT backbone, which in this case uses PipeFusion to run on 2 GPUs.
+
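
The configuration added by this patch implies two invariants: the VAE and DiT GPU counts must sum to the world size, and (as in the example, where pipefusion 2 × ulysses 1 × ring 1 = 2) the product of the DiT parallel degrees should match `DIT_PARALLEL_SIZE`. A minimal illustrative sanity check, not part of `ray_run.sh`, could look like:

```shell
# Illustrative sketch: validate the example settings from ray_run.sh.
# Assumption (not stated explicitly in the patch): the product of the DiT
# parallel degrees must equal DIT_PARALLEL_SIZE.
N_GPUS=3            # world size
VAE_PARALLEL_SIZE=1 # GPUs for the VAE module
DIT_PARALLEL_SIZE=2 # GPUs for the DiT backbone
PIPEFUSION_DEGREE=2
ULYSSES_DEGREE=1
RING_DEGREE=1

if [ $((VAE_PARALLEL_SIZE + DIT_PARALLEL_SIZE)) -ne "$N_GPUS" ]; then
  echo "error: VAE + DiT GPUs must equal N_GPUS" >&2
  exit 1
fi
if [ $((PIPEFUSION_DEGREE * ULYSSES_DEGREE * RING_DEGREE)) -ne "$DIT_PARALLEL_SIZE" ]; then
  echo "error: DiT parallel degrees must multiply to DIT_PARALLEL_SIZE" >&2
  exit 1
fi
echo "configuration OK: ${VAE_PARALLEL_SIZE} GPU(s) for VAE, ${DIT_PARALLEL_SIZE} for DiT"
```

Misconfigured splits (e.g. 1 + 3 on 3 GPUs) fail fast here instead of surfacing later as a launch error.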