Add ray disaggregated serving support #87
Conversation
Looks good, thanks Fanhai! Just minor nits
LGTM! May need some unit tests.
jetstream/core/server_lib.py
Outdated
@@ -97,6 +97,7 @@ def run(
     metrics_server_config: config_lib.MetricsServerConfig | None = None,
     enable_jax_profiler: bool = False,
     jax_profiler_port: int = 9999,
+    ray_multiple_host: bool = False,
Maybe move this flag to ServerConfig, since it's used to control server mode?
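As a rough sketch of that suggestion, the flag could live in the server config rather than in run()'s signature. This is purely illustrative: everything here besides the ray_multiple_host name (taken from the diff) is assumed, not JetStream's actual config_lib.ServerConfig.

```python
# Hypothetical sketch: moving ray_multiple_host into the server config.
# Field names other than ray_multiple_host are illustrative, not the
# real config_lib.ServerConfig.

from dataclasses import dataclass


@dataclass
class ServerConfig:
  enable_jax_profiler: bool = False
  jax_profiler_port: int = 9999
  ray_multiple_host: bool = False  # controls multi-host Ray server mode


def run(config: ServerConfig) -> str:
  # The server reads the mode from the config instead of a keyword arg.
  return "ray-multi-host" if config.ray_multiple_host else "single-host"


print(run(ServerConfig(ray_multiple_host=True)))  # → ray-multi-host
```

Callers would then pass one config object instead of a growing list of keyword arguments.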
+1
Thanks! Done.
Unit tests, please.
jetstream/core/server_lib.py
Outdated
@@ -97,6 +97,7 @@ def run(
     metrics_server_config: config_lib.MetricsServerConfig | None = None,
     enable_jax_profiler: bool = False,
     jax_profiler_port: int = 9999,
+    ray_multiple_host: bool = False,
+1
jetstream/core/orchestrator.py
Outdated
  def _ray_transfer_prefill_result(self, new_request, target_idx):
    self._generate_engines[target_idx].transfer(new_request.prefill_result)
I don't see anything Ray-specific here; the transfer code is abstracted into engine.transfer(). Can we use a generic name such as "non_jax_transfer" instead of "ray_transfer" here?
JetStream doesn't need to know whether it's Ray or some other mechanism being used. Also, let's move this setting to the server config for better control.
Thanks, done for the server config part.
The code here is interface-level; the actual logic (the real implementation) lives on the engine side (PyTorch or MaxText). The implementation logic would be:
- Gather the prefill result from the TPU chips in the Ray worker
- Transfer the gathered result from TPU to CPU RAM through PCIe in the Ray worker
- Transfer the prefill result from the prefill server to the decode server through DCN via the Ray head
Right now, and in the near future, I don't see us introducing another transfer mechanism besides Ray or JAX, so I feel naming it ray_transfer is clear and straightforward.
I agree that JetStream doesn't know whether it's Ray or another mechanism; going even further, JetStream shouldn't need to know whether it's JAX, Ray, or anything else, since the engine should handle it. But right now Pathways handles transfer in the orchestrator, so we have to decide which method needs to be called. The ideal case is to just call engine.transfer() even with Pathways; that needs more effort to explore.
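The three transfer hops described above could be sketched roughly as follows. This is a toy illustration only: every class and function name here is hypothetical, and each hop is modeled as a plain copy rather than a real TPU, PCIe, or DCN transfer (which lives in the engine, not in JetStream).

```python
# Illustrative sketch of the three transfer hops. All names are
# hypothetical; the real logic lives on the engine side.

from dataclasses import dataclass


@dataclass
class PrefillResult:
  """Toy stand-in for a prefill result sharded across TPU chips."""
  shards: list  # one entry per TPU chip on the prefill worker


def gather_from_chips(result: PrefillResult) -> list:
  # Hop 1: gather the sharded prefill result inside the Ray worker.
  gathered = []
  for shard in result.shards:
    gathered.extend(shard)
  return gathered


def device_to_host(gathered: list) -> list:
  # Hop 2: move the gathered result from TPU HBM to CPU RAM over PCIe.
  # Modeled here as a simple copy.
  return list(gathered)


def send_over_dcn(host_buffer: list) -> list:
  # Hop 3: ship the buffer from the prefill server to the decode server
  # over DCN, coordinated by the Ray head. Also modeled as a copy.
  return list(host_buffer)


def ray_transfer(result: PrefillResult) -> list:
  return send_over_dcn(device_to_host(gather_from_chips(result)))


print(ray_transfer(PrefillResult(shards=[[1, 2], [3, 4]])))  # → [1, 2, 3, 4]
```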
Based on the implementation of _ray_transfer_prefill_result, could we assign the responsibility of implementing transfer to the generate engine?
What I mean is that within the orchestrator, we can generically use:
  def _transfer_prefill_result(
      self, new_request: ActiveRequest, target_idx: int
  ):
    self._generate_engines[target_idx].transfer(new_request.prefill_result)
and so your Pathways engine needs to implement transfer (the jax.device_put logic) and your Ray engine needs to implement transfer with ray.remote primitives.
Or does this break some assumption? I think having JAX involved in the orchestrator is a somewhat leaky abstraction.
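The dispatch pattern being suggested could look something like the sketch below. Only engine.transfer() and _generate_engines come from the discussion above; the class names and return values are invented for illustration, and the real implementations would call jax.device_put or Ray primitives rather than returning tuples.

```python
# Hedged sketch of engine-owned transfer: each engine implements its own
# transfer(), so the orchestrator never branches on the mechanism.
# Names besides transfer() and _generate_engines are hypothetical.

from abc import ABC, abstractmethod


class GenerateEngine(ABC):
  @abstractmethod
  def transfer(self, prefill_result):
    """Move a prefill result onto this engine's devices."""


class PathwaysEngine(GenerateEngine):
  def transfer(self, prefill_result):
    # A real implementation would use jax.device_put(...).
    return ("jax.device_put", prefill_result)


class RayEngine(GenerateEngine):
  def transfer(self, prefill_result):
    # A real implementation would use ray.remote / object-store calls.
    return ("ray", prefill_result)


class Orchestrator:
  def __init__(self, generate_engines):
    self._generate_engines = generate_engines

  def _transfer_prefill_result(self, prefill_result, target_idx):
    # Generic call: no JAX- or Ray-specific logic in the orchestrator.
    return self._generate_engines[target_idx].transfer(prefill_result)


orch = Orchestrator([PathwaysEngine(), RayEngine()])
print(orch._transfer_prefill_result("kv_cache", 1))  # → ('ray', 'kv_cache')
```

The point of the design is that adding a third transfer mechanism later would mean adding one engine subclass, with no orchestrator change.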
Agree. @JoeZijunZhou may have more insights, but based on my understanding, this could break Pathways.
Synced with @vipannalla offline; we should remove the jax dependencies in JetStream in the long term. The current concern is the impact on Pathways; we will figure it out and make sure Pathways can adopt a jax-free JetStream.
Let's explore how to add disaggregated serving unit tests, and add them before enabling disaggregated serving.
We could add a simple unit test in
This PR adds Ray disaggregated serving support to JetStream; the underlying engine implementations live on the PyTorch and MaxText side.
This PR does not impact any current interleaved or Pathways disaggregated behavior.