Prometheus Metrics #71
Conversation
Looks good. Can we do an end-to-end run to ensure there's no impact on perf?
Could you share the Prometheus Metrics results?
@@ -421,6 +427,7 @@ def place_request_on_prefill_queue(self, request: ActiveRequest):
  """Used to place new requests for prefilling and generation."""
Is the function `place_request_on_prefill_queue` used by the orchestrator to add requests to the prefill queue? @JoeZijunZhou
Asking because I didn't see JetStream or other places invoking this API.
This is adding requests from JetStream's client to JetStream Orchestrator's prefill backlog. It doesn't relate to engines.
I see, but where is this API used? I didn't find the usage, so I'm not sure whether we should record metrics here.
Which API? This is part of the workflow of the JetStream Decode API.
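For context, a minimal sketch of what this enqueue path plus the gauge update might look like (the class layout, the `put` call, and the metric name are assumptions based on the diff hunks in this thread, not the actual JetStream source):

```python
import queue

import prometheus_client


class Orchestrator:
  """Illustrative stand-in for the JetStream orchestrator (not the real class)."""

  def __init__(self):
    self._prefill_backlog = queue.Queue()
    self._prefill_backlog_size_metric = prometheus_client.Gauge(
        "jetstream_prefill_backlog_size",  # assumed metric name
        "Current size of the prefill backlog queue.",
    )

  def place_request_on_prefill_queue(self, request):
    """Used to place new requests for prefilling and generation."""
    self._prefill_backlog.put(request, block=False)
    # Record the backlog size right after the enqueue; the dequeue side in
    # _prefill_thread updates the same gauge (see the hunk below).
    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
```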
@@ -442,6 +449,8 @@ def _prefill_thread(self, idx: int):
  my_transfer_backlog = self._transfer_backlogs[idx]
  # The prefill thread can just sleep until it has work to do.
  request = self._prefill_backlog.get(block=True)
  self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
Quick question: should we record the metric in two places?
I would assume this is the real prefill queue size during runtime, @FanhaiLu1, right?
L430 logs the prefill backlog queue size after a request is added to the queue; L452 logs it after a request is removed from the queue (to start the prefill operation for that request).
Then I would assume we only need to add it in one place, right?
IIUC, @Bslabe123 is trying to collect the prefill queue size metric as the first step?
@Bslabe123 Yep, it would be good if we could verify the metric shows up in the container logs, thanks!
LGTM! Would also like to see the E2E validation result of this metric.
@@ -242,6 +246,9 @@ def __init__(
  # Stage 1
  # At first, a request is placed here in order to get prefilled.
  self._prefill_backlog = queue.Queue()
  self._prefill_backlog_size_metric = prometheus_client.Gauge(
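The hunk above cuts off inside the Gauge constructor; a hedged completion (the metric name and help string are assumptions, not necessarily the strings the PR uses) looks like:

```python
import prometheus_client

# Assumed name and description; a Gauge reports the most recent value set on it.
prefill_backlog_size = prometheus_client.Gauge(
    "jetstream_prefill_backlog_size",
    "Number of requests currently waiting in the prefill backlog.",
)

prefill_backlog_size.set(0)  # e.g. initialize to an empty backlog
```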
You can get the decode slots size at lines L556 and L587, @liurupeng @Bslabe123. It's similar to the prefill backlog metric.
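If the decode-slot metric were added the same way, a sketch under the same assumptions (the metric name and the `my_slots` queue are illustrative, not the real identifiers around L556/L587) could be:

```python
import queue

import prometheus_client

free_decode_slots = prometheus_client.Gauge(
    "jetstream_free_decode_slots",  # assumed metric name
    "Number of free decode slots available for generation.",
)

my_slots = queue.Queue()  # stand-in for the orchestrator's slot queue


def release_slot(slot):
  my_slots.put(slot)
  free_decode_slots.set(my_slots.qsize())  # after a slot is returned


def acquire_slot():
  slot = my_slots.get(block=True)
  free_decode_slots.set(my_slots.qsize())  # after a slot is taken
  return slot
```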
Able to validate by also deploying a Prometheus server to the cluster to scrape the newly emitted metrics; will follow up with an ai-on-gke PR with validation instructions/demo.
Deployed on GKE with this maxtext setup plus a Prometheus server; these are the metrics after 100 concurrent requests.
Timed the above 100 requests with
Can do more extensive benchmarking if these numbers seem questionable.
LGTM, thanks @Bslabe123!
Are Prometheus metrics mandatory? If the port is already in use or something crashes the Prometheus server, what would the impact be on the JetStream server? With the current logic, it would also crash the JetStream server.
Can you add a flag and only enable the Prometheus metrics server and collection when the flag is enabled?
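One way to address this (a sketch only; the flag name and argparse wiring are assumptions, and the real server may use different flag plumbing) is to start the metrics server only when the flag is set and to treat a failure to bind the port as non-fatal:

```python
import argparse
import logging

import prometheus_client

parser = argparse.ArgumentParser()
parser.add_argument(
    "--prometheus_port",
    type=int,
    default=0,
    help="If > 0, expose Prometheus metrics on this port (9090 is conventional).",
)
args = parser.parse_args()

if args.prometheus_port > 0:
  try:
    prometheus_client.start_http_server(args.prometheus_port)
  except OSError as exc:
    # A port conflict should degrade metrics, not crash the JetStream server.
    logging.warning("Failed to start Prometheus metrics server: %s", exc)
```

With the default of 0, the server behaves exactly as before and metrics are opt-in.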
Added a Gauge for watching the prefill backlog size. Note that port 9090 is conventional for Prometheus metrics.