
Prometheus Metrics #71

Merged
merged 16 commits into AI-Hypercomputer:main from prometheus-initial on May 10, 2024

Conversation

Bslabe123
Contributor

@Bslabe123 Bslabe123 commented May 3, 2024

Added a Gauge for watching the prefill backlog size. Note that port 9090 is the conventional port for Prometheus metrics.
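For context, a minimal sketch of the pattern this change introduces, assuming the prometheus_client library; the metric name below is illustrative, and only the prefill-backlog Gauge and port 9090 come from this PR:

import queue

import prometheus_client

# Expose metrics on the conventional Prometheus port (stated in the PR).
prometheus_client.start_http_server(9090)

prefill_backlog = queue.Queue()
prefill_backlog_size = prometheus_client.Gauge(
    "jetstream_prefill_backlog_size",  # metric name is an assumption
    "Current size of the prefill backlog queue",
)

# Set the gauge to the queue's current size whenever the backlog changes.
prefill_backlog_size.set(prefill_backlog.qsize())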

@Bslabe123 Bslabe123 requested a review from vipannalla as a code owner May 3, 2024 16:15
@Bslabe123 Bslabe123 marked this pull request as draft May 3, 2024 16:15
@Bslabe123 Bslabe123 changed the title initial commit Prometheus Metrics May 3, 2024
@vipannalla vipannalla requested a review from JoeZijunZhou May 6, 2024 16:58
@vipannalla
Collaborator

Looks good. Can we do an end-to-end run to ensure there is no impact on perf?

  • Zijun, PTAL.

@FanhaiLu1 FanhaiLu1 self-requested a review May 6, 2024 20:02
Contributor

@FanhaiLu1 FanhaiLu1 left a comment


Could you share the Prometheus Metrics results?

@@ -421,6 +427,7 @@ def place_request_on_prefill_queue(self, request: ActiveRequest):
"""Used to place new requests for prefilling and generation."""


Is the function "place_request_on_prefill_queue" used by the orchestrator to add requests to the prefill queue? @JoeZijunZhou


Asking because I didn't see JetStream or any other place invoking this API.

Collaborator


This is adding requests from JetStream's client to JetStream Orchestrator's prefill backlog. It doesn't relate to engines.


I see, but where is the API used? I didn't find the usage... so I'm not sure whether we should record metrics here

Collaborator


Which API? This is part of the workflow of the JetStream Decode API.

@@ -442,6 +449,8 @@ def _prefill_thread(self, idx: int):
my_transfer_backlog = self._transfer_backlogs[idx]
# The prefill thread can just sleep until it has work to do.
request = self._prefill_backlog.get(block=True)
self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())

@liurupeng liurupeng May 6, 2024


qq, should we record the metrics in two places?


I would assume this is the real prefill queue size during runtime @FanhaiLu1 right?

Collaborator

@JoeZijunZhou JoeZijunZhou May 6, 2024


L430 logs the prefill backlog queue size after a request is added to the queue; L452 logs it after a request is removed from the queue (to start the prefill operation for that request).
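For illustration, a minimal sketch of the two update points being described; the method and attribute names follow the diff, while the put call and the surrounding class structure are assumptions:

import queue

import prometheus_client


class _OrchestratorSketch:
  """Illustrative only; the real orchestrator class holds much more state."""

  def __init__(self):
    self._prefill_backlog = queue.Queue()
    self._prefill_backlog_size_metric = prometheus_client.Gauge(
        "jetstream_prefill_backlog_size",  # metric name is an assumption
        "Current size of the prefill backlog queue",
    )

  def place_request_on_prefill_queue(self, request):
    """Used to place new requests for prefilling and generation."""
    self._prefill_backlog.put(request, block=False)  # put call assumed
    # Update point 1 (around L430): gauge reflects the size after an add.
    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())

  def _prefill_thread(self, idx: int):
    # The prefill thread can just sleep until it has work to do.
    request = self._prefill_backlog.get(block=True)
    # Update point 2 (around L452): gauge reflects the size after a get.
    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())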


Then I would assume we only need to add it in one place, right?

Collaborator


IIUC, @Bslabe123 is trying to collect the prefill queue size metric as the first step?

@liurupeng

Could you share the Prometheus Metrics results?

@Bslabe123 yep, it would be good if we could verify the metric is added in the container logs, thanks!

Collaborator

@JoeZijunZhou JoeZijunZhou left a comment


LGTM! Would also like to see the E2E validation result of this metric.

@@ -242,6 +246,9 @@ def __init__(
# Stage 1
# At first, a request is placed here in order to get prefilled.
self._prefill_backlog = queue.Queue()
self._prefill_backlog_size_metric = prometheus_client.Gauge(
Collaborator


You can get the decode slots size at lines L556 and L587. @liurupeng @Bslabe123 It's similar to the prefill backlog metric.
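For reference, a sketch of the analogous decode-slots gauge being suggested here; this is entirely hypothetical, since the code at L556/L587 is not shown in this thread:

import queue

import prometheus_client

# Stand-in for the orchestrator's queue of free decode slots (assumption).
decode_slots = queue.Queue()
decode_slots_size_metric = prometheus_client.Gauge(
    "jetstream_decode_slots_size",  # metric name is an assumption
    "Number of available decode slots",
)
decode_slots_size_metric.set(decode_slots.qsize())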

@Bslabe123
Contributor Author

Could you share the Prometheus Metrics results?

@Bslabe123 yep, it would be good if we could verify the metric is added in the container logs, thanks!

I was able to validate by also deploying a Prometheus server to the cluster to scrape the newly emitted metrics; I will follow up with an ai-on-gke PR containing validation instructions and a demo.

@Bslabe123
Contributor Author

Bslabe123 commented May 9, 2024

LGTM! Would also like to see the E2E validation result of this metric.

Deployed on GKE with this maxtext setup plus a Prometheus server; these are the metrics after 100 concurrent curls to the maxtext HTTP endpoint via:

seq 100 | xargs -P 100 -n 1 curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{ "prompt": "What are the top 5 programming languages", "max_tokens": 200 }'

[Screenshot: Prometheus metrics output, 2024-05-09 10:35 AM]

@Bslabe123
Contributor Author

Bslabe123 commented May 9, 2024

Looks good. Can we do an end-to-end run to ensure there is no impact on perf?

  • Zijun, PTAL.

Timed the above 100 requests with bash's time and got the following:

real	0m26.914s
user	0m0.940s
sys	0m1.313s

Can do more extensive benchmarking if these numbers seem questionable

@liurupeng

lgtm, thanks @Bslabe123!

@Bslabe123 Bslabe123 marked this pull request as ready for review May 9, 2024 19:37
@Bslabe123 Bslabe123 requested a review from FanhaiLu1 May 9, 2024 21:32
Contributor

@FanhaiLu1 FanhaiLu1 left a comment


Are Prometheus metrics mandatory? If the port is already in use or something crashes the Prometheus server, what would the impact be on the JetStream server? With the current logic, it would also crash the JetStream server.

Can you add a flag and only enable the Prometheus metrics server and collection when the flag is enabled?
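A minimal sketch of the kind of gating being requested, assuming absl flags; the flag name, default, and helper function are assumptions, not part of this PR:

from absl import flags
import prometheus_client

_PROMETHEUS_PORT = flags.DEFINE_integer(
    "prometheus_port",
    0,
    "If nonzero, start a Prometheus metrics server on this port.",
)


def maybe_start_metrics_server() -> None:
  # Only start the server (and risk a port conflict) when explicitly enabled,
  # so a Prometheus failure cannot take down the JetStream server by default.
  if _PROMETHEUS_PORT.value:
    prometheus_client.start_http_server(_PROMETHEUS_PORT.value)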

@FanhaiLu1 FanhaiLu1 merged commit 2f8924d into AI-Hypercomputer:main May 10, 2024
3 checks passed
@Bslabe123 Bslabe123 deleted the prometheus-initial branch May 10, 2024 20:43