Prometheus Metrics #71
Conversation
Looks good. Can we do an end-to-end run to ensure there's no impact on perf?
Could you share the Prometheus Metrics results?
@@ -421,6 +427,7 @@ def place_request_on_prefill_queue(self, request: ActiveRequest):
  """Used to place new requests for prefilling and generation."""
Is the function `place_request_on_prefill_queue` used by the orchestrator to add requests to the prefill queue? @JoeZijunZhou
Asking because I didn't see JetStream or other places invoking this API.
This is adding requests from JetStream's client to JetStream Orchestrator's prefill backlog. It doesn't relate to engines.
I see, but where is this API used? I didn't find the usage, so I'm not sure whether we should record metrics here.
Which API? This is part of the workflow of the JetStream Decode API.
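For context, a minimal sketch of what this enqueue path plus the gauge update might look like (the class layout, the `put` call, and the metric name are assumptions based on the diff hunks in this thread, not the actual JetStream source):

```python
import queue

import prometheus_client


class Orchestrator:
  """Illustrative stand-in for the JetStream orchestrator (not the real class)."""

  def __init__(self):
    self._prefill_backlog = queue.Queue()
    self._prefill_backlog_size_metric = prometheus_client.Gauge(
        "jetstream_prefill_backlog_size",  # assumed metric name
        "Current size of the prefill backlog queue.",
    )

  def place_request_on_prefill_queue(self, request):
    """Used to place new requests for prefilling and generation."""
    self._prefill_backlog.put(request, block=False)
    # Record the backlog size right after the enqueue; the dequeue side in
    # _prefill_thread updates the same gauge (see the hunk below).
    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
```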
@@ -442,6 +449,8 @@ def _prefill_thread(self, idx: int):
  my_transfer_backlog = self._transfer_backlogs[idx]
  # The prefill thread can just sleep until it has work to do.
  request = self._prefill_backlog.get(block=True)
  self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
Quick question: should we record the metric in two places?
I would assume this is the real prefill queue size during runtime, @FanhaiLu1, right?
L430 logs the prefill backlog queue size after a request is added to the queue; L452 logs it after a request is removed from the queue (to start the prefill operation for that request).
Then I would assume we only need to add it in one place, right?
IIUC, @Bslabe123 is trying to collect the prefill queue size metric as the first step?
@Bslabe123 Yep, it would be good if we could verify the metric shows up in the container logs, thanks!
LGTM! Would also like to see the E2E validation result of this metric.
@@ -242,6 +246,9 @@ def __init__(
  # Stage 1
  # At first, a request is placed here in order to get prefilled.
  self._prefill_backlog = queue.Queue()
  self._prefill_backlog_size_metric = prometheus_client.Gauge(
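The hunk above cuts off inside the Gauge constructor; a hedged completion (the metric name and help string are assumptions, not necessarily the strings the PR uses) looks like:

```python
import prometheus_client

# Assumed name and description; a Gauge reports the most recent value set on it.
prefill_backlog_size = prometheus_client.Gauge(
    "jetstream_prefill_backlog_size",
    "Number of requests currently waiting in the prefill backlog.",
)

prefill_backlog_size.set(0)  # e.g. initialize to an empty backlog
```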
You can get the decode slots size at lines L556 and L587, @liurupeng @Bslabe123. It's similar to the prefill backlog metric.
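If the decode-slot metric were added the same way, a sketch under the same assumptions (the metric name and the `my_slots` queue are illustrative, not the real identifiers around L556/L587) could be:

```python
import queue

import prometheus_client

free_decode_slots = prometheus_client.Gauge(
    "jetstream_free_decode_slots",  # assumed metric name
    "Number of free decode slots available for generation.",
)

my_slots = queue.Queue()  # stand-in for the orchestrator's slot queue


def release_slot(slot):
  my_slots.put(slot)
  free_decode_slots.set(my_slots.qsize())  # after a slot is returned


def acquire_slot():
  slot = my_slots.get(block=True)
  free_decode_slots.set(my_slots.qsize())  # after a slot is taken
  return slot
```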
Able to validate by also deploying a Prometheus server to the cluster to scrape the newly emitted metrics; will follow up with an ai-on-gke PR with validation instructions/demo.
Deployed on GKE with this maxtext setup plus a Prometheus server; these are the metrics after 100 concurrent requests.
Timed the above 100 requests with
Can do more extensive benchmarking if these numbers seem questionable.
LGTM, thanks @Bslabe123!
Are Prometheus metrics mandatory? If the port is already in use or something crashes the Prometheus server, what would the impact be on the JetStream server? With the current logic, it would also crash the JetStream server.
Can you add a flag and only enable the Prometheus metrics server and collection when the flag is enabled?
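One way to address this (a sketch only; the flag name and argparse wiring are assumptions, and the real server may use different flag plumbing) is to start the metrics server only when the flag is set and to treat a failure to bind the port as non-fatal:

```python
import argparse
import logging

import prometheus_client

parser = argparse.ArgumentParser()
parser.add_argument(
    "--prometheus_port",
    type=int,
    default=0,
    help="If > 0, expose Prometheus metrics on this port (9090 is conventional).",
)
args = parser.parse_args()

if args.prometheus_port > 0:
  try:
    prometheus_client.start_http_server(args.prometheus_port)
  except OSError as exc:
    # A port conflict should degrade metrics, not crash the JetStream server.
    logging.warning("Failed to start Prometheus metrics server: %s", exc)
```

With the default of 0, the server behaves exactly as before and metrics are opt-in.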
Added a Gauge for watching the prefill backlog size. Note that port 9090 is conventional for Prometheus metrics.