Prometheus Metrics #71
Changes from 5 commits
```diff
@@ -93,8 +93,9 @@
 from jetstream.core.proto import jetstream_pb2_grpc
 from jetstream.core.utils import async_multifuture
 from jetstream.engine import engine_api
-import numpy as np
 
+import numpy as np
+import prometheus_client
 
 root = logging.getLogger()
 root.setLevel(logging.DEBUG)
```
```diff
@@ -209,6 +210,9 @@ class Driver:
   # todo: remove jax_padding after all then engine migrate to np padding
   _jax_padding = True
 
+  # Record metrics for prefill_backlog size
+  _prefill_backlog_size_metric: prometheus_client.Gauge
+
   def __init__(
       self,
       prefill_engines: Optional[list[engine_api.Engine]] = None,
```
```diff
@@ -242,6 +246,8 @@ def __init__(
     # Stage 1
     # At first, a request is placed here in order to get prefilled.
     self._prefill_backlog = queue.Queue()
+    self._prefill_backlog_size_metric = prometheus_client.Gauge("jetstream_prefill_backlog_size", "Size of prefill queue")
+
     # Stage 2
     # After prefilling, it is placed here in order to get transferred to
     # one of the generate backlogs.
```
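For context, a minimal sketch of how a `prometheus_client.Gauge` like the one created above behaves. The gauge name and help text are taken from the diff; the use of a private `CollectorRegistry` is an assumption made here for isolation, whereas the PR registers the gauge on the default global registry:

```python
import queue

from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry keeps this sketch self-contained; the PR itself
# passes no registry= argument and so uses the default global registry.
registry = CollectorRegistry()
backlog_gauge = Gauge(
    "jetstream_prefill_backlog_size",
    "Size of prefill queue",
    registry=registry,
)

backlog = queue.Queue()
backlog.put(object())
backlog.put(object())

# Mirror the current queue depth into the gauge, as the PR does after
# put()/get() calls on the prefill backlog.
backlog_gauge.set(backlog.qsize())

# generate_latest() renders the text exposition format that a Prometheus
# scrape (e.g. one served by prometheus_client.start_http_server) would see.
print(generate_latest(registry).decode())
```

A scrape would then report the sample `jetstream_prefill_backlog_size 2.0` alongside its `# HELP` and `# TYPE` lines.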
```diff
@@ -421,6 +427,7 @@ def place_request_on_prefill_queue(self, request: ActiveRequest):
     """Used to place new requests for prefilling and generation."""
     # Don't block so we can fail and shed load when the queue is full.
     self._prefill_backlog.put(request, block=False)
+    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
 
   def _load_cache_history(self, path: str) -> Union[None, Any]:
     """Loads previous kv cache for a longer conversation."""
```
```diff
@@ -442,6 +449,8 @@ def _prefill_thread(self, idx: int):
     my_transfer_backlog = self._transfer_backlogs[idx]
     # The prefill thread can just sleep until it has work to do.
     request = self._prefill_backlog.get(block=True)
+    self._prefill_backlog_size_metric.set(self._prefill_backlog.qsize())
+
     if request is None:
       break
     # Tokenize, and introduce a leading dimension
```

Review thread on the added `.set()` call:

> qq, should we record the metrics in two places?

> I would assume this is the real prefill queue size during runtime @FanhaiLu1 right?

> L430 logs the prefill backlog queue size after a request is added to the queue; L452 logs it after a request is removed from the queue (to start the prefill operation for that request).

> then I would assume we only need to add it in one place, right?

> IIUC, @Bslabe123 is trying to collect the prefill queue size metric as the first step?
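The "two places" question above comes down to the fact that `queue.Queue` exposes no hook for size changes, so the gauge only stays current if every mutation path (enqueue and dequeue) updates it. A hypothetical wrapper illustrating the both-sides pattern; the function names here are illustrative, not from the PR:

```python
import queue

from prometheus_client import CollectorRegistry, Gauge

# Private registry so the sketch is self-contained (the PR uses the default).
registry = CollectorRegistry()
_size_gauge = Gauge(
    "jetstream_prefill_backlog_size", "Size of prefill queue", registry=registry
)

_backlog: queue.Queue = queue.Queue()


def put_request(request) -> None:
    """Enqueue a request and record the new backlog depth."""
    _backlog.put(request, block=False)
    _size_gauge.set(_backlog.qsize())


def get_request():
    """Dequeue a request and record the new backlog depth."""
    request = _backlog.get(block=True)
    _size_gauge.set(_backlog.qsize())
    return request


put_request("a")
put_request("b")
get_request()
# The gauge now reflects the single remaining queued request.
```

Updating only on the put side would leave the gauge stale while the prefill thread drains the queue; updating on both sides keeps it accurate, at the cost of a benign race between `qsize()` and `set()` when multiple threads touch the queue concurrently.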
```diff
@@ -6,7 +6,8 @@ jax
 jaxlib
 numpy
 portpicker
+prometheus-client
 pytest
 seqio
 tiktoken
 blobfile
```
Review thread on `place_request_on_prefill_queue`:

> is the function `place_request_on_prefill_queue` used by the orchestrator to add requests to the prefill queue? @JoeZijunZhou

> asking because I didn't see PetStream or other places invoking this API

> This is adding requests from JetStream's client to JetStream Orchestrator's prefill backlog. It doesn't relate to engines.

> I see, but where is the API used? I didn't find the usage... so I'm not sure whether we should record metrics here

> Which API? This is part of the workflow of the JetStream Decode API.