Fix Triton configuration that was causing CPU memory errors #60
Triton uses a Python backend stub process to connect the `model.py` file to its C++ core. It is designed so that there is a single stub process for every loaded model. Previously, we had an issue where, during blue/green deployment updates, the stub process corresponding to the old model did not get cleaned up once a new model was loaded into memory. In addition, if there were other models (detectors) under `/var/groundlight/serving/model_repository` that were not serving inference requests, Triton would attempt to load them as well. This resulted in a large number of zombie stub processes even with only a few active inference deployments; for example, I observed a ridiculous number of stub processes when only 2 inference deployments were active.
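For reference, one quick way to check for this on a host is to count processes running the `triton_python_backend_stub` binary (each loaded Python-backend model gets one). This is just a minimal sketch of how such a check could look; the counting approach is an assumption and not part of this change:

```python
import subprocess

def count_python_backend_stubs() -> int:
    """Count running Triton Python backend stub processes on this host."""
    # Each loaded Python-backend model gets one triton_python_backend_stub
    # process, so this count should roughly match the number of loaded models.
    result = subprocess.run(
        ["pgrep", "-c", "-f", "triton_python_backend_stub"],
        capture_output=True,
        text=True,
    )
    # pgrep exits non-zero when nothing matches; treat that as zero processes.
    return int(result.stdout.strip()) if result.returncode == 0 else 0

if __name__ == "__main__":
    print(f"python backend stub processes: {count_python_backend_stubs()}")
```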
The solution was to use `--model-control-mode=explicit` to instruct Triton to load only the models we actually need. After making this change, there should be only a single stub process per `tritonserver` process.
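With explicit model control, the server starts without loading anything from the repository, and models are loaded on demand (e.g. `tritonserver --model-repository=/var/groundlight/serving/model_repository --model-control-mode=explicit`). A minimal sketch of loading just the model a deployment needs via the `tritonclient` HTTP API; the model name and server URL here are placeholders, not values from this PR:

```python
import tritonclient.http as httpclient

# Connect to a tritonserver started with --model-control-mode=explicit,
# so nothing in the model repository is loaded until we ask for it.
client = httpclient.InferenceServerClient(url="localhost:8000")

model_name = "my-detector"  # placeholder; use the model for the active deployment

# Explicitly load only the model this deployment serves.
client.load_model(model_name)
assert client.is_model_ready(model_name)

# During a blue/green swap, the old model could be unloaded so its
# Python backend stub process is cleaned up, e.g.:
# client.unload_model("old-detector")
```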