Fix Triton configuration that was causing CPU memory errors #60

Merged
merged 3 commits into main on Nov 29, 2023

Conversation

@blaise-muhirwa (Contributor) commented Nov 29, 2023

Triton uses a Python backend stub process to connect the model.py file to its C++ core. It is designed such that there is a single stub process for every loaded model.

Previously, during blue/green deployment updates, the stub process corresponding to the old model was not cleaned up once a new model was loaded into memory. In addition, if other models (detectors) existed under /var/groundlight/serving/model_repository that were not serving inference requests, Triton would attempt to load them as well. This resulted in a large number of zombie stub processes even with only a few active inference deployments.

This screenshot shows a ridiculous number of stub processes when I only had 2 active inference deployments:

[Screenshot: process list showing many triton_python_backend_stub processes]
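
As a quick way to check for this (not part of the PR; assumes pgrep is available on the host), the stub count can be compared against the number of actively loaded models:

import subprocess

# Count running Python backend stub processes. Before this fix the count kept
# growing across blue/green updates; after it, there should be one stub per
# actively loaded model.
result = subprocess.run(
    ["pgrep", "-fc", "triton_python_backend_stub"],
    capture_output=True,
    text=True,
)
print(f"stub processes: {result.stdout.strip()}")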

The solution was to use --model-control-mode=explicit to instruct Triton to load only the models we need. After making this change, we should see only a single stub process per tritonserver process, like this:

Wed Nov 29 18:37:46 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   29C    P0    26W /  70W |    836MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2040588      C   tritonserver                      164MiB |
|    0   N/A  N/A   2040590      C   tritonserver                      164MiB |
|    0   N/A  N/A   2041330      C   ...riton_python_backend_stub      252MiB |
|    0   N/A  N/A   2041349      C   ...riton_python_backend_stub      252MiB |
+-----------------------------------------------------------------------------+
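
With explicit model control, Triton no longer scans the model repository at startup; models must be loaded and unloaded through the repository API. A minimal sketch of driving that with the tritonclient package (the URL and model name here are placeholders, not the actual serving code):

import tritonclient.http as httpclient

# With --model-control-mode=explicit, the server starts with no models loaded.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Load only the model this deployment serves; other detectors under
# /var/groundlight/serving/model_repository stay unloaded, so no extra
# stub processes are spawned for them.
client.load_model("example-detector")

# During a blue/green update, unloading the old model tears down its
# Python backend stub process instead of leaving it behind as a zombie.
client.unload_model("example-detector")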

f"--backend-config=python,shm-region-prefix-name={deployment_name}-{prefixed_ksuid()}",
"--allow-cpu-metrics=true",
"--allow-gpu-metrics=true",
"--model-control-mode=explicit",

@blaise-muhirwa (Author) commented on this diff:
This was the key line. This function was getting called during blue/green updates, leading Triton to load models for every detector under /var/groundlight/serving/model_repository.
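
For context, a hypothetical sketch of how these arguments might be assembled into the server launch command; deployment_name and the model repository path come from the PR, while the ksuid suffix and the rest of the invocation are illustrative assumptions:

import subprocess

deployment_name = "example-deployment"  # placeholder deployment name

command = [
    "tritonserver",
    "--model-repository=/var/groundlight/serving/model_repository",
    # Unique shared-memory region prefix per deployment; "abc123" stands in
    # for the prefixed_ksuid() call in the actual code.
    f"--backend-config=python,shm-region-prefix-name={deployment_name}-abc123",
    "--allow-cpu-metrics=true",
    "--allow-gpu-metrics=true",
    # The fix: do not auto-load every model in the repository at startup;
    # models are loaded explicitly via the repository API instead.
    "--model-control-mode=explicit",
]
server = subprocess.Popen(command)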

@tyler-romero (Member) left a comment:

LGTM, great job tracking this down!

@blaise-muhirwa merged commit 9b7f393 into main on Nov 29, 2023
tyler-romero pushed a commit that referenced this pull request on Dec 7, 2023
* set model-control-mode=explicit so that we load only the models we need

* update image

* Automatically reformatting code with black and isort

---------

Co-authored-by: Auto-format Bot <autoformatbot@groundlight.ai>