Fix Triton configuration that was causing CPU memory errors #60
Triton uses a Python backend stub process to connect the `model.py` file to its C++ core. It is designed so that there is a single stub process for every loaded model. Previously, we had an issue where, during blue/green deployment updates, the stub process corresponding to the old model did not get cleaned up once a new model was loaded into memory. In addition, if there were other models (detectors) under `/var/groundlight/serving/model_repository` that were not serving inference requests, Triton would attempt to load them as well. This resulted in a large number of zombie stub processes even with only a few active inference deployments; for example, I observed a ridiculous number of stub processes when only 2 inference deployments were active.
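For reference, one quick way to check for this on a host is to count processes running the `triton_python_backend_stub` binary (each loaded Python-backend model gets one). This is just a minimal sketch of how such a check could look; the counting approach is an assumption and not part of this change:

```python
import subprocess

def count_python_backend_stubs() -> int:
    """Count running Triton Python backend stub processes on this host."""
    # Each loaded Python-backend model gets one triton_python_backend_stub
    # process, so this count should roughly match the number of loaded models.
    result = subprocess.run(
        ["pgrep", "-c", "-f", "triton_python_backend_stub"],
        capture_output=True,
        text=True,
    )
    # pgrep exits non-zero when nothing matches; treat that as zero processes.
    return int(result.stdout.strip()) if result.returncode == 0 else 0

if __name__ == "__main__":
    print(f"python backend stub processes: {count_python_backend_stubs()}")
```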
The solution was to use `--model-control-mode=explicit` to instruct Triton to load only the models we actually need. After making this change, there should be only a single stub process per `tritonserver` process.
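With explicit model control, the server starts without loading anything from the repository, and models are loaded on demand (e.g. `tritonserver --model-repository=/var/groundlight/serving/model_repository --model-control-mode=explicit`). A minimal sketch of loading just the model a deployment needs via the `tritonclient` HTTP API; the model name and server URL here are placeholders, not values from this PR:

```python
import tritonclient.http as httpclient

# Connect to a tritonserver started with --model-control-mode=explicit,
# so nothing in the model repository is loaded until we ask for it.
client = httpclient.InferenceServerClient(url="localhost:8000")

model_name = "my-detector"  # placeholder; use the model for the active deployment

# Explicitly load only the model this deployment serves.
client.load_model(model_name)
assert client.is_model_ready(model_name)

# During a blue/green swap, the old model could be unloaded so its
# Python backend stub process is cleaned up, e.g.:
# client.unload_model("old-detector")
```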