compatible with llama-server docker #35
I found podman to be a better option than docker. I run qwen-2vl-7B with VLLM and podman and it works. Try that. It should be able to run docker containers, and it is compatible with the way llama-swap shuts down processes (it sends a SIGTERM).
Here is a configuration snippet I use to run vllm, podman and llama-swap together:

```yaml
models:
  # run VLLM in podman on the 3090
  "qwen2-vl-7B-gptq-int8":
    aliases:
      - gpt-4-vision
    proxy: "http://127.0.0.1:9797"
    cmd: >
      podman run --rm
      -v /mnt/nvme/models:/models
      --device nvidia.com/gpu=GPU-<redacted>
      -p 9797:8000 --ipc=host
      --security-opt=label=disable
      docker.io/vllm/vllm-openai:v0.6.4
      --model "/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8"
      --served-model-name gpt-4-vision qwen2-vl-7B-gptq-int8
      --disable-log-stats
      --enforce-eager
```
I am currently using it with docker, and it has been working great! Example below:
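A minimal sketch of what such a docker-backed entry can look like, assuming llama.cpp's published server image; the image tag, model path, and port below are placeholders rather than the exact setup described above:

```yaml
models:
  # sketch: llama-server running inside a docker container
  "llama-3-8B":
    proxy: "http://127.0.0.1:9090"
    cmd: >
      docker run --rm --gpus all
      -v /mnt/models:/models
      -p 9090:8080
      ghcr.io/ggerganov/llama.cpp:server-cuda
      -m /models/llama-3-8b-instruct.Q4_K_M.gguf
      --host 0.0.0.0 --port 8080
```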
Any issues you've seen with swapping between two models with docker?
Nope, swapping is pretty seamless, no lingering container after swapping. I have llama-swap running as a systemd service, and even forcibly restarting it with a Docker container loaded doesn't result in the container lingering around.
Thanks @knguyen298. Closing this issue. I think the …
I played around with this a bit and I couldn't get it to swap correctly. 🤔 Was there some other setup you did?
What happens when you try to swap? I also saw that you didn't have the …
@mostlygeek @aiseei

`systemctl daemon-reload`
I've messed around with this quite a bit and I still have no idea how you all got it working! 😅 llama-swap sends a SIGTERM to stop a process before swapping, which is incompatible with docker's client/server model: the signal goes to the `docker` CLI client rather than to the server running inside the container. Here is a testing config I'm using: …
I can get … So I'm thinking right now:

For native docker support the config could look like: …

In this case if …
I pushed #40 which provides a `cmd_stop` option:
> Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shut down a process before swapping.
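A sketch of how `cmd_stop` can pair with a named container so the stop command has something to target; the container name is an assumption, and the rest mirrors the podman/vllm example earlier in the thread:

```yaml
models:
  "qwen2-vl-7B-gptq-int8":
    proxy: "http://127.0.0.1:9797"
    cmd: >
      docker run --rm --name vllm-qwen2vl
      -v /mnt/nvme/models:/models
      -p 9797:8000
      docker.io/vllm/vllm-openai:v0.6.4
      --model "/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8"
    # run this instead of sending SIGTERM before swapping
    cmd_stop: docker stop vllm-qwen2vl
```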
@zenabius Would you mind trying out the new release (v84) that I pushed with `cmd_stop`? Also, as a bit of irony I wrote llama-swap because ollama didn't support the nvidia P40s. So discovering someone running it inside llama-swap is pretty neat!
This is "old style" shutdown, without "cmd_stop: docker stop"
|
@mostlygeek.
@mostlygeek I forgot to ask: could you please add an endpoint for OpenWebUI audio, `/audio/transcriptions`? And there is one more problem: llama-swap doesn't understand OPTIONS requests.
@mostlygeek sorry, `/audio/transcriptions` is for OpenWebUI audio transcriptions; I wrote it wrong.
I'm genuinely not sure what is different about my config such that it works for me. I believe I am running v83, based on the timestamp on my llama-swap file. I checked my dockerd config file; nothing there except the Nvidia container runtime config. I do see that my systemd unit has llama-swap running as a local user who is a part of the `docker` group.

I'm running Ubuntu Server 22.04.5, with Docker version 27.4.1, build b9d17ea.
I did as much research as I could, and the docs say the docker client will proxy signals and send them to PID 1 in the container. I haven't been able to get it working as expected. Either way, there are two ways to shut down containers cleanly now: the SIGTERM and the `cmd_stop` command. I do prefer just one way though 🤷🏻‍♂️.
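For anyone comparing the two shutdown paths, a rough shell illustration (the image is a placeholder; whether the forwarded SIGTERM actually reaches the server depends on the docker version, `--sig-proxy`, and whether the server runs as PID 1 in the container):

```sh
# Start a containerized server in the background; $! is the docker CLI client's PID,
# not the PID of the server inside the container.
docker run --rm --name swap-test -p 9090:8080 ghcr.io/ggerganov/llama.cpp:server &
CLIENT_PID=$!

# Default llama-swap shutdown: SIGTERM to the process it spawned (the docker client).
# The client is supposed to proxy the signal to PID 1 in the container, but on older
# docker versions the container can keep running.
kill -TERM "$CLIENT_PID"
docker ps --filter name=swap-test   # check whether the container actually exited

# The cmd_stop approach: ask the docker daemon to stop the container directly.
docker stop swap-test
```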
I figured out what my problem was: my Docker version was too old (~v24). Thanks @zenabius @knguyen298 for the info.
Thanks for this cool project!
Anyway - could we use this to set up, start, and stop docker-based servers?