Tensor Parallel? Multiple GPU #7
Comments
Yes, you need to adjust a few lines of code in /~https://github.com/multimodal-art-projection/YuE/blob/main/inference/infer.py. See the Hugging Face tutorial. @hf-lin, we should support this.
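A minimal sketch of the kind of change that tutorial describes, assuming infer.py loads the model through transformers' `AutoModelForCausalLM` (the model id below is illustrative, not confirmed against infer.py):

```python
# Hedged sketch: let Accelerate shard the checkpoint across all
# visible GPUs instead of loading it onto a single card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/YuE-s1-7B-anneal-en-cot"  # hypothetical; use what infer.py loads

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory versus fp32
    device_map="auto",           # places layers on cuda:0, cuda:1, ...
)
```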
Thank you for the answer. Good day!
@a43992899 I tried it both ways, but I couldn't get it to start. One of the problems is that the model starts loading on both GPUs at the same time and crashes with an OOM error, although when I run it the standard way, as you do, everything works; it just takes an unbearably long time.
I tried several ways to distribute the model across multiple GPUs.
If anyone has the chance to give an example of how to distribute the model correctly across different GPUs, I would be very grateful.
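One plausible cause of that symptom is launching the script once per GPU, so each process tries to load the full model and both cards fill up. A hedged sketch of loading once, in a single process, with a per-card memory cap so Accelerate shards the weights (the model id and the limits are assumptions):

```python
# Hedged sketch: cap what may be placed on each card so the checkpoint
# is split between GPUs rather than duplicated onto both.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "m-a-p/YuE-s1-7B-anneal-en-cot",      # hypothetical model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB"},  # 2x 24 GiB cards, with headroom
)
```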
I used two GPUs to run it successfully, but it didn't increase the speed.
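That matches how `device_map="auto"` behaves: it splits the layers across GPUs (pipeline-style model parallelism), so only one card computes at a time and you gain memory capacity, not speed. True tensor parallelism splits each layer across the cards. As a hedged sketch only, if the checkpoint is a standard causal-LM architecture that vLLM supports (an assumption, not confirmed for YuE), it could look like:

```python
# Hedged sketch: tensor parallelism via vLLM, sharding every layer
# across two GPUs so both compute at once. Assumes vLLM supports the
# architecture; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="m-a-p/YuE-s1-7B-anneal-en-cot",  # hypothetical model id
    tensor_parallel_size=2,                 # shard across 2 GPUs
    dtype="bfloat16",
)
outputs = llm.generate(["example prompt"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```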
I have 2x RTX 3090 in my server. Is it possible to run YuE on both cards at the same time?
Also, I get this warning:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

But why? The model is loaded on my GPU.
FULL LOG:
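Regarding the Flash Attention warning above: transformers emits it during from_pretrained, while the model object is still on CPU and before the weights are dispatched, so it is usually harmless once the model actually reaches the GPU. A hedged sketch of a load order that avoids it, reusing the assumed model id from earlier:

```python
# Hedged sketch: Flash Attention 2 runs only on GPU in fp16/bf16, so
# load directly onto the GPUs via device_map (or call model.to("cuda")
# right after from_pretrained) to satisfy the check.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "m-a-p/YuE-s1-7B-anneal-en-cot",             # hypothetical model id
    torch_dtype=torch.bfloat16,                  # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```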