MultiGPU support #184

Open. MrReclusive wants to merge 2 commits into main.

MrReclusive

This adds device selection to multiple nodes so that those with multiple GPUs can set nodes to run on specific CUDA device(s).

kijai (Owner) commented Dec 23, 2024

Thank you for this, but I'm still hesitant to merge it, as I can't test it myself and I don't know what happens for non-CUDA users when the device selection is populated like this. Also, it should not be a required input, as that would force everyone to remake the node in old workflows. I would rather have it as either a separate node or, maybe best, an optional device selection input on the nodes, so it has no effect for anyone not using it, which is still the vast majority of users.
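
One way the device list could be populated without breaking non-CUDA setups is sketched below. This is an assumption, not code from the PR: list_device_choices is a hypothetical helper, and the only real PyTorch calls used are torch.cuda.is_available() and torch.cuda.device_count(). When PyTorch is missing or no CUDA device is visible, it falls back to a single "cpu" entry so that a dropdown built from the list is never empty.

```python
def list_device_choices() -> list[str]:
    """Return device strings suitable for a dropdown (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: offer a single safe choice.
        return ["cpu"]

    if torch.cuda.is_available():
        # One entry per visible CUDA device: "cuda:0", "cuda:1", ...
        return [f"cuda:{i}" for i in range(torch.cuda.device_count())]

    # Non-CUDA backends are not enumerated in this sketch.
    return ["cpu"]


if __name__ == "__main__":
    print(list_device_choices())
```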

MrReclusive (Author) commented

I know I need to update this to work with the video enhancer and with what was changed today (haven't looked yet).
I've been thinking about the issue with non-CUDA users myself, and about those who don't have multiple GPUs.

Right now, with 1 GPU it just lists 1 device, which is not a big deal.
But I have been thinking of another way to do this so it doesn't affect anyone not using multiple GPUs or not running CUDA.
I was leaning towards the optional input, and then adding a standalone GPU selection node that would connect to that optional input.
The node would then run an if around device = mm.get_torch_device(), so that default is used when no input is provided.
Does that sound okay?

kijai (Owner) commented Dec 24, 2024

Yeah, exactly what I was thinking; it's the most non-invasive way to add it I can think of. Whatever way the given node currently chooses the device shouldn't change, and when the optional input is given it would just override it.
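
A minimal sketch of the optional-override idea discussed above, under stated assumptions. It is not the PR's code: ExampleLoaderNode, resolve_device, and the "CUDADEVICE" socket type are hypothetical names, and the lambda stands in for mm.get_torch_device() so the example runs without ComfyUI. The INPUT_TYPES layout with separate "required" and "optional" groups follows the usual ComfyUI custom-node convention; inputs in the "optional" group can be left unconnected, so existing workflows should not need their nodes remade.

```python
from typing import Callable, Optional


def resolve_device(device_override: Optional[str],
                   get_default: Callable[[], str]) -> str:
    """Use the override when one is connected, else the node's usual default."""
    if device_override is None:
        return get_default()
    return device_override


class ExampleLoaderNode:  # hypothetical node, for illustration only
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {"model_name": ("STRING", {"default": ""})},
            # Optional socket fed by a standalone device selector node.
            # "CUDADEVICE" is an assumed custom type name.
            "optional": {"device": ("CUDADEVICE",)},
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "load"
    CATEGORY = "example"

    def load(self, model_name, device=None):
        # In the wrapper the default would be mm.get_torch_device();
        # a fixed string stands in for it here.
        chosen = resolve_device(device, lambda: "cuda:0")
        return (f"{model_name} on {chosen}",)


if __name__ == "__main__":
    node = ExampleLoaderNode()
    print(node.load("model"))                   # default path
    print(node.load("model", device="cuda:1"))  # override path
```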

MrReclusive (Author) commented

Alright, will do. I'll have it ready in a day or so.
Thanks!

zazoum-art commented

I downloaded your repo, erased kijai's, and opened your Civitai workflow, but queueing gives an error saying that cuda: doesn't belong to the input's group of values. If I use Fix, Fix v2, or reload, it returns to kijai's normal nodes. I have left the custom nodes git folder under its original name.

I have a 4090 and a 3060 and can't test it with your Civitai workflow for the above reasons.

zazoum-art commented Dec 24, 2024

@MrReclusive In the Manager there is a set of multi-GPU nodes by pollockjj. Should I use that? The nodes detect my GPUs normally, but I have to re-create your workflows. Maybe put a README notice in your repo with a bold "FOR TESTING ONLY" and post safe steps for testing it.

EDIT: OK, I found it. I had downloaded the zip and didn't notice the missing files!
Gonna test the HECK out of it!!!
This is very important, especially before the 5090 and i2v by Tencent.

kendrick90 commented

I have a 4090 and a Quadro M6000 that I can test with when the next update with the separate optional node is ready. Let me know if that's helpful.

Subarasheese commented Dec 28, 2024

I have a dual 3090 setup and I am getting this:

Exception Message: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+

Will this only work on newer Ada arch cards? Thank you
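
The error message refers to CUDA compute capability. Below is a minimal sketch of how one might check it per device, assuming PyTorch is installed. torch.cuda.get_device_capability() is a real call that returns a (major, minor) tuple; meets_scaled_mm_requirement and report_devices are hypothetical helpers that encode only the threshold quoted in the message (8.9 or higher) and ignore the ROCm case. For reference, an RTX 3090 reports 8.6 and an RTX 4090 reports 8.9.

```python
def meets_scaled_mm_requirement(capability: tuple[int, int]) -> bool:
    """True if (major, minor) is at least 8.9, per the quoted error message."""
    return tuple(capability) >= (8, 9)


def report_devices() -> dict[str, bool]:
    """Map each visible CUDA device to whether it meets the threshold."""
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    return {
        f"cuda:{i}": meets_scaled_mm_requirement(
            torch.cuda.get_device_capability(i)
        )
        for i in range(torch.cuda.device_count())
    }


if __name__ == "__main__":
    print(report_devices())
```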

MrReclusive (Author) commented Dec 31, 2024

Hey everyone, I have the new setup coded as optional so it doesn't interfere with existing setups. I'm just going through the other nodes I don't use to see if they benefit from it too, and testing what can and can't be moved (expecting same-device errors).
I have been taking some time off for the holidays, so I haven't been at my computer much.

I can't test anything besides the 4xxx series of cards, as I gave away all my 3xxx series cards last year, but nothing I've changed in the code should affect the ability to use other cards if they already work with this video wrapper.
All it's doing is telling it what device to use.

As for the pollockjj multigpu, I looked at it; it's based on neuratech-ai's multigpu code, which is what I based this on as well.

I'll have the new fork and new template up tomorrow; I just need to remove all my other custom nodes from the template I use. (I've been digging way into custom prompting on this.)

image

MrReclusive reopened this Jan 1, 2025

MrReclusive (Author) commented

Fully up to date.
All device selection is optional with defaults, so it is ignored if the CUDA device selector isn't connected.
Text encoder example:
image
Full example that I'll be uploading to Civitai:
Screenshot 2024-12-31 232121

MrReclusive (Author) commented

I am also working on an image splitter that accounts for the required frame counts when splitting. The purpose is to render the initial video with a high frame count at low resolution, then split it into 2/3/4 separate batches so you can upscale. If anyone is interested in that, I'll be posting it soon.
I thought about doing it at the latent level, but that first latent confuses me; this 4*X + 1 is becoming the bane of my existence.
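
As an illustration of the 4*X + 1 constraint, here is a minimal sketch of one way to split a frame range so that every batch also has a length of the form 4*k + 1. It is not the author's splitter: split_frames is a hypothetical helper, and it assumes that adjacent batches may share one boundary frame. That assumption is what makes the arithmetic work out, since n batches of 4*k_i + 1 frames minus the n - 1 shared frames gives 4*sum(k_i) + 1.

```python
def split_frames(num_frames: int, num_batches: int) -> list[tuple[int, int]]:
    """Split frames into (start, end) ranges, end exclusive.

    Each range spans 4*k + 1 frames, and consecutive ranges overlap by
    exactly one frame (assumption made for this sketch).
    """
    if num_frames < 1 or (num_frames - 1) % 4 != 0:
        raise ValueError("num_frames must have the form 4*X + 1")
    blocks = (num_frames - 1) // 4  # this is X
    if not 1 <= num_batches <= blocks:
        raise ValueError("num_batches must be between 1 and X")

    # Spread the X blocks of 4 frames as evenly as possible.
    base, extra = divmod(blocks, num_batches)
    ranges = []
    start = 0
    for i in range(num_batches):
        k = base + (1 if i < extra else 0)
        end = start + 4 * k + 1
        ranges.append((start, end))
        start = end - 1  # next batch reuses the last frame of this one
    return ranges


if __name__ == "__main__":
    print(split_frames(49, 3))
```

For example, split_frames(49, 3) gives (0, 17), (16, 33), (32, 49), three batches of 17 frames each.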

Updated with new sampler stuff.

zazoum-art commented

@MrReclusive
Your civitai workflow works like a charm with 4090 + 3060 (clip+LLM fit in 3060). Can you explain this latest change [81d87dc]? Is it stable?

zazoum-art commented Jan 6, 2025

I am giving it a try with o1 and o1-pro-mode on the Pro plan to combine inference in a sampler (both GPUs working in parallel, NOT sequentially). I have no idea what I am doing; o1 explains it to me and makes suggestions. From what I understood, it "breaks the latents"? I just do the debugging. It's a living hell, just as o1 told me it would be.
I posted the WHOLE of kijai's wrapper to it, one file at a time. You can only upload images to o1, so yes, I did that. Is that OK license-wise, @kijai?
I take my specific setup as a constant.
