
FIX set_lora_device when target layers differ #11844


Open · wants to merge 1 commit into main

Conversation

BenjaminBossan (Member)

What does this PR do?

Resolves #11833

Fixes a bug that occurs after calling set_lora_device when multiple LoRA adapters are loaded that target different layers.

Note: Technically, the accompanying test does not require a GPU, because the bug is triggered even if the parameters are already on the target device, i.e. loading on CPU and then setting the device to CPU is sufficient to cause the bug. However, this may be optimized away in the future, so I decided to test with a GPU.

I ran the test locally with a GPU and it passes.
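
For context, here is a minimal sketch of the scenario this PR addresses. The checkpoint path, LoRA paths, and adapter names below are illustrative placeholders, not taken from the PR; the point is only that the two loaded adapters target different layers when set_lora_device is called.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Two LoRA adapters that target different sets of layers.
pipe.load_lora_weights("path/to/lora_a", adapter_name="adapter-0")
pipe.load_lora_weights("path/to/lora_b", adapter_name="adapter-1")

# Before this fix, this call raised a KeyError for layers targeted by only one
# of the adapters (see the linked issue); as noted above, even a CPU -> CPU
# move was enough to trigger it.
pipe.set_lora_device(["adapter-0", "adapter-1"], "cpu")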


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan (Member Author)

@sayakpaul Please review. The failing CI appears to be unrelated.

sayakpaul (Member) left a comment

Thanks for the quick resolution! I left some comments.

    def test_integration_set_lora_device_different_target_layers(self):
        # fixes a bug that occurred when calling set_lora_device with multiple adapters loaded that target
        # different layers, see #11833
        from peft import LoraConfig
sayakpaul (Member)

Could move it here:

if is_peft_available():
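
For clarity, a minimal sketch of what that suggestion amounts to at module level in the test file, assuming the existing is_peft_available() guard there:

from diffusers.utils import is_peft_available

if is_peft_available():
    from peft import LoraConfig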

BenjaminBossan (Member Author)

You mean in case we move the test there?

sayakpaul (Member), Jul 1, 2025

I meant moving the import there (i.e., under is_peft_available()).

BenjaminBossan (Member Author)

Moving the import to another file would not help us avoid it here, or am I missing something?

sayakpaul (Member)

Sorry, in my mind I had already assumed that this test was under tests/lora/utils.py. Since it seems like we are going to move the test there, let's make sure to move the LoraConfig import under the specified block?

Comment on lines +243 to +256
        for name, module in pipe.unet.named_modules():
            if "adapter-0" in name and not isinstance(module, (nn.Dropout, nn.Identity)):
                self.assertTrue(module.weight.device == torch.device("cpu"))
            elif "adapter-1" in name and not isinstance(module, (nn.Dropout, nn.Identity)):
                self.assertTrue(module.weight.device == torch.device("cpu"))

        # setting both at once also works
        pipe.set_lora_device(["adapter-0", "adapter-1"], torch_device)

        for name, module in pipe.unet.named_modules():
            if "adapter-0" in name and not isinstance(module, (nn.Dropout, nn.Identity)):
                self.assertTrue(module.weight.device != torch.device("cpu"))
            elif "adapter-1" in name and not isinstance(module, (nn.Dropout, nn.Identity)):
                self.assertTrue(module.weight.device != torch.device("cpu"))
sayakpaul (Member)

Could we also test inference by invoking pipe(...)?

BenjaminBossan (Member Author)

I can add inference. I didn't do it because it is not done in test_integration_move_lora_cpu either, and I thought it might distract from the main point of the test, but I can still add it.

sayakpaul (Member)

Okay then, let's set the inference question aside. But IMO, it is helpful to also test that the pipeline isn't impacted as far as running inference is concerned.

BenjaminBossan (Member Author)

I think having another test for inference would make sense; I guess it could be similar but only check inference. Presumably, that would be an actual @slow test?

sayakpaul (Member)

I wasn't suggesting another test, though. Just adding the pipe(**inputs) line likely wouldn't be too much of a burden in the current test?
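
For illustration, the suggested addition would amount to something like the following at the end of the current test; the prompt and call arguments here are placeholders, not the test's actual inputs:

        # A short inference pass to check the pipeline still runs after the
        # adapters have been moved with set_lora_device.
        _ = pipe(
            "a prompt",
            num_inference_steps=2,
            output_type="np",
        ).images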

@@ -208,6 +208,53 @@ def test_integration_move_lora_dora_cpu(self):
            if "lora_" in name:
                self.assertNotEqual(param.device, torch.device("cpu"))

    @slow
    @require_torch_accelerator
    def test_integration_set_lora_device_different_target_layers(self):
sayakpaul (Member)

I wonder if it makes sense to have this as a fast test and have it available in tests/lora/utils.py?

BenjaminBossan (Member Author)

I don't know what qualifies as a slow test in diffusers and what doesn't. As this test is very similar to test_integration_move_lora_cpu, which is marked as @slow, I opted to do the same here. As to where to put the test, I can move it to utils.py, but I chose to keep it here because test_integration_move_lora_cpu is also here.

If we decide to move it, would you still keep @require_torch_accelerator or do everything on CPU (as I mentioned, a GPU is not strictly required to trigger the bug)?

sayakpaul (Member)

I think the test_integration_move_lora_cpu was added like that to quickly fix the underlying issue and validate the solution while doing so. Yes, that test also deserved to be moved to tests/lora/utils.py. Do you see any reason not to?

I think a majority of the slow tests should essentially be "integration" tests (i.e., the ones that test with real checkpoints), barring a few where fast testing isn't really possible.

> If we decide to move it, would you still keep @require_torch_accelerator or do everything on CPU (as I mentioned, a GPU is not strictly required to trigger the bug)?

I won't keep the @require_torch_accelerator then if CPU (or more generally, torch_device) is enough to test it.

BenjaminBossan (Member Author)

> I think the test_integration_move_lora_cpu was added like that to quickly fix the underlying issue and validate the solution while doing so. Yes, that test also deserved to be moved to tests/lora/utils.py.

Okay, I can do that in this PR. Same for test_integration_move_lora_dora_cpu?

> Do you see any reason not to?

As I'm not sure what the organization of tests is here, I can't say.

> I won't keep the @require_torch_accelerator then if CPU (or more generally, torch_device) is enough to test it.

Okay, so I'd remove the @require_torch_accelerator decorator but keep on using torch_device as is presently done.

sayakpaul (Member)

> Okay, I can do that in this PR. Same for test_integration_move_lora_dora_cpu?

Yeah sure.

I am also happy to merge the PR after #11844 (comment) and do the organizing myself in a future PR, and ask for your reviews.

BenjaminBossan (Member Author)

Oh, I just saw your comment above:

# Keeping this test here makes sense because it doesn't look any integration
# (value assertions on logits).

So it seems the test should stay here after all?

sayakpaul (Member)

It stopped making sense to keep "keeping" it there once the number of pipelines supporting LoRAs grew beyond just SD and SDXL. So, I still stand by what I said in the comments above, i.e., moving those tests to tests/lora/utils.py. But as also mentioned in #11844 (comment), it's fine if you don't do that and we tackle it in an immediate follow-up PR.

Successfully merging this pull request may close these issues.

set_lora_device reports KeyError even though lora is loaded