(WIP) Support targeting the embedding layer for LoRA #501

Open
wants to merge 13 commits into base: main

Conversation

@ajtejankar (Contributor) commented Jun 6, 2024

What does this PR do?

  1. Re-organize the code in BatchLoraWeights.load. This function was a bit hard to understand because there were multiple list comprehensions with almost the same looping logic, so I merged them into two loops for improved clarity (see the sketch after this list). @tgaddair Can you confirm this looks good? I can revert to the original code in case this change causes problems.
  2. (WIP) Support the embedding layer as a target module. This is mostly done except for multi-GPU inference.
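For reference, a minimal sketch of the kind of refactor described in item 1, with illustrative names and attributes rather than the actual BatchLoraWeights.load code: several comprehensions that each walk the same adapter list are collapsed into one explicit loop that fills every list in a single pass.

```python
def collect_adapter_tensors(adapters):
    # Hypothetical sketch only; `weights_a`, `weights_b`, and `lora_a_r` are
    # assumed attribute names, not necessarily the PR's exact fields.
    # Before the refactor this was several list comprehensions, each iterating
    # over `adapters`; here they are merged into a single pass for clarity.
    lora_a_list, lora_b_list, ranks = [], [], []
    for w in adapters:
        lora_a_list.append(w.weights_a)
        lora_b_list.append(w.weights_b)
        ranks.append(w.lora_a_r)
    return lora_a_list, lora_b_list, ranks
```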

@ajtejankar ajtejankar requested a review from tgaddair June 6, 2024 21:07
@ajtejankar ajtejankar self-assigned this Jun 8, 2024
@ajtejankar (Contributor Author):
@tgaddair I am pushing a partially done commit that supports embedding-layer LoRAs.

  • Similar to the HF implementation, lora_A is used for the embedding lookup while lora_B is applied as a matrix multiplication (see the sketch after this list)
  • Prevents the lora_A transpose when in BGMV mode
  • Contains two implementations to replace the two kernels: SGMV and BGMV
  • Both are implemented with for loops. How can we optimize them?
  • Cannot handle multi-GPU. I will need some help understanding sharding in LoRAX, as I found it confusing. :(
  • Tested crudely by comparing against generation from HF, but I need to add a proper test case.
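For illustration, a minimal sketch of the lookup-then-project path described in the first bullet, mirroring the HF/PEFT approach. The function name, tensor shapes, layout, and the scaling argument are assumptions, not the PR's actual SGMV/BGMV replacement loops.

```python
import torch
import torch.nn.functional as F

def lora_embedding_delta(
    input_ids: torch.Tensor,   # (batch, seq) token ids
    lora_a: torch.Tensor,      # assumed (vocab_size, r) layout, i.e. no transpose in BGMV mode
    lora_b: torch.Tensor,      # (r, hidden_size)
    scaling: float = 1.0,
) -> torch.Tensor:
    # lora_A acts as a small embedding table indexed by token id ...
    a_out = F.embedding(input_ids, lora_a)   # (batch, seq, r)
    # ... and lora_B projects the looked-up rows up to the hidden size.
    return (a_out @ lora_b) * scaling        # (batch, seq, hidden_size)
```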

@ajtejankar ajtejankar linked an issue Jun 8, 2024 that may be closed by this pull request
@@ -40,14 +41,20 @@ def map_weights_for_model(
        adapter_weight_names = set()
        module_map = {}
        for weight_name in weight_names:
            lora_a_name = f"base_model.model.{weight_name}.lora_A.weight"
            lora_b_name = f"base_model.model.{weight_name}.lora_B.weight"
            if EMBED_TOKENS in weight_name:
Contributor:
We might need to make this embed_tokens name a property of the model rather than a constant, as I imagine it will vary from one architecture to the next.

Contributor Author:
Makes sense. I will make the change.
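A minimal sketch of the reviewer's suggestion, making the embedding module name a per-model property rather than a shared constant. The class and property names here are assumptions for illustration, not LoRAX's actual API; "transformer.wte" is the GPT-2-style name used only as an example of an architecture that differs from the Llama-style default.

```python
class CausalLMBase:
    @property
    def embed_tokens_name(self) -> str:
        # Llama-style default; other architectures override this below.
        return "model.embed_tokens"

class GPT2StyleModel(CausalLMBase):
    @property
    def embed_tokens_name(self) -> str:
        # GPT-2 names its token embedding differently.
        return "transformer.wte"
```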

            module_map[weight_name] = {
-               "lora_A": (adapter_weights[lora_a_name], lora_a_name),
-               "lora_B": (adapter_weights[lora_b_name], lora_b_name),
+               "lora_A": (adapter_weights.pop(lora_a_name), lora_a_name),
Contributor:
Not sure I understand the purpose of using pop here, as it doesn't look like the adapter_weights are used below (unless it's used from the caller). In general, it's good to avoid modifying input objects unless it's clear that the function does that from the name, etc.

In this case, I would suggest cloning the adapter_weights dict at the top to avoid modifying the input, and then returning the modified adapter_weights if the caller needs to check which elements haven't been popped.

Contributor Author:
Sounds good. I just realized that I may not even need to change this part, since adapter_weight_names captures which weights were consumed, and I can use it in the caller to figure out whether all of the weights were consumed.
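For reference, a minimal sketch of the clone-and-return approach suggested above: copy the input dict up front so the caller's object is never mutated, and hand back whatever was not consumed. The function signature and the membership check are illustrative assumptions, not the PR's actual code.

```python
def map_adapter_weights(adapter_weights: dict, weight_names: list) -> tuple:
    # Work on a shallow copy so the caller's dict is never mutated.
    remaining = dict(adapter_weights)
    module_map = {}
    for weight_name in weight_names:
        lora_a_name = f"base_model.model.{weight_name}.lora_A.weight"
        lora_b_name = f"base_model.model.{weight_name}.lora_B.weight"
        if lora_a_name not in remaining or lora_b_name not in remaining:
            continue
        module_map[weight_name] = {
            "lora_A": (remaining.pop(lora_a_name), lora_a_name),
            "lora_B": (remaining.pop(lora_b_name), lora_b_name),
        }
    # Whatever is left over was not consumed; the caller can log or raise on it.
    return module_map, remaining
```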

            batch_indices = [adapter_to_segment[idx] for idx in meta.adapter_indices.tolist()]
-           batch_indices = [idx if idx in rank_indices else -1 for idx in batch_indices]
+           batch_indices = [idx if idx in set(indices) else -1 for idx in batch_indices]
Contributor:
I would rather keep the separate variable, as the call to set(indices) on every iteration of the loop is unnecessary.

Contributor Author:
Oh, yes. I will revert to the original code. The reason for the change was to avoid having a rank_indices variable inside the for loop, since the loop itself iterates over another rank_indices variable. Maybe I can rename the rank_indices here instead.
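A small sketch of the compromise discussed here: hoist the set out of the comprehension so it is built once rather than per element, and give it a name that does not shadow the outer rank_indices. Variable and function names are illustrative only.

```python
def build_batch_indices(adapter_indices, adapter_to_segment, indices):
    # Build the set once, outside the comprehension, under a non-shadowing name.
    segment_indices_set = set(indices)
    batch_indices = [adapter_to_segment[idx] for idx in adapter_indices]
    return [idx if idx in segment_indices_set else -1 for idx in batch_indices]
```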


        # note(ajinkya): adapter weights are consumed during the mapping above; if some are left over,
        # we may not be supporting all of the weights in the adapter. That should probably be an error,
        # but for now we just log it.
        if len(adapter_weights) > 0:
Contributor:
Per above comment, would return the modified adapter weights as unused_adapter_weights or similar rather than relying on the input to be modified.

Contributor Author:
Sounds good.


        return result

    # def collect_lora_a(self, a_out: torch.Tensor) -> torch.Tensor:
Contributor:
Why was this commented out? Was it raising an error?

I believe an all-reduce should be correct here, as the TensorParallelEmbedding implementation is row parallel.

Contributor Author (@ajtejankar, Jun 11, 2024):
No, it wasn't raising an error. I deliberately left it out since I didn't fully understand whether it would work. In TensorParallelEmbedding we're sharding the embedding weight matrix, but we don't do that for linear layers, so I wasn't sure whether the weights would be sharded for TensorParallelAdapterRowEmbedding.
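For illustration, a minimal sketch of the all-reduce the reviewer describes: with a row-parallel (vocab-sharded) embedding, each rank produces only a partial LoRA activation for the token ids it owns, so the partial outputs are summed across ranks. This is a hedged stand-in for the commented-out collect_lora_a hook; how the process group is threaded through is an assumption.

```python
import torch
import torch.distributed as dist

def collect_lora_a(a_out: torch.Tensor, group=None) -> torch.Tensor:
    # Sum the per-rank partial outputs of the sharded embedding lookup in place.
    dist.all_reduce(a_out, op=dist.ReduceOp.SUM, group=group)
    return a_out
```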

            if adapter_idx not in adapter_weights:
                continue
            rank_indices[adapter_weights[adapter_idx].lora_a_r].append(segment_idx)
            adapter_to_segment[adapter_idx] = segment_idx
Contributor:
Definitely looks cleaner. I believe we have a few unit tests to verify this is working correctly, right?

Contributor Author:
I saw some test cases, but I am planning to add the missing ones as well. In any case, I need to add test cases to make sure that our implementation and the HF implementation match.

Contributor Author:
I will take a proper look at the test cases.
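As a starting point, a hedged sketch of the kind of unit test discussed above, checking that the lookup-then-project path matches a dense reference that materializes the full delta weight first. Names, shapes, and seeds are illustrative; this is not a substitute for the HF-comparison test the author plans to add.

```python
import torch
import torch.nn.functional as F

def test_embedding_lora_matches_dense_reference():
    torch.manual_seed(0)
    vocab, hidden, r = 32, 16, 4
    input_ids = torch.randint(0, vocab, (2, 5))
    lora_a = torch.randn(vocab, r)
    lora_b = torch.randn(r, hidden)

    fast = F.embedding(input_ids, lora_a) @ lora_b    # lookup, then project
    dense = F.embedding(input_ids, lora_a @ lora_b)   # full delta weight, then lookup
    torch.testing.assert_close(fast, dense)
```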

Successfully merging this pull request may close these issues.

Supporting LmHead and Embedding Layers for Adapters