Some questions about the implementions and results, particularly related to ROME/MEND/SERAC #417

StarLooo · 2024-11-11T02:06:29Z

Hello!
After thoroughly reviewing the code that implements model editing with AdaLoRA and ROME, I have a few questions:

StarLooo · 2024-11-11T02:06:49Z

The if-else logic after a single edit is a little confusing:

                if self.alg_name == 'KN' or self.alg_name == 'GRACE' or self.alg_name == 'WISE':
                    # for these methods weights_copy is a callable that restores the model
                    with torch.no_grad():
                        weights_copy()
                elif self.alg_name == 'LoRA' or self.alg_name == 'QLoRA' or self.alg_name == 'DPO':
                    # remove the adapter without merging it into the base weights
                    edited_model.unload()
                    del self.model.peft_config
                elif self.alg_name == 'MELO':
                    self.model = edited_model
                elif self.alg_name == 'LoRA' or self.alg_name == 'QLoRA' or self.alg_name == 'DPO':
                    # unreachable: same condition as the LoRA branch above
                    self.model = edited_model
                else:
                    # write the saved parameter tensors back in place
                    with torch.no_grad():
                        for k, v in weights_copy.items():
                            nethook.get_parameter(self.model, k)[...] = v.to(f"cuda:{self.hparams.device}")

As you can see, there are two elif branches for LoRA, and I think the second one is redundant: its condition is identical to the first LoRA branch, so it can never be reached. Besides, as I understand it, self.model and edited_model are bound together in the execute_lora function. That is why we need to unload without merging and delete peft_config after every edit in the single-edit scenario.
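To illustrate the point, here is a hypothetical sketch (not EasyEdit code) of the same dispatch written with one set of algorithm names per restore strategy, so that no algorithm can appear in two branches. The strategy labels just paraphrase what each branch of the quoted code does.

```python
# Hypothetical sketch, not EasyEdit code: each algorithm appears in exactly
# one group, so no branch can shadow another.
WEIGHTS_COPY_IS_CALLABLE = {"KN", "GRACE", "WISE"}
PEFT_ADAPTER = {"LoRA", "QLoRA", "DPO"}
KEEP_EDITED_MODEL = {"MELO"}


def restore_strategy(alg_name: str) -> str:
    """Describe how the pre-edit model is restored after a single edit."""
    if alg_name in WEIGHTS_COPY_IS_CALLABLE:
        return "call weights_copy()"
    if alg_name in PEFT_ADAPTER:
        return "unload adapter and delete peft_config"
    if alg_name in KEEP_EDITED_MODEL:
        return "self.model = edited_model"
    # Default path (e.g. ROME): write the saved tensors back.
    return "copy saved tensors back into the parameters"


for name in ["ROME", "LoRA", "MELO", "GRACE"]:
    print(f"{name:>5} -> {restore_strategy(name)}")
```

Whichever way it is written, the behaviour matches the quoted code: the first matching condition wins, so a later branch with an identical condition is dead code.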

StarLooo · 2024-11-11T02:07:15Z

Question about the layer at which ROME is applied.

In the ROME hparams configs, the default value of layers (the layers where ROME is applied) is, for example, [5] for Llama. I am somewhat puzzled about how this value was obtained, since the original paper first uses Causal Tracing, a fairly complex method that is hard to reimplement, to determine the layer at which ROME should be applied. Intuitively, the 5th layer also seems too low in Llama's 32-layer architecture.
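For context, the last step of Causal Tracing can be summarized as: for each layer, measure how much restoring that layer's clean hidden state recovers the probability of the correct object in a run whose subject tokens were corrupted (the indirect effect), average over prompts, and look for the peak. The sketch below covers only that aggregation step under a simplified, hypothetical input format; producing the probabilities themselves requires the full tracing code from the ROME repository.

```python
from statistics import mean


def indirect_effects(p_corrupted, p_restored_per_layer):
    """Per-layer indirect effect for one prompt: P(restored at layer l) - P(corrupted)."""
    return [p - p_corrupted for p in p_restored_per_layer]


def pick_layer(traces):
    """Average the indirect effects over prompts and return (best_layer, averages).

    traces: list of (p_corrupted, p_restored_per_layer) pairs, one per prompt,
    where the probabilities are those of the correct object token.
    """
    per_prompt = [indirect_effects(pc, pr) for pc, pr in traces]
    n_layers = len(per_prompt[0])
    averages = [mean(effects[layer] for effects in per_prompt) for layer in range(n_layers)]
    best_layer = max(range(n_layers), key=lambda layer: averages[layer])
    return best_layer, averages


# Synthetic numbers for a 4-layer toy model, not real measurements.
traces = [
    (0.10, [0.12, 0.40, 0.30, 0.15]),
    (0.20, [0.21, 0.50, 0.45, 0.25]),
]
best_layer, averages = pick_layer(traces)
print(best_layer)  # 1
```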

StarLooo · 2024-11-11T02:14:07Z

Correspondence between compute_u/compute_v and the formulas in the original ROME paper.

In your implementation of ROME, compute_u and compute_v compute left_vector and right_vector, respectively. However, I can't find any description of u/v or left_vector/right_vector in the original ROME paper. Where do they come from?

After carefully reading the relevant code, I tried to work out what these two methods do and how they map onto Equation 2 of the ROME paper. I have a general grasp, but a few details are still unclear:

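First, recall ROME's closed-form solution, Equation 2 in the paper:

$$\min_{\hat{W}} \lVert \hat{W}K - V \rVert \ \text{ s.t. } \ \hat{W}k_* = v_*, \qquad \text{solved by} \ \ \hat{W} = W + \Lambda\,(C^{-1}k_*)^{T}$$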

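The first step is choosing k* to select the subject, as in Equation 3 of the paper:

$$k_* = \frac{1}{N}\sum_{j=1}^{N} k(x_j + s), \qquad k(x) = \sigma\left(W_{fc}^{(l^*)}\,\gamma\left(a^{(l^*)}_{[x],i} + h^{(l^*-1)}_{[x],i}\right)\right)$$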

This is what compute_u implements.

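The second step is optimizing v*, as in Equation 4 of the paper:

$$v_* = \arg\min_{z} \frac{1}{N}\sum_{j=1}^{N} -\log \mathbb{P}_{G(m_i^{(l^*)} := z)}\left[o^* \mid x_j + p\right] + D_{KL}\left(\mathbb{P}_{G(m_{i'}^{(l^*)} := z)}\left[x \mid p'\right] \,\Big\|\, \mathbb{P}_G\left[x \mid p'\right]\right)$$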

This is implemented in compute_v.

The last step is to update the weights with k* and v* according to Equation 2.
Compare the final product of left_vector and right_vector with Equation 2:
upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
where right_vector is computed as:
right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)
So compute_v appears to compute $\Lambda$ in Equation 2:

$$\Lambda = \frac{v_* - W k_*}{(C^{-1}k_*)^{T}\,k_*}$$

Here, target corresponds to v*; cur_output corresponds to Wk*; cur_input corresponds to k*; and left_vector corresponds to C^(-1)k*.
And compute_u computes C^(-1)k*.

This holds when mom2_adjustment=true. But the default in the ROME hparams config is mom2_adjustment=false, and then things differ: target still corresponds to v*, cur_output to Wk*, and cur_input to k*, but left_vector corresponds only to k* (up to normalization) instead of C^(-1)k*.
This is also why, with the default hparams, ROME does not use external text (such as Wikipedia) to estimate the second-moment statistics C = KK^T. The related code is get_inv_cov in compute_u.py and layer_stats in layer_stats.py.
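To check this correspondence numerically, here is a small pure-Python sketch of the rank-one update with and without the covariance term. It follows the roles described above (left_vector = C^(-1)k*, right_vector = Λ), but uses plain lists instead of torch tensors and stores W as a (d_out x d_in) matrix, so the update is outer(right, left); the quoted torch code builds the transposed product, which (as far as I can tell) is then matched to the weight's layout. The helper names and toy matrices are made up. In both cases the edited matrix maps k* exactly to v*, which is the constraint in Equation 2.

```python
# Toy check of ROME's rank-one update (Equation 2) in pure Python.
# W is stored as a (d_out x d_in) list of rows, so W @ k is a matvec.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rome_update(W, k_star, v_star, C=None):
    """Return W_hat = W + Lambda (C^-1 k*)^T.

    C=None mimics mom2_adjustment=False as read above (C treated as identity).
    """
    left = k_star if C is None else solve(C, k_star)  # role of compute_u: C^-1 k*
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]  # v* - W k*
    denom = dot(k_star, left)  # (C^-1 k*)^T k*
    right = [r / denom for r in residual]  # role of compute_v: Lambda
    # Add the rank-one matrix outer(right, left) to W.
    return [[w + ri * lj for w, lj in zip(row, left)] for row, ri in zip(W, right)]


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
k_star = [1.0, 2.0, -1.0]
v_star = [3.0, 4.0]
C = [[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]]

for cov in (None, C):
    W_hat = rome_update(W, k_star, v_star, cov)
    print(matvec(W_hat, k_star))  # approximately v_star in both cases
```

The two updates differ in the direction they spread over other keys (that is what C controls), but both satisfy W_hat k* = v*.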

StarLooo · 2024-11-11T02:14:59Z

Correspondence between compute_u/compute_v and the formulas in the original ROME paper (continued).

I also tried setting mom2_adjustment to true, and fixed the bug where load_dataset cannot load 20200501.en by changing it to 20220301.en (inspired by #309). However, the edit results on wikidata_recent and wikidata_counterfact are similar to the default setting, with a slight increase in Edit Succ and Locality and a slight decrease in Portability.

Additionally, I don't understand why u needs to be normalized when compute_u returns it:
return u / u.norm()
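One way to see why the normalization is harmless: right_vector is divided by dot(cur_input, left_vector), so any rescaling of left_vector cancels in the outer product. The pure-Python sketch below (toy numbers, hypothetical helper name) checks that u and u / ||u|| give the same update, which suggests the normalization only affects numerical scale, not the result. The same algebra shows that with mom2_adjustment=False, k* / ||k*|| gives exactly the same update as k* itself (the C = I case), rather than approximating C^(-1)k*.

```python
import math


def update_matrix(left, target, cur_output, cur_input):
    """upd = outer(left, right), with right = (target - cur_output) / dot(cur_input, left).

    Follows the two formulas quoted above; plain lists stand in for tensors.
    """
    denom = sum(a * b for a, b in zip(cur_input, left))
    right = [(t - o) / denom for t, o in zip(target, cur_output)]
    return [[l * r for r in right] for l in left]


u = [2.0, -1.0, 3.0]        # stands in for the vector computed in compute_u
k_star = [1.0, 0.5, 2.0]    # cur_input
target = [1.0, 2.0]         # v*
cur_output = [0.2, -0.4]    # W k*

norm = math.sqrt(sum(x * x for x in u))
u_hat = [x / norm for x in u]

raw = update_matrix(u, target, cur_output, k_star)
normalized = update_matrix(u_hat, target, cur_output, k_star)
max_diff = max(abs(a - b) for ra, rb in zip(raw, normalized) for a, b in zip(ra, rb))
print(max_diff)  # tiny: only floating-point noise, the scale of u cancels
```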

littlefive5 · 2024-11-11T02:38:50Z

You're right, the second one is redundant.
This layer was also chosen by causal analysis; you can reproduce it with ROME's original code. For LLaMA, we simply took the information from FastEdit's author.
The computation is the same as in ROME's original code; we kept it unchanged.
The load_dataset 20200501.en error is caused by the datasets version. I think it would affect the results, but we won't change it because the computation is time-consuming; we just use our cached computation for the npm file. Setting mom2_adjustment does affect the results. We set it to false for faster computation, since many users do not want to compute the npm locally and we cannot provide an npm for every version of every model. If you want to reproduce the original ROME results, set it to true and compute the npm locally. Note also that for MEMIT you need the 'npm' for all the required layers.

StarLooo · 2024-11-11T03:01:04Z

Thanks for your quick reply. I still have some questions:

What does the npm you mentioned above mean? I'd expect some caching mechanism here (such as persisting inv_mom2_cache as a pickle file) to store the C needed by get_inv_cov, similar to the pre-edit files, but I haven't found a corresponding implementation in the current code.
Which version of datasets should I use to get 20200501.en? Since the original ROME paper didn't run experiments on LLaMA, how should the dataset (and its cutoff date) be chosen to estimate the second-moment statistics C = KK^T?
Does the mom2_adjustment setting have a significant impact on the results?
I guess the normalization k / k.norm() may be a feasible alternative to C^(-1)k when mom2_adjustment=False.

StarLooo · 2024-11-11T04:07:46Z

Another question:
I find that the layers used in FT-L/FT-M and ROME are different:
in ROME, the default layer for Llama-2 is layer 5,
but in FT, the default layer for Llama-2 is layer 21.

littlefive5 · 2024-11-11T04:26:15Z

Sorry, I meant npz: it is the cache file of the covariance, https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/layer_stats.py#L163
I think it does have an influence, but we're not sure whether it is significant. I recommend setting it to true: in our tests on Chinese editing, setting it to false hurt performance. It's your choice.
I think it is datasets==1.18.3, but I'm not sure. It's fine to use a newer version; in our experience it does not make much difference.
Yes, they differ even in the original ROME paper. For LLaMA-2 we just selected the layer randomly; you can try different layers.
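Regarding the covariance cache: the second-moment statistic is estimated as the average of k k^T over keys collected from a text corpus at the edited layer, and caching it means that expensive pass runs only once per layer. Below is a minimal pure-Python sketch of that compute-once-then-load pattern. It is not layer_stats.py: it swaps numpy's .npz format for pickle, and the file name, helper names and toy keys are all hypothetical.

```python
import os
import pickle
import tempfile


def second_moment(keys):
    """Average of k k^T over a list of key vectors (uncentered covariance)."""
    d = len(keys[0])
    C = [[0.0] * d for _ in range(d)]
    for k in keys:
        for i in range(d):
            for j in range(d):
                C[i][j] += k[i] * k[j]
    n = len(keys)
    return [[c / n for c in row] for row in C]


def cached_second_moment(path, collect_keys):
    """Load C from `path` if it exists; otherwise compute it and save it there.

    `collect_keys` is a hypothetical callable that would run the model over a
    corpus and return the keys at the edited layer (the expensive part).
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    C = second_moment(collect_keys())
    with open(path, "wb") as f:
        pickle.dump(C, f)
    return C


calls = []


def fake_collect():
    calls.append(1)  # count how often the expensive step runs
    return [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]


path = os.path.join(tempfile.mkdtemp(), "layer5_mom2.pkl")
first = cached_second_moment(path, fake_collect)
second = cached_second_moment(path, fake_collect)  # loaded from disk
print(first, len(calls))
```

This also makes the MEMIT remark concrete: a method that edits several layers needs one such cached statistic per edited layer.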

StarLooo · 2024-11-11T04:40:12Z

Well, in my exploratory experiments, setting mom2_adjustment=true does not make a significant difference. By the way, were the ROME results you reported obtained with mom2_adjustment=true or mom2_adjustment=false?
Do you mean the layer used for FT is just heuristically selected? Intuitively, layer 21 seems a reasonable position, but layer 5 for ROME seems a little low. I'll check the influence of the layer, but it's not a crucial issue.

StarLooo · 2024-11-11T04:43:37Z

I'm more curious about how to support MEND and SERAC, which require training before editing, in run_knowedit_llama2.py. I see that EasyEdit implements both methods, but I'm not sure how to incorporate them into the existing run_knowedit_llama2.py code and use them on the KnowEdit benchmark. Are there any difficulties or complications in doing this? I noticed that you also reported results for these two methods in your survey. Thanks!

littlefive5 · 2024-11-11T04:51:57Z

You just need to train the MEND and SERAC modules first and then point to them in the hparam files.
https://github.com/zjunlp/EasyEdit/blob/main/hparams/MEND/llama-7b.yaml#L3
https://github.com/zjunlp/EasyEdit/blob/main/hparams/SERAC/llama-7b.yaml#L3

For training, refer to https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#trainer

StarLooo · 2024-11-11T06:29:51Z

Thank you and I'll check it.

StarLooo · 2024-11-11T06:45:16Z

I have noticed that the ZsRE data used for MEND training may differ somewhat from the ZsRE test set in the KnowEdit benchmark. Specifically, three JSON files are involved: zsre_mend_eval.json, zsre_mend_train.json, and zsre_mend_train_10000.json.

The zsre_mend_train.json file is quite large (82 MB), so training probably should use zsre_mend_train_10000.json instead. This smaller file likely corresponds to the 10,000-instance training split you mentioned here: https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#dataset

But when I train on zsre_mend_train_10000.json, training never seems to satisfy the early-stopping condition, so I'll change the settings and try again.

StarLooo · 2024-11-12T02:10:10Z

After changing the layer from 5 to 21 (with mom2_adjustment=false), I observed a slight decrease in Edit Succ and Portability and a large increase in Locality.

StarLooo · 2024-11-13T05:50:56Z

I first trained MEND on zsre_mend_train.json with the default hparams (I just picked a checkpoint after training step 50000, since the early-stopping condition does not seem to be satisfied before the maximum of 100000 training steps; I don't know why, which is another question), and then used the trained model to perform single edits on Wikidata_recent. Here are my results:

Edit_Succ: 96.36
Overall_portability: 62.23
Overall_locality: 68.03
Fluency: 564.16

The results seem reasonable, but somewhat higher than the reported values.
I'll also try SERAC later.

littlefive5 · 2024-11-17T06:02:48Z

Sorry for the late reply; I was at EMNLP last week.
We trained the model with zsre_mend_train_10000.json. I'm not sure what is going on with the early-stopping condition; we just selected the last checkpoint.

zxlzr · 2024-11-17T07:06:38Z

Hi, have you solved your issue yet?

StarLooo · 2024-11-18T01:28:52Z

Thanks for your reply. I have another question: when I tried to use SERAC in a similar way, I ran into some runtime errors.
I think the main reason is that in apply_to_model (in serac_main.py), self.alg.edit returns a wrapped SERAC object instead of the raw model, which causes an error in generate_fast:

            model_out = model(
                input_ids=input_ids[:, cur_context],
                attention_mask=None if 'llama' in model.name_or_path.lower() or 'baichuan' in model.name_or_path.lower()
                else attention_mask[:, cur_context],
                past_key_values=past_key_values,
                use_cache=True,
            )

Here, model is a SERAC instance and has no name_or_path attribute.
Additionally, I noticed some noteworthy values in the SERAC training logs that appear to be unusually low:

loss/loc_val        :  0.00245
edit/acc_val        :  0.29113
edit/log_prob_val   : -5.94367
edit/prob_val       :  0.25925
acc/pre_val         :  0.01336
acc/post_val        :  0.01336
nll/pre_val         :  13.23586
perplexity/pre_val  :  560094.25000
nll/post_val        :  11.91173
perplexity/post_val :  149003.45312
n_tokens/pre_val    :  7.30417
n_tokens/post_val   :  7.30417
time/edit_val       :  0.00480
loss/total_val      :  0.59681
loss/total_edit_val :  0.59681
memory/alloc_max_val:  32896796160.00000
memory/res_max_val  :  33632026624.00000
eval_time/elapsed   :  4866.71193
eval_time/average   :  0.25602
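As a sanity check on this log, the perplexity entries are consistent with perplexity = exp(nll) for the corresponding NLL entries, so the very large perplexities follow directly from NLL values around 12 to 13. Note also that acc/pre_val and acc/post_val are identical. The snippet below only recomputes the perplexities from the logged numbers.

```python
import math

# Values copied from the SERAC validation log above.
logged = {
    "pre": {"nll": 13.23586, "perplexity": 560094.25000},
    "post": {"nll": 11.91173, "perplexity": 149003.45312},
}

for split, v in logged.items():
    recomputed = math.exp(v["nll"])
    rel_err = abs(recomputed - v["perplexity"]) / v["perplexity"]
    print(f"{split}: exp(nll) = {recomputed:.1f}, logged = {v['perplexity']:.1f}, rel. error = {rel_err:.1e}")
```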

StarLooo · 2024-11-18T01:29:33Z

Not yet; I still have some difficulty running SERAC.

tbozhong · 2024-11-25T13:24:04Z

The generate_fast error occurs because the SERAC class only implements the generate method and is not compatible with generate_fast. Please set vanilla_generation=True when evaluating SERAC, and it should work correctly. We will adapt SERAC to generate_fast as soon as possible.
I re-cloned EasyEdit and ran it, but I was unable to reproduce the performance you demonstrated. We are reproducing the main table of the survey again, and we will notify you immediately if there are any updates.

zxlzr added the question Further information is requested label Nov 11, 2024

StarLooo changed the title ~~Some questions about the implemention, particularly related to ROME~~ Some questions about the implemention, particularly related to ROME/MEND/SERAC Nov 13, 2024

StarLooo changed the title ~~Some questions about the implemention, particularly related to ROME/MEND/SERAC~~ Some questions about the implementions and results, particularly related to ROME/MEND/SERAC Nov 13, 2024
