
Some questions about the implementations and results, particularly related to ROME/MEND/SERAC #417

Open
StarLooo opened this issue Nov 11, 2024 · 20 comments
Labels
question Further information is requested

Comments

@StarLooo

Hello!
After thoroughly reviewing the code related to the implementation of editing models with AdaLoRA and ROME, I have a few questions:

@StarLooo (Author)

  1. The if-else branching after a single edit seems a little confusing:
                if self.alg_name == 'KN' or self.alg_name == 'GRACE' or self.alg_name == 'WISE':
                    with torch.no_grad():
                        weights_copy()
                elif self.alg_name == 'LoRA' or self.alg_name == 'QLoRA' or self.alg_name == 'DPO':
                    edited_model.unload()
                    del self.model.peft_config
                elif self.alg_name == 'MELO':
                    self.model = edited_model
                elif self.alg_name == 'LoRA' or self.alg_name == 'QLoRA' or self.alg_name == 'DPO':
                    self.model = edited_model
                else:
                    with torch.no_grad():
                        for k, v in weights_copy.items():
                            nethook.get_parameter(self.model, k)[...] = v.to(f"cuda:{self.hparams.device}")

As you can see, there are two elif branches for LoRA, and I think the second one is redundant (see the sketch below). Besides, based on my understanding, an important point is that self.model and edited_model are bound together in the execute_lora function. That's why we need to unload without merging and delete peft_config after every edit in the single-edit scenario.
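
For reference, a minimal sketch of how the branch could look with the duplicated LoRA/QLoRA/DPO case removed (behaviour otherwise identical to the snippet above):

    # Sketch only: same logic as the snippet above, minus the redundant second LoRA/QLoRA/DPO branch.
    if self.alg_name in ('KN', 'GRACE', 'WISE'):
        with torch.no_grad():
            weights_copy()  # weights_copy is called directly for these methods, as in the original snippet
    elif self.alg_name in ('LoRA', 'QLoRA', 'DPO'):
        # self.model and edited_model share parameters, so unload the adapter
        # (without merging) and drop the PEFT config before the next single edit
        edited_model.unload()
        del self.model.peft_config
    elif self.alg_name == 'MELO':
        self.model = edited_model
    else:
        # default: restore the cached original weights
        with torch.no_grad():
            for k, v in weights_copy.items():
                nethook.get_parameter(self.model, k)[...] = v.to(f"cuda:{self.hparams.device}")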

@StarLooo (Author)

  2. Question about the layer at which to execute ROME.

In the ROME hparams configs, I found that you set default values for the layers hyperparameter (the layers at which ROME is applied), such as [5] for Llama. I am somewhat puzzled about how this hyperparameter was obtained, since the original paper first employed a Causal Tracing procedure, which is hard to re-implement, to determine the layer where ROME should be executed. Intuitively, the 5th layer also seems quite early in Llama's 32-layer architecture.

@StarLooo (Author)

  3. The correspondence between compute_u and compute_v and the formulas in the original ROME paper.

In your implementation of ROME, you use compute_u and compute_v to compute the left_vector and right_vector respectively. But I don't find any mention of u/v or left_vector/right_vector in the original ROME paper. Where do they come from?

After carefully reading the relevant code, I attempted to understand what these two functions do and how they correspond to the formulas (Eq. 2) in the ROME paper. I have a general grasp, but a few details are still unclear.

First, let us review the closed-form solution of ROME, i.e., Equation 2 in the paper:
Ŵ = W + Λ (C^(-1) k*)^T, where C = K K^T and Λ = (v* − W k*) / ((C^(-1) k*)^T k*).
So the first step is choosing k* to select the subject, as in Equation 3 of the paper:
k* = (1/N) Σ_j k(x_j + s), i.e., the MLP key activation at the subject's last token, averaged over N sampled text prefixes x_j.
This is what compute_u implements.

Then the second step is to optimize v*, as in Equation 4 of the paper:
v* = argmin_z L(z), where the objective maximizes the probability of the new target object o* given the prompt (plus an essence-preservation term),
which is what compute_v implements.

The last step is to update the weights using k* and v* according to Equation 2.
If we compare the final outer product of left_vector and right_vector with Equation 2:
upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
where the right_vector is computed by:
right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)
it seems that compute_v is used to compute Λ in Equation 2:
Λ = (v* − W k*) / ((C^(-1) k*)^T k*)
Here, target corresponds to v*; cur_output corresponds to W k*; cur_input corresponds to k*; and left_vector corresponds to C^(-1) k*.
And compute_u is used to compute C^(-1) k*.

This holds when mom2_adjustment=true. But since the default setting in the ROME hparams config is mom2_adjustment=false, there are differences: target still corresponds to v*, cur_output still corresponds to W k*, and cur_input still corresponds to k*, but left_vector then corresponds to k* itself instead of C^(-1) k*.
This is why, under the default hparams, ROME does not use external text (such as Wikipedia) to estimate the second-moment statistics C = K K^T. The related code is in get_inv_cov in compute_u.py and layer_stats in layer_stats.py.
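
As a sanity check on this reading, here is a small self-contained script (plain PyTorch, not the repository code; variable names mirror the snippets above) that builds the rank-one update from left_vector and right_vector and verifies that the edited weight maps k* exactly to v*:

    import torch

    torch.manual_seed(0)
    d_in, d_out = 16, 8

    W = torch.randn(d_out, d_in)        # original weight of the edited MLP projection
    k_star = torch.randn(d_in)          # cur_input, the key k*
    v_star = torch.randn(d_out)         # target, the desired value v*
    C = torch.randn(d_in, d_in)
    C = C @ C.T + torch.eye(d_in)       # stand-in for the second-moment matrix C = K K^T

    left_vector = torch.linalg.solve(C, k_star)   # C^(-1) k*  (the mom2_adjustment=true case)
    cur_output = W @ k_star                       # W k*
    right_vector = (v_star - cur_output) / torch.dot(k_star, left_vector)   # Λ from Eq. 2

    # EasyEdit computes left_vector.unsqueeze(1) @ right_vector.unsqueeze(0) and then matches it
    # to the weight's layout; with W stored as (d_out, d_in) the same update Λ (C^(-1) k*)^T is:
    upd_matrix = right_vector.unsqueeze(1) @ left_vector.unsqueeze(0)
    W_hat = W + upd_matrix

    print(torch.allclose(W_hat @ k_star, v_star, atol=1e-4))   # True: the edit inserts (k*, v*)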

@StarLooo (Author)

  4. Results when setting mom2_adjustment=true, and the normalization of u in compute_u.

I also tried setting mom2_adjustment to true and fixed the bug where load_dataset cannot fetch 20200501.en by changing it to 20220301.en (inspired by #309), but the edit results on wikidata_recent and wikidata_counterfact are similar to the default setting, with slight increases in Edit Succ and Locality and a slight decrease in Portability.

Additionally, I don't understand the necessity of normalizing u when returning it in compute_u:
return u / u.norm()
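
If it helps, at least as far as the update formula quoted above is concerned, this normalization cannot change the final update: right_vector divides by torch.dot(cur_input, left_vector), so any rescaling of left_vector (such as 1 / u.norm()) cancels in the outer product. A minimal check of that invariance (plain PyTorch, names mirror the snippets above):

    import torch

    torch.manual_seed(0)
    target = torch.randn(8)        # v*
    cur_output = torch.randn(8)    # W k*
    cur_input = torch.randn(16)    # k*
    u = torch.randn(16)            # unnormalized left vector returned by compute_u

    def rank_one_update(left_vector):
        right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)
        return left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)

    # The 1/||u|| factor cancels between left_vector and the denominator of right_vector.
    print(torch.allclose(rank_one_update(u), rank_one_update(u / u.norm()), atol=1e-6))  # True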

@littlefive5 (Collaborator)

  1. You're right, the second one is redundant.
  2. This layer was also determined by causal analysis; you can do this with ROME's original code. For LLaMA, we just got the information from FastEdit's author.
  3. The computation is the same as in ROME's original code; we keep it unchanged.
  4. The load_dataset 20200501.en error is due to the version of the datasets library. I think it would affect the results, but we won't modify it since the computation is time-consuming; we just use our cached computation for the npm file. Setting mom2_adjustment would affect the results. We set it to false for a quick compute, since many users do not want to compute the npm locally and we cannot provide the npm for every model of every version. But if you want to reproduce the original ROME results, you should set it to true and compute the npm locally. Meanwhile, when you use MEMIT, you need the npm for all the required layers.

@StarLooo (Author) commented Nov 11, 2024

> 1. You're right, the second one is redundant.
> 2. This layer was also determined by causal analysis; you can do this with ROME's original code. For LLaMA, we just got the information from FastEdit's author.
> 3. The computation is the same as in ROME's original code; we keep it unchanged.
> 4. The load_dataset 20200501.en error is due to the version of the datasets library. I think it would affect the results, but we won't modify it since the computation is time-consuming; we just use our cached computation for the npm file. Setting mom2_adjustment would affect the results. We set it to false for a quick compute, since many users do not want to compute the npm locally and we cannot provide the npm for every model of every version. But if you want to reproduce the original ROME results, you should set it to true and compute the npm locally. Meanwhile, when you use MEMIT, you need the npm for all the required layers.

Thanks for your timely reply, and I still have some questions:

  1. What is the meaning of the npm you mentioned above? I think there should be a caching mechanism here (such as persisting inv_mom2_cache as a pickle file) to store the C needed by get_inv_cov, similar to the pre-edit files, but I haven't found the corresponding implementation in the current code.
  2. Which version of datasets should I use to get 20200501.en? Since the original ROME paper didn't run experiments on Llama, how should the dataset (and its cutoff date) be chosen to estimate the second-moment statistics C = K K^T?
  3. Does the mom2_adjustment setting have a significant impact on the results?
  4. I guess the normalization k / k.norm() may be a feasible alternative to C^(-1) k when mom2_adjustment=False.

@StarLooo (Author)

Another new question:
I find that the layers used in FT-L/FT-M and ROME are different:
in ROME, the default layer is the 5th layer of Llama-2,
but in FT, the default layer is the 21st layer of Llama-2.
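
If it helps, this is roughly how I plan to compare layers, following the BaseEditor example from the README (the YAML path, the toy edit, and the exact editor.edit argument names are assumptions that may differ across EasyEdit versions):

    from easyeditor import BaseEditor, ROMEHyperParams

    # Toy edit request purely for illustration; replace with real KnowEdit samples.
    prompts = ['The capital city of France is']
    ground_truth = ['Paris']
    target_new = ['Lyon']
    subject = ['France']

    for layer in [5, 13, 21]:  # candidate layers to probe
        hparams = ROMEHyperParams.from_hparams('./hparams/ROME/llama-7b.yaml')  # path is an assumption
        hparams.layers = [layer]  # override the layer that ROME edits
        editor = BaseEditor.from_hparams(hparams)
        metrics, edited_model, _ = editor.edit(
            prompts=prompts,
            ground_truth=ground_truth,
            target_new=target_new,
            subject=subject,
            sequential_edit=False,
        )
        print(layer, metrics)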

@littlefive5 (Collaborator)

  1. Sorry, it's npz: the npz file is the cache of the covariance statistics, see https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/layer_stats.py#L163
  2. I think it would have an influence, but we're not sure whether it is significant. I recommend setting it to true: in our Chinese editing tests, setting it to false damaged the performance. You can decide based on your needs.
  3. I think it is datasets==1.18.3, but I'm not sure; it's fine to use the new version, and from our experience this would not make much difference.
  4. Yes, it even differs in the original ROME paper; in our setting for LLaMA-2 we just selected it randomly, and you can try different layers.

@StarLooo (Author)

> 1. Sorry, it's npz: the npz file is the cache of the covariance statistics, see https://github.com/zjunlp/EasyEdit/blob/main/easyeditor/models/rome/layer_stats.py#L163
> 2. I think it would have an influence, but we're not sure whether it is significant. I recommend setting it to true: in our Chinese editing tests, setting it to false damaged the performance. You can decide based on your needs.
> 3. I think it is datasets==1.18.3, but I'm not sure; it's fine to use the new version, and from our experience this would not make much difference.
> 4. Yes, it even differs in the original ROME paper; in our setting for LLaMA-2 we just selected it randomly, and you can try different layers.
  1. Well, in my exploratory experiments, setting mom2_adjustment=true does not make a significant difference. By the way, were the ROME results you reported obtained with mom2_adjustment=true or mom2_adjustment=false?
  2. Do you mean the layer used for FT was just heuristically selected? Intuitively, I think the 21st layer is a feasible position, but the 5th layer for ROME seems a little early. I'll check the influence of the layer, but it's not a crucial topic.

@StarLooo (Author) commented Nov 11, 2024

I'm more curious about how to support MEND and SERAC, which require training before editing, in run_knowedit_llama2.py. I see that the EasyEdit framework already implements these two methods, but I am not sure how to incorporate them into the existing run_knowedit_llama2.py code and use them on the KnowEdit benchmark. Are there any difficulties or complications in doing this? I noticed that you also reported these two methods' results in your survey paper. Thanks!

@zxlzr added the question (Further information is requested) label on Nov 11, 2024
@littlefive5 (Collaborator)

You just need to first train the MEND and SERAC modules and set them in the hparam files: https://github.com/zjunlp/EasyEdit/blob/main/hparams/MEND/llama-7b.yaml#L3 https://github.com/zjunlp/EasyEdit/blob/main/hparams/SERAC/llama-7b.yaml#L3

For training, refer to https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#trainer

@StarLooo (Author)

> You just need to first train the MEND and SERAC modules and set them in the hparam files: https://github.com/zjunlp/EasyEdit/blob/main/hparams/MEND/llama-7b.yaml#L3 https://github.com/zjunlp/EasyEdit/blob/main/hparams/SERAC/llama-7b.yaml#L3
>
> For training, refer to https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#trainer

Thank you and I'll check it.
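
For anyone following along, the Trainer section of the README boils down to roughly the following (the YAML and data paths below are my assumptions for the Llama setup; the resulting checkpoint is what the hparam files linked above should point to):

    from easyeditor import EditTrainer, MENDTrainingHparams, ZsreDataset

    # Paths are assumptions: point them at the training YAML and the ZsRE files you actually use.
    training_hparams = MENDTrainingHparams.from_hparams('./hparams/TRAINING/MEND/llama-7b.yaml')
    train_ds = ZsreDataset('./data/zsre/zsre_mend_train_10000.json', config=training_hparams)
    eval_ds = ZsreDataset('./data/zsre/zsre_mend_eval.json', config=training_hparams)

    trainer = EditTrainer(
        config=training_hparams,
        train_set=train_ds,
        val_set=eval_ds,
    )
    trainer.run()  # afterwards, set the trained checkpoint path in the MEND/SERAC hparam files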

@StarLooo (Author) commented Nov 11, 2024

> You just need to first train the MEND and SERAC modules and set them in the hparam files: https://github.com/zjunlp/EasyEdit/blob/main/hparams/MEND/llama-7b.yaml#L3 https://github.com/zjunlp/EasyEdit/blob/main/hparams/SERAC/llama-7b.yaml#L3
>
> For training, refer to https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#trainer

I have observed that the ZsRE dataset used for MEND training might differ somewhat from the ZsRE test set in the KnowEdit benchmark. Specifically, there are three JSON files involved: zsre_mend_eval.json, zsre_mend_train.json, and zsre_mend_train_10000.json.

The zsre_mend_train.json file is notably large (82 MB). Given this, training should probably use the zsre_mend_train_10000.json file instead. This smaller dataset likely corresponds to the training split of 10,000 instances that you previously mentioned here: https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#dataset

But when I try to train on zsre_mend_train_10000.json, the training process never seems to satisfy the early-stopping criterion. So I'll change the setting and try again.

@StarLooo (Author)

> 1. Well, in my exploratory experiments, setting mom2_adjustment=true does not make a significant difference. By the way, were the ROME results you reported obtained with mom2_adjustment=true or mom2_adjustment=false?
> 2. Do you mean the layer used for FT was just heuristically selected? Intuitively, I think the 21st layer is a feasible position, but the 5th layer for ROME seems a little early. I'll check the influence of the layer, but it's not a crucial topic.

After changing the layer from 5 to 21 with mom2_adjustment=false, I observed that Edit Succ and Portability decrease slightly, while Locality increases by a large margin.

@StarLooo changed the title from "Some questions about the implementation, particularly related to ROME" to "Some questions about the implementation, particularly related to ROME/MEND/SERAC" on Nov 13, 2024
@StarLooo changed the title from "Some questions about the implementation, particularly related to ROME/MEND/SERAC" to "Some questions about the implementations and results, particularly related to ROME/MEND/SERAC" on Nov 13, 2024
@StarLooo (Author)

> You just need to first train the MEND and SERAC modules and set them in the hparam files: https://github.com/zjunlp/EasyEdit/blob/main/hparams/MEND/llama-7b.yaml#L3 https://github.com/zjunlp/EasyEdit/blob/main/hparams/SERAC/llama-7b.yaml#L3
>
> For training, refer to https://github.com/zjunlp/EasyEdit?tab=readme-ov-file#trainer

I have tried to first train MEND on zsre_mend_train.json with the default hparams (I just picked a checkpoint at training step 50,000, since the early-stopping condition seemingly would not be satisfied before the maximum of 100,000 training steps; I don't know why, and that is another question), and then used the trained model to conduct single edits on Wikidata_recent. Here is my result:

Edit_Succ: 96.36
Overall_portability: 62.23
Overall_locality: 68.03
Fluency: 564.16

The result seems reasonable but somewhat higher than the reported values.
I'll also try SERAC later.

@littlefive5 (Collaborator)

Sorry for the late reply, as I was at EMNLP last week.
We trained the model with zsre_mend_train_10000.json. I'm not sure what happened with the early-stopping condition; we just selected the last checkpoint.

@zxlzr (Contributor) commented Nov 17, 2024

Hi, have you solved your issue yet?

@StarLooo (Author)

> Sorry for the late reply, as I was at EMNLP last week. We trained the model with zsre_mend_train_10000.json. I'm not sure what happened with the early-stopping condition; we just selected the last checkpoint.

Thanks for your reply. I also have another question: when I tried to use SERAC in a similar way, I ran into some runtime errors.
The main reason, I think, is that in apply_to_model (in serac_main.py), self.alg.edit returns a wrapped SERAC class instead of a raw model class, which causes an error in generate_fast:

            model_out = model(
                input_ids=input_ids[:, cur_context],
                attention_mask=None if 'llama' in model.name_or_path.lower() or 'baichuan' in model.name_or_path.lower()
                else attention_mask[:, cur_context],
                past_key_values=past_key_values,
                use_cache=True,
            )

Here, model is a SERAC instance and has no name_or_path attribute.
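
As a stopgap, one way to avoid this particular crash is to resolve name_or_path defensively before the llama/baichuan check (a sketch only, not the maintainers' fix; the assumption that the wrapper keeps the base model in a .model attribute is mine):

    # Sketch: look up name_or_path on the object itself, then on a wrapped base model if present.
    def resolve_name_or_path(model) -> str:
        if hasattr(model, 'name_or_path'):
            return model.name_or_path
        inner = getattr(model, 'model', None)  # assumption about where the wrapper stores the base model
        return getattr(inner, 'name_or_path', '') if inner is not None else ''

    name_or_path = resolve_name_or_path(model).lower()
    attention_mask_arg = (None if 'llama' in name_or_path or 'baichuan' in name_or_path
                          else attention_mask[:, cur_context])
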
Additionally, I noticed some noteworthy values in the SERAC training logs that appear to be unusually low:

loss/loc_val        :  0.00245
edit/acc_val        :  0.29113
edit/log_prob_val   : -5.94367
edit/prob_val       :  0.25925
acc/pre_val         :  0.01336
acc/post_val        :  0.01336
nll/pre_val         :  13.23586
perplexity/pre_val  :  560094.25000
nll/post_val        :  11.91173
perplexity/post_val :  149003.45312
n_tokens/pre_val    :  7.30417
n_tokens/post_val   :  7.30417
time/edit_val       :  0.00480
loss/total_val      :  0.59681
loss/total_edit_val :  0.59681
memory/alloc_max_val:  32896796160.00000
memory/res_max_val  :  33632026624.00000
eval_time/elapsed   :  4866.71193
eval_time/average   :  0.25602

@StarLooo (Author)

> Hi, have you solved your issue yet?

Not yet; I still have some difficulty running SERAC.

@tbozhong (Collaborator)

  1. The error with generate_fast arises because the SERAC class only implements the generate method and is not compatible with generate_fast. Please set vanilla_generation=True when evaluating SERAC, and it should then work correctly. We will adapt SERAC to generate_fast as soon as possible.

  2. I re-cloned EasyEdit and ran it, but I was unable to reproduce the performance you demonstrated. We are reproducing the main table of the survey again, and we will notify you immediately if there are any updates.
