I tried to reproduce the results in Table 2 of the WISE paper, in which ROME and MEMIT encounter several problems under sequential editing:
1. I found that when T=1, ROME's accuracy in the table is 0.85. However, in my own experiment rewrite_acc is 1 every time. Is this correct? Since the accuracy of a single edit is 100%, why is the accuracy at T=1 in the WISE paper so low?
2. I use the following code to summarize (summary_metrics) the per-edit evaluation results under sequential_edit. My code:
import os
import json

import numpy as np


def sequential_edit_summary_metrics(all_metrics):
    if isinstance(all_metrics, dict):
        all_metrics = [all_metrics, ]

    # Dump the raw per-edit metrics for later inspection.
    logs_dir = './logs'
    if not os.path.exists(logs_dir):
        os.makedirs(logs_dir)
    output_file = os.path.join(logs_dir, 'results.json')
    with open(output_file, 'w', encoding="utf-8") as f:
        json.dump(all_metrics, f, ensure_ascii=False, indent=4)

    mean_metrics = dict()
    for eval in ["pre", "post"]:
        mean_metrics[eval] = dict()
        # Scalar metrics: average directly across all edits.
        for key in ["rewrite_acc", "rephrase_acc", "rewrite_ppl"]:
            if key in all_metrics[0][eval].keys():
                mean_metrics[eval][key] = np.mean([metric[eval][key] for metric in all_metrics])
        # Nested metrics: average each sub-key separately.
        for key in ["locality", "portability"]:
            if key in all_metrics[0][eval].keys() and all_metrics[0][eval][key] != {}:
                mean_metrics[eval][key] = dict()
                # get_all_acc_keys: helper that collects the sub-metric keys appearing across all edits.
                for lkey in get_all_acc_keys(all_metrics):
                    metrics = [np.mean(metric[eval][key][lkey]) for metric in all_metrics
                               if lkey in metric[eval][key].keys()]
                    if len(metrics) > 0:
                        mean_metrics[eval][key][lkey] = np.mean(metrics)
                    # mean_metrics[eval][key][lkey] = np.mean(
                    #     [metric[eval][key][lkey] for metric in all_metrics])
    # mean_metrics["time"] = np.mean([metric["time"] for metric in all_metrics])

    print("Metrics Summary: ", mean_metrics)
    return mean_metrics
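For reference, this is roughly how I call it (a minimal sketch; I am assuming here that the sequential editing run, e.g. editor.edit(..., sequential_edit=True) in EasyEdit, returns the per-edit metrics list as its first value):

```python
# Minimal usage sketch (assumption: `editor`, `prompts`, `target_new`, `subject`
# come from my run script; the first return value is the per-edit metrics list).
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    target_new=target_new,
    subject=subject,
    sequential_edit=True,
)
mean_metrics = sequential_edit_summary_metrics(metrics)
```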
However, I noticed that I missed the fluency metric. Should I simply add "fluency" to for key in ["locality", "portability"]?
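To make the question concrete, the change I have in mind looks roughly like the sketch below (my assumption, not confirmed: each per-edit result stores fluency as a nested dict such as {"ngram_entropy": value}, usually only under "post", so it would need the nested-dict treatment rather than the scalar loop):

```python
import numpy as np

def summarize_fluency(all_metrics, eval_key="post"):
    """Hypothetical helper (not from the repo): average each fluency sub-key,
    assuming fluency is stored as a nested dict like {"ngram_entropy": value}."""
    summary = {}
    first = all_metrics[0][eval_key].get("fluency", {})
    for fkey in first.keys():
        vals = [m[eval_key]["fluency"][fkey] for m in all_metrics
                if fkey in m[eval_key].get("fluency", {})]
        if vals:
            summary[fkey] = float(np.mean(vals))
    return summary
```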
3. I found that in the result of ROME on zsre with llama-2-7b-hf and sequential_edit=True, locality is 0.25, which is very different from the 0.75 shown in Table 2 of WISE. Do I have a misunderstanding somewhere, or am I missing something? In addition, how should I calculate portability when it contains multiple values?
My mean-metrics result from the code above:
My understanding is that Rel. (a.k.a. edit success rate [10]) corresponds to rewrite_acc.
Loc. (localization success rate [55]) corresponds to the average of all the values under locality (see the sketch below).
Gen. (generalization success rate [55]): how should I calculate it?
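In code, what I mean by "the average of all the values under locality" is roughly the following (a hypothetical helper for illustration, not code from the repo):

```python
import numpy as np

def overall_locality(mean_metrics, eval_key="post"):
    """Average every locality sub-metric into a single Loc. number --
    this reflects my understanding, not the official metric code."""
    sub_scores = list(mean_metrics[eval_key].get("locality", {}).values())
    return float(np.mean(sub_scores)) if sub_scores else None
```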
4. I edited the ZSRE dataset with run_knowedit_llama2.py and then evaluated it with edit_evaluation, but I did not get the corresponding evaluation metrics. What should I do?
5. In addition, I would like to know whether there is anywhere I can get the evaluation results for num = {1, 10, 100, 500, 1000} sequential edits with ROME, MEMIT, and WISE on the ZSRE and WikiData_counterfact datasets with the LLaMA-2-7B-HF model.
Thank you, and I look forward to your advice.