
WISE result #493

Open
LiuJinzhe-Keepgoing opened this issue Mar 20, 2025 · 1 comment
Labels
question Further information is requested

Comments

@LiuJinzhe-Keepgoing

LiuJinzhe-Keepgoing commented Mar 20, 2025

I tried to reproduce the results in Table 2 of the WISE paper, and I ran into several problems when sequentially editing with ROME and MEMIT:

  1. I found that when T=1, ROME's accuracy in the table is 0.85. However, in my own experiments, rewrite_acc is 1.0 every time, i.e. the accuracy of a single edit is 100%. Is this correct? Why is the T=1 accuracy reported in the WISE paper so much lower?
        "post": {
            "rewrite_acc": [
                1.0
            ],
            "locality": {
                "Relation_Specificity_acc": [
                    0.0,
                    0.0
                ]
            },
            "portability": {
                "reasoning_acc": [
                    0.2
                ]
            },
            "fluency": {
                "ngram_entropy": 5.174247202938084
            }
        }

  2. I use the following code to summarize (summary_metrics) the per-edit evaluation results produced by sequential_edit. My code:
import json
import os

import numpy as np


def sequential_edit_summary_metrics(all_metrics):
    # Average the per-edit evaluation dicts produced by a sequential-editing run.
    if isinstance(all_metrics, dict):
        all_metrics = [all_metrics, ]

    # Dump the raw per-edit metrics for later inspection.
    logs_dir = './logs'
    if not os.path.exists(logs_dir):
        os.makedirs(logs_dir)
    output_file = os.path.join(logs_dir, 'results.json')
    with open(output_file, 'w', encoding="utf-8") as f:
        json.dump(all_metrics, f, ensure_ascii=False, indent=4)

    mean_metrics = dict()
    for eval in ["pre", "post"]:
        mean_metrics[eval] = dict()
        # Flat metrics: average directly across all edits.
        for key in ["rewrite_acc", "rephrase_acc", "rewrite_ppl"]:
            if key in all_metrics[0][eval].keys():
                mean_metrics[eval][key] = np.mean([metric[eval][key] for metric in all_metrics])
        # Nested metrics: average each *_acc sub-key over the edits that contain it.
        for key in ["locality", "portability"]:
            if key in all_metrics[0][eval].keys() and all_metrics[0][eval][key] != {}:
                mean_metrics[eval][key] = dict()
                for lkey in get_all_acc_keys(all_metrics):
                    metrics = [np.mean(metric[eval][key][lkey]) for metric in all_metrics if lkey in metric[eval][key].keys()]
                    if len(metrics) > 0:
                        mean_metrics[eval][key][lkey] = np.mean(metrics)
    # mean_metrics["time"] = np.mean([metric["time"] for metric in all_metrics])
    print("Metrics Summary: ", mean_metrics)

    return mean_metrics
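
For reference, the function above relies on the get_all_acc_keys helper from EasyEdit (from run_knowedit_llama2.py, if I remember correctly). A minimal stand-in, under the assumption that it simply collects every key ending in "acc" from the nested metric dicts, would be:

def get_all_acc_keys(dict_list):
    # Assumption: gather every key ending in "acc" anywhere in the nested per-edit dicts.
    all_keys = set()

    def recurse(d):
        for k, v in d.items():
            if k.endswith("acc"):
                all_keys.add(k)
            if isinstance(v, dict):
                recurse(v)

    for d in dict_list:
        recurse(d)
    return all_keys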

The mean result is as follows:

     "post": {
        "rewrite_acc": 1.0,
        "locality": {
            "Relation_Specificity_acc": 0.0
        },
        "portability": {
            "reasoning_acc": 0.2
        }
    }

But I noticed that the fluency metric is missing from the summary. Should I simply add "fluency" to the for key in ["locality", "portability"] loop? (See the sketch below.)
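
A minimal sketch of how fluency could be folded in, assuming get_all_acc_keys only collects keys ending in "acc" (so ngram_entropy would otherwise be skipped) and that the fluency dict contains only scalar values; the snippet would sit inside the existing for eval in ["pre", "post"] loop:

# Sketch (assumption: fluency holds only scalars such as ngram_entropy, whose key
# does not end in "acc" and so would not be picked up by get_all_acc_keys).
if "fluency" in all_metrics[0][eval]:
    mean_metrics[eval]["fluency"] = {
        fkey: np.mean([metric[eval]["fluency"][fkey]
                       for metric in all_metrics
                       if fkey in metric[eval].get("fluency", {})])
        for fkey in all_metrics[0][eval]["fluency"]
    }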

  3. In the results of editing ROME_zsre_llama-2-7b-HF with sequential_edit=true, locality is about 0.25, which is very different from the 0.75 shown in Table 2 of WISE. Is there something I am misunderstanding or omitting? In addition, how should portability be calculated when it has multiple sub-values? (One possible aggregation is sketched after the JSON below.)
    My mean metrics computed with the code above:
      "post": {
        "rewrite_acc": 0.9888888888888889,
        "locality": {
            "Relation_Specificity_acc": 0.25625
        },
        "portability": {
            "reasoning_acc": 0.51,
            "Subject_Aliasing_acc": 0.3333333333333333,
            "Logical_Generalization_acc": 0.48809523809523814
        }
    }
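
I am not sure how Table 2 aggregates these sub-categories; as a sketch, an unweighted average over the sub-metric means (and likewise for locality) would be:

# Sketch: unweighted average over the locality / portability sub-metrics in mean_metrics.
# (Assumption: this is how the single Loc. / Por. numbers in Table 2 are obtained.)
loc_overall = np.mean(list(mean_metrics["post"]["locality"].values()))
por_overall = np.mean(list(mean_metrics["post"]["portability"].values()))
print("Loc.:", loc_overall, "Por.:", por_overall)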
  4. My understanding is that the metric Rel. (a.k.a. edit success rate [10]) corresponds to rewrite_acc,
    and Loc. (localization success rate [55]) corresponds to the average of all the numbers under locality.
    How should I calculate Gen. (generalization success rate [55])?
    I edited the ZsRE dataset with run_knowedit_llama2.py and then evaluated it with edit_evaluation, but I did not get the corresponding metric. What should I do? (A small check is sketched below.)
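
If I understand correctly, Gen. is measured on rephrased prompts, which the summary code above would pick up as rephrase_acc when it is present. A minimal check, assuming that field name, would be:

# Sketch: check whether the per-edit results contain rephrase_acc (generalization).
# Assumption: if it is missing, the rephrased ZsRE prompts were probably not passed
# to the editing/evaluation step, so Gen. was never computed.
if all("rephrase_acc" in m["post"] for m in all_metrics):
    gen = np.mean([np.mean(m["post"]["rephrase_acc"]) for m in all_metrics])
    print("Gen. (mean rephrase_acc):", gen)
else:
    print("rephrase_acc is missing from the post metrics; Gen. was not evaluated")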

  5. In addition, I would like to know whether there is anywhere I can obtain the evaluation results for num={1, 10, 100, 500, 1000} sequential edits with ROME, MEMIT, and WISE on the ZsRE and WikiData_counterfact datasets with the Llama-2-7b-hf model.

Thank you, and I look forward to your advice.

@zxlzr added the question label Mar 20, 2025
@zxlzr
Contributor

zxlzr commented Mar 20, 2025

Thank you for your interest in EasyEdit and WISE. We will arrange for a team member to respond to your questions as soon as possible.
