Reproducing SERAC and MEND results #447

shariqahn · 2024-12-13T16:50:24Z

I understand from #442 that you provided the checkpoints for SERAC and MEND trained on the CounterFact dataset, but I am not seeing CounterFact results for Llama in any of your papers. Do you have those results, or checkpoints trained on ZsRE, so I can confirm that I have reproduced your solution? I am getting the following results for ZsRE with the provided SERAC checkpoint:

Metrics Summary:  {'pre': {'rewrite_acc': 0.40287348593761324, 'rephrase_acc': 0.39428592943539903, 'portability': {'one_hop_acc': 0.566645032103213}}, 'post': {'rewrite_acc': 0.9630600327562718, 'rephrase_acc': 0.6961480793930168, 'locality': {'neighborhood_acc': 0.9986392371156113}, 'portability': {'one_hop_acc': 0.5736743852938706}}}

The post-edit rephrase_acc (about 0.70) looks low.
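For readers comparing their own runs, here is a minimal sketch that flattens the summary above and prints each metric before and after editing as a percentage. The numbers are copied from the summary; the helper name `flatten` is hypothetical and is not part of EasyEdit.

```python
# Metrics copied verbatim from the summary above (values are fractions in [0, 1]).
metrics = {
    "pre": {
        "rewrite_acc": 0.40287348593761324,
        "rephrase_acc": 0.39428592943539903,
        "portability": {"one_hop_acc": 0.566645032103213},
    },
    "post": {
        "rewrite_acc": 0.9630600327562718,
        "rephrase_acc": 0.6961480793930168,
        "locality": {"neighborhood_acc": 0.9986392371156113},
        "portability": {"one_hop_acc": 0.5736743852938706},
    },
}


def flatten(d, prefix=""):
    """Hypothetical helper: turn nested dicts into {'a.b': value} pairs."""
    out = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out


pre = flatten(metrics["pre"])
post = flatten(metrics["post"])

for name in sorted(post):
    post_pct = 100 * post[name]
    if name in pre:
        # Change in percentage points between pre-edit and post-edit.
        delta = 100 * (post[name] - pre[name])
        print(f"{name}: {100 * pre[name]:.2f} -> {post_pct:.2f} ({delta:+.2f})")
    else:
        # Locality is only reported after editing in this summary.
        print(f"{name}: {post_pct:.2f} (no pre-edit value)")
```

Printing the values on a percentage scale makes them easier to set side by side with tables that report percentages.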


XeeKee · 2024-12-17T16:59:22Z

Previously, some users did not have the necessary resources to complete the training of SERAC and MEND on CounterFact.
As a result, I found a ckpt I had previously used on my local server and uploaded it to Google Drive.
To make it easier for other users, we included the link in the README.
Please note that we did not specify that this checkpoint is intended for reproducing the results in the paper.

zxlzr · 2024-12-18T00:25:42Z

Sorry, our computing and storage resources are limited. We will run it as soon as possible and upload the complete checkpoint to help you. Thank you for using EasyEdit!

shariqahn · 2024-12-21T20:07:59Z

I actually did manage to run SERAC training on ZsRE, but only with data/zsre/zsre_mend_train_10000.json rather than the entire ZsRE training set.

My portability results differ from what is reported in the README: I get 'portability': {'one_hop_acc': 0.3953153117763729}, as opposed to the 57.82 listed here. I see that similar results were reported here, which you said match expectations for a smaller model, but I do not understand why they do not match the reported values.
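To make the size of the gap concrete, here is a minimal sketch that puts both numbers on the same scale. It assumes the README value of 57.82 is a percentage while the metrics summary reports a fraction; that assumption about the scales is not confirmed anywhere in this thread.

```python
# Value from the run described above (a fraction in [0, 1]).
observed_one_hop = 0.3953153117763729

# Value listed in the README; ASSUMED here to be a percentage.
readme_one_hop_pct = 57.82

observed_pct = 100 * observed_one_hop
gap = readme_one_hop_pct - observed_pct  # difference in percentage points

print(
    f"observed {observed_pct:.2f} vs README {readme_one_hop_pct:.2f}: "
    f"gap of {gap:.2f} percentage points"
)
```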

I did have a similar issue to #123 and switched over to the https://huggingface.co/Cheng98/llama-160m model as you suggest, but I would not expect this change to lead to such different final values.

zxlzr · 2024-12-22T02:50:48Z

Sorry, we will try to handle this ASAP. Recently, computing resources have been extremely tight, making it very difficult to have machines available for debugging.

zxlzr added the question (Further information is requested) label · Dec 14, 2024

shariqahn changed the title ~~Reproducing SERAC results~~ Reproducing SERAC and MEND results Dec 14, 2024
