Inconsistent POPE expected number of examples #15

Open
iancovert opened this issue Oct 9, 2024 · 0 comments

iancovert commented Oct 9, 2024

Thank you for your work, this package has been very helpful! However, I noticed an issue with the expected number of examples for POPE when using the full version:

  • The dataset configurations file specifies that pope-full expects 9000 total examples: 3000 each from the adversarial, popular, and random splits.
  • However, the dataset registry's download URL for POPE's random split points to a file with 2910 examples, not 3000. With this file, the scoring step fails its check on the expected number of examples.
  • The official POPE repo's file for the random split does contain 3000 examples. A commit there from October 2023 appears to have updated the file from 2910 to 3000 examples, which may be how this inconsistency arose. I tried overriding the dataset registry URL to download that file instead, but then hit a second problem elsewhere in your repo: the POPE eval harness (correctly?) expects just 2910 examples for the random split, and that number is hardcoded.

It seems like the best solution would be to use the most up-to-date POPE dataset and then update the hardcoded count in the eval harness to match its number of examples. I can make a PR if that sounds right, but I thought I'd run it by you first.
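
For reference, here is a minimal sketch of the kind of check that surfaces the mismatch described above. It is not code from this repo: the `EXPECTED` table and the `count_examples` and `find_mismatches` helpers are hypothetical names, and it assumes each split file stores one JSON object per line.

```python
import json

# Hypothetical expected counts, mirroring the numbers quoted in this issue.
EXPECTED = {"adversarial": 3000, "popular": 3000, "random": 3000}


def count_examples(path):
    """Count records in a JSON Lines file (assumed: one JSON object per line)."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError if the line is not valid JSON
                count += 1
    return count


def find_mismatches(actual, expected=EXPECTED):
    """Return {split: (actual_count, expected_count)} for every split that differs."""
    return {
        split: (actual.get(split, 0), want)
        for split, want in expected.items()
        if actual.get(split, 0) != want
    }


if __name__ == "__main__":
    # Counts as reported in this issue for the file behind the registry URL.
    reported = {"adversarial": 3000, "popular": 3000, "random": 2910}
    print(find_mismatches(reported))
    print(sum(reported.values()), "examples found vs", sum(EXPECTED.values()), "expected")
```

With the counts reported above, only the random split is flagged, and the total comes to 8910 rather than 9000. Whichever file is chosen, the expected counts in the configuration and in the eval harness would need to agree with it.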
