Minor data error in PAWS-X #15

Open
jvamvas opened this issue Jan 4, 2022 · 1 comment

jvamvas commented Jan 4, 2022

Thanks for creating this great resource!

I noticed that a small number of samples in the translated dev and test sets contain the placeholder "NS" instead of a sentence. Here's an example from the dev set, showing the English source pair followed by its German translation:

| id | label | sentence1 | sentence2 |
|---|---|---|---|
| 684 | 1 | The road continues west through 5th Street as Delphos . | The road continues as Delphos further west through 5th Street . |
| 684 | 0 | Die Straße verläuft weiter nach Westen über die 5. Straße nach Delphos. | NS |

Incidentally, the sample counts reported in Table 2 of the PAWS-X paper and on https://github.com/google-research-datasets/paws/tree/master/pawsx#data-format-and-statistics are lower than the actual size of the splits. The difference nearly corresponds to the number of "NS" samples.

So it seems that a cleaning step was planned but never applied to the published dataset.
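
For anyone who wants to check their own copy, here is a minimal sketch of how the affected rows could be found. It assumes the split is a tab-separated file with a header row that includes `sentence1` and `sentence2` columns; the function name `find_placeholder_rows` and the second sample row are made up for illustration and are not part of the dataset or its tooling.

```python
import csv
import io

PLACEHOLDER = "NS"  # placeholder string reported in this issue


def find_placeholder_rows(tsv_file, placeholder=PLACEHOLDER):
    """Return the rows whose sentence1 or sentence2 is exactly the placeholder.

    `tsv_file` is any open text file (or file-like object) containing
    tab-separated data with a header row. The column names are an assumption.
    """
    # QUOTE_NONE keeps quotation marks inside sentences from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    flagged = []
    for row in reader:
        s1 = (row.get("sentence1") or "").strip()
        s2 = (row.get("sentence2") or "").strip()
        if s1 == placeholder or s2 == placeholder:
            flagged.append(row)
    return flagged


# Tiny in-memory sample: the first row is the example from this issue,
# the second row is invented purely to have a clean row to compare against.
sample = (
    "id\tsentence1\tsentence2\tlabel\n"
    "684\tDie Straße verläuft weiter nach Westen über die 5. Straße nach Delphos.\tNS\t0\n"
    "685\tEin Satz.\tEin anderer Satz.\t1\n"
)

flagged = find_placeholder_rows(io.StringIO(sample))
print(len(flagged), [row["id"] for row in flagged])
```

To run it on a real split, open the file with `encoding="utf-8"` and `newline=""` and pass the file object to the function. Comparing `len(flagged)` with the gap between the actual row count and the reported count would show how much of the difference the placeholder rows explain.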

yuanzh (Collaborator) commented Jan 4, 2022

Good catch. Thanks for reporting this!
