About statistics of News14 #1

Open
kimwongyuda opened this issue Jun 1, 2023 · 0 comments

@kimwongyuda

Thank you for your nice work.

I computed the number of documents and the number of stories using the dataset from https://github.com/Priberam/news-clustering.

The number of documents matches the paper: 16136.
However, the number of stories does not: I get 733, whereas the paper reports 788.
I counted the unique values of the "cluster" key in the dataset.

Also, since the dev set and the test set in the dataset are time-independent, I treat a cluster value that appears in both sets as two distinct stories.

However, only 7 cluster values overlap between the two sets, which is far too few to explain the gap of 55 stories.
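A minimal sketch of the counting described above, in Python. It assumes each split has already been loaded (for example with `json.load`) into a list of dicts that carry a `"cluster"` key. The function name `count_stories` and the returned field names are hypothetical and not part of the dataset or the paper's code. It reports the story count both with dev/test overlaps treated as distinct and with them merged, so the two conventions can be compared directly.

```python
from collections.abc import Iterable, Mapping


def count_stories(dev_docs: Iterable[Mapping], test_docs: Iterable[Mapping]) -> dict:
    """Count documents and stories across the dev and test splits.

    Hypothetical helper: assumes every document is a dict with a "cluster" key.
    """
    dev_docs = list(dev_docs)
    test_docs = list(test_docs)

    # Unique story (cluster) ids within each split.
    dev_clusters = {doc["cluster"] for doc in dev_docs}
    test_clusters = {doc["cluster"] for doc in test_docs}

    # Ids that appear in both splits.
    overlap = dev_clusters & test_clusters

    return {
        "documents": len(dev_docs) + len(test_docs),
        # Overlapping ids counted once per split (treated as distinct stories).
        "stories_split_aware": len(dev_clusters) + len(test_clusters),
        # Overlapping ids counted only once (treated as the same story).
        "stories_merged": len(dev_clusters | test_clusters),
        "overlapping_ids": len(overlap),
    }


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the real dataset.
    dev = [{"cluster": 1}, {"cluster": 1}, {"cluster": 2}]
    test = [{"cluster": 2}, {"cluster": 3}]
    print(count_stories(dev, test))
    # {'documents': 5, 'stories_split_aware': 4, 'stories_merged': 3, 'overlapping_ids': 1}
```

The split-aware count always exceeds the merged count by exactly the number of overlapping ids, so 7 overlaps can shift the total by at most 7 stories under either convention.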

Could you tell me how you counted the number of stories?
