LongEval Retrieval (used at CLEF 2023) #234
I have started working on this and have a first prototype locally that uses TrecDocs and TsvQueries, so not much code should be needed here.
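The prototype relies on ir_datasets' existing TrecDocs and TsvQueries loaders. To show roughly what those two formats contain, here is a minimal stdlib-only sketch. It is not how ir_datasets implements them, and it assumes that LongEval documents are wrapped in <DOC> blocks with <DOCNO> and <TEXT> tags and that queries are stored one per line as query_id<TAB>text. The names Doc, Query, iter_trec_docs and iter_tsv_queries are hypothetical.

```python
import csv
import io
import re
from typing import Iterator, NamedTuple


class Doc(NamedTuple):
    doc_id: str
    text: str


class Query(NamedTuple):
    query_id: str
    text: str


# Assumed TREC-style layout: each record sits in <DOC>...</DOC>,
# with the identifier in <DOCNO> and the body in <TEXT>.
_DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
_DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
_TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def iter_trec_docs(raw: str) -> Iterator[Doc]:
    """Yield one Doc per <DOC> block in a TREC-style string."""
    for block in _DOC_RE.findall(raw):
        docno = _DOCNO_RE.search(block)
        if docno is None:
            continue  # skip malformed records that lack an identifier
        text = _TEXT_RE.search(block)
        yield Doc(docno.group(1), text.group(1).strip() if text else "")


def iter_tsv_queries(raw: str) -> Iterator[Query]:
    """Yield one Query per 'query_id<TAB>text' line."""
    reader = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 2:
            yield Query(row[0], row[1])


if __name__ == "__main__":
    sample = "<DOC>\n<DOCNO>doc1</DOCNO>\n<TEXT>hello world</TEXT>\n</DOC>"
    print(list(iter_trec_docs(sample)))  # [Doc(doc_id='doc1', text='hello world')]
```

Both parsers are generators, mirroring the lazy iteration style that large web collections usually call for.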
Awesome! Given LongEval's focus on temporal evaluation, I think the time dimension should be encoded at a higher level in the dataset IDs, e.g.:
Though maybe I'm missing something about how the task is structured?
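One way to realize a hierarchical scheme is to treat language and split as separate levels of the ID, as in the IDs listed under "Dataset ID(s) & supported entities", where split names such as a-short-july encode the time period. The sketch below assumes exactly those eight IDs, enumerates them, and reports which entities each provides. LongEvalId, parse_id and supported_entities are hypothetical names, not part of the ir_datasets API.

```python
from __future__ import annotations

from itertools import product
from typing import NamedTuple

LANGUAGES = ("en", "fr")
# Splits as listed in the issue; the later splits encode the time period.
SPLITS = ("train", "heldout", "a-short-july", "b-long-september")


class LongEvalId(NamedTuple):
    language: str
    split: str

    def __str__(self) -> str:
        return f"longeval/{self.language}/{self.split}"


def all_ids() -> list[str]:
    """Every language/split combination, in declaration order."""
    return [str(LongEvalId(lang, split)) for lang, split in product(LANGUAGES, SPLITS)]


def parse_id(dataset_id: str) -> LongEvalId:
    """Split 'longeval/<lang>/<split>' into its parts, rejecting unknown values."""
    parts = dataset_id.split("/")
    if len(parts) != 3 or parts[0] != "longeval":
        raise ValueError(f"not a LongEval dataset id: {dataset_id!r}")
    _, language, split = parts
    if language not in LANGUAGES or split not in SPLITS:
        raise ValueError(f"unknown LongEval dataset id: {dataset_id!r}")
    return LongEvalId(language, split)


def supported_entities(dataset_id: str) -> tuple[str, ...]:
    # Per the issue, only the train splits come with qrels.
    if parse_id(dataset_id).split == "train":
        return ("docs", "queries", "qrels")
    return ("docs", "queries")
```

Keeping language and split as separate fields makes it easy to add a new time-based split later without touching the language level.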
Yes, that makes perfect sense. Shall I implement this ticket? (I already have a prototype; it is not much code, as LongEval comes in formats already supported in ir_datasets.)
That would be awesome! I love it when folks release data in standard formats :-)
If I may add something: the LongEval collection is subject to a custom license from Qwant (https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License; it is basically an extension of the CC-BY-NC License) that requires explicit agreement and the provision of contact information.
Dear Romain,
Thanks for reaching out. The ir_datasets integration would expect the user to download the data manually (my prototype implementation already assumes this).
Best regards,
Maik
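Because the Qwant license requires explicit agreement, an integration cannot fetch the data on the user's behalf. A minimal sketch of the check such an integration might perform, assuming the user has already saved the archive at a known local path: require_manual_download and ManualDownloadRequired are hypothetical and do not reproduce ir_datasets' own download handling.

```python
from pathlib import Path

# License URL taken from the discussion above.
LICENSE_URL = "https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License"


class ManualDownloadRequired(FileNotFoundError):
    """Raised when the licensed data has not been placed locally yet."""


def require_manual_download(expected: Path) -> Path:
    # Hypothetical helper: never download anything; only verify that the
    # user has accepted the license and put the file where we expect it.
    if not expected.exists():
        raise ManualDownloadRequired(
            f"{expected} not found. Accept the license at {LICENSE_URL}, "
            f"download the LongEval data manually, and place it at {expected}."
        )
    return expected
```

Subclassing FileNotFoundError lets callers that already handle missing files catch this case without special-casing it.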
Dataset Information:
The goal would be to integrate the LongEval data for Task 1 (retrieval).
The information from the official task description:
Links to Resources:
https://clef-longeval.github.io/
Dataset ID(s) & supported entities:
- longeval/en/train: docs, queries, qrels
- longeval/en/heldout: docs, queries
- longeval/en/a-short-july: docs, queries
- longeval/en/b-long-september: docs, queries
- longeval/fr/train: docs, queries, qrels
- longeval/fr/heldout: docs, queries
- longeval/fr/a-short-july: docs, queries
- longeval/fr/b-long-september: docs, queries

Checklist
Mark each task once completed. All should be checked prior to merging a new dataset.
- ir_datasets/datasets/[topid].py
- tests/integration/[topid].py
- ir_datasets generate_metadata command; should appear in ir_datasets/etc/metadata.json
- ir_datasets/etc/[topid].yaml
- ir_datasets/etc/downloads.json
- .github/workflows/verify_downloads.yml. Only one needed per topid.
- downloads.json

Additional comments/concerns/ideas/etc.