Add EntryPoints compatible with the (currently in alpha development) PyTerrier Artifacts API #659

mam10eks · 2024-08-29T09:56:51Z

The goal is to be compatible with this pull request that may land in PyTerrier: terrier-org/pyterrier#436

mam10eks · 2024-08-29T11:07:52Z

See a first test here: https://github.com/mam10eks/pyterrier-alpha/blob/main/tests/test_artifacts_from_tira.py

mam10eks · 2024-08-29T12:20:30Z

If a PyTerrier artifact such as tira://clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512 is resolved by its name, we point it to https://data.tira.io/clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512, and so on.
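
A minimal sketch of this name-to-URL mapping, assuming the path after `tira://` is appended unchanged to https://data.tira.io. The helper name `tira_uri_to_url` is hypothetical and not part of the TIRA client or PyTerrier:

```python
# Hypothetical helper: not part of the TIRA client or PyTerrier.
DATA_BASE_URL = "https://data.tira.io"  # host taken from the comment above


def tira_uri_to_url(uri: str) -> str:
    """Map tira://<dataset>/<task>/<team>/<approach> to a data.tira.io URL.

    Assumption: the path is carried over one-to-one, as in the example above.
    """
    prefix = "tira://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a tira:// identifier: {uri!r}")
    path = uri[len(prefix):].strip("/")
    return f"{DATA_BASE_URL}/{path}"


print(tira_uri_to_url(
    "tira://clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512"
))
```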

mam10eks · 2024-08-29T22:09:05Z

@TheMrSheldon, @potthast: One thing that just came to my mind:

When we implement calls like these (pt being pyterrier):

pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>/<approach>')
pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>')
pt.io.read_results('tira:clueweb12/touche-2020-task-2/')

we could render the corresponding pages, i.e.,

https://data.tira.io/clueweb12/touche-2020-task-2/<team>/<approach>
https://data.tira.io/clueweb12/touche-2020-task-2/<team>
https://data.tira.io/clueweb12/touche-2020-task-2/

where every endpoint can be browsed directly, e.g., https://data.tira.io/clueweb12/touche-2020-task-2/ would show all teams with all their approaches. This would be especially helpful when something does not exist, because the error message shown to users could point to the next fallback. For example, if pt.io.read_results('tira:clueweb12/touche-2020-task-2/') does not exist, the error message could point users to the next higher level for browsing, i.e., https://data.tira.io/clueweb12/.
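
A rough sketch of how such a fallback link could be computed, assuming the `tira:` identifier maps one-to-one onto the path under https://data.tira.io. The function names and the wording of the error message are hypothetical, not existing TIRA or PyTerrier code:

```python
# Hypothetical helpers: names and message wording are illustrative only.
BASE_URL = "https://data.tira.io"


def _segments(identifier: str) -> list[str]:
    """Split 'tira:<dataset>/<task>/<team>/<approach>' into its path parts."""
    scheme, sep, path = identifier.partition(":")
    if scheme != "tira" or not sep:
        raise ValueError(f"Not a tira identifier: {identifier!r}")
    return [part for part in path.split("/") if part]


def browse_url(identifier: str) -> str:
    """Page that would render the identifier itself."""
    return "/".join([BASE_URL, *_segments(identifier)]) + "/"


def fallback_url(identifier: str) -> str:
    """Page one level above the identifier, for browsing what does exist."""
    return "/".join([BASE_URL, *_segments(identifier)[:-1]]) + "/"


def not_found_message(identifier: str) -> str:
    """Error text that points users to the next higher level."""
    return (
        f"{identifier} does not exist. "
        f"Browse the available entries at {fallback_url(identifier)}"
    )


print(not_found_message("tira:clueweb12/touche-2020-task-2/"))
```

Stripping one path segment per level keeps the fallback logic independent of whether the missing part is an approach, a team, or a task.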

I think this could be a first use case for the new V1 REST API. It would make sense to move the metadata on which approaches have been archived to Zenodo into the TIRA database. Currently I keep this in the code of the Python client to avoid dependencies on the live system, but data.tira.io will be statically hosted, so this should be no problem. Once this is in the TIRA database, we can make the endpoint that reports what is archived where public (the information is public anyway) and traverse this endpoint during the build of the static https://data.tira.io.

What do you think?

This would also allow calls like pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>/<approach>', verbose=True) to point to https://data.tira.io, which would be very cool. For example, when you show this to someone, the tira:... identifier is a bit of a magic string, but as soon as they want to know more, we add the verbose=True flag and get very good explainability by default. In particular, we would get things like "what is in this artifact" by default, as we already store all the metadata on what a run contains (I mean our browser that shows "your run output contains files x, y, z").

I think this would combine very well.

mam10eks · 2024-08-30T00:36:44Z

On that note: if we also use data.tira.io to integrate visualizations (which I would be a big fan of), e.g., via DiffIR, we should link to ChatNoir for the full texts, e.g., https://chatnoir-webcontent.web.webis.de/?index=cw22&uuid=CWLafZMrWbCnXKvqA7IKZg

I think this would be a good idea because ChatNoir already provides random document access, especially for large corpora like the ClueWebs. Hence, it would reduce the size of the static part that we host on data.tira.io, and we would not have to maintain the full texts twice.
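
As a sketch, such links could be built from an index name and a document UUID. The helper `chatnoir_document_url` is hypothetical; the base URL and the `index` and `uuid` query parameters are taken from the example link above, and the base URL is kept as a parameter in case the host changes:

```python
from urllib.parse import urlencode

# Base URL from the example link above; hypothetical constant name.
CHATNOIR_WEBCONTENT_URL = "https://chatnoir-webcontent.web.webis.de/"


def chatnoir_document_url(
    index: str, uuid: str, base_url: str = CHATNOIR_WEBCONTENT_URL
) -> str:
    """Build a full-text link for one document (hypothetical helper)."""
    query = urlencode({"index": index, "uuid": uuid})
    return f"{base_url}?{query}"


print(chatnoir_document_url("cw22", "CWLafZMrWbCnXKvqA7IKZg"))
```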

potthast · 2024-08-30T06:47:37Z

I would like this URL to be more like https://static.chatnoir.eu/?index=cw22&uuid=CWLafZMrWbCnXKvqA7IKZg or similar.

mam10eks · 2024-08-30T06:51:11Z

Indeed, we can change the URL under which ChatNoir provides random document access. For the proof of concept, I think we could stick with the existing URL for the moment.

mam10eks · 2024-08-30T07:08:25Z

This is highly related to this issue: #594

mam10eks added the enhancement New feature or request label Aug 29, 2024

mam10eks self-assigned this Aug 29, 2024

mam10eks added a commit that referenced this issue Aug 29, 2024

rough prototype that would be compatible with alpha pyterrier artifacts

1a5fe8d

mam10eks added a commit that referenced this issue Sep 24, 2024

prepare command for export for data.tira.io #659

fef5a28

mam10eks linked a pull request Dec 5, 2024 that will close this issue

Introduce archive.tira.io to prepare for usage in pyterrier-artifacts #671

Open
