Add EntryPoints compatible with the (currently in alpha development) PyTerrier Artifacts API #659

Open
mam10eks opened this issue Aug 29, 2024 · 7 comments · May be fixed by #671
Labels
enhancement New feature or request

Comments

@mam10eks
Member

The goal is to be compatible with this potentially upcoming PyTerrier pull request: terrier-org/pyterrier#436

@mam10eks mam10eks added the enhancement New feature or request label Aug 29, 2024
@mam10eks mam10eks self-assigned this Aug 29, 2024
@mam10eks
Member Author

If a PyTerrier artifact such as tira://clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512 is resolved by name, we point it to https://data.tira.io/clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512, and so on.
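A minimal sketch of this name-based resolution, assuming the artifact path is reused verbatim as the data.tira.io URL path, as in the example above. `artifact_url` is a hypothetical helper, not part of the PyTerrier Artifacts API; it accepts both `tira://` and `tira:` because both spellings appear in this thread.

```python
DATA_TIRA_BASE = "https://data.tira.io"  # base URL taken from the example above


def artifact_url(identifier: str) -> str:
    """Map a tira:// (or tira:) artifact identifier to its data.tira.io page.

    Hypothetical helper: assumes the artifact path is reused verbatim as the
    URL path, as in the example in this issue.
    """
    for prefix in ("tira://", "tira:"):  # check the longer prefix first
        if identifier.startswith(prefix):
            path = identifier[len(prefix):].strip("/")
            return f"{DATA_TIRA_BASE}/{path}"
    raise ValueError(f"not a tira artifact identifier: {identifier!r}")


print(artifact_url("tira://clueweb12/touche-2020-task-2/fschlatt/sparse-cross-encoder-4-512"))
```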

@mam10eks
Member Author

mam10eks commented Aug 29, 2024

@TheMrSheldon, @potthast: One thing that just came to my mind:

When we implement calls like these (where pt is PyTerrier):

  • pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>/<approach>')
  • pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>')
  • pt.io.read_results('tira:clueweb12/touche-2020-task-2/')

we could render the corresponding pages, i.e.,

  • https://data.tira.io/clueweb12/touche-2020-task-2/<team>/<approach>
  • https://data.tira.io/clueweb12/touche-2020-task-2/<team>
  • https://data.tira.io/clueweb12/touche-2020-task-2/

where every endpoint supports browsing directly, i.e., https://data.tira.io/clueweb12/touche-2020-task-2/ would show all teams with all their approaches, etc. This would be especially helpful when something does not exist, because the error message shown to users could point to the next fallback. For example, if pt.io.read_results('tira:clueweb12/touche-2020-task-2/') does not exist, the error message could point users to the next higher level for browsing, i.e., https://data.tira.io/clueweb12/.
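The fallback behaviour described above could look roughly like the following minimal sketch. It assumes identifiers map one-to-one onto data.tira.io paths as in the examples above; `browse_fallbacks`, `not_found_message`, and the `exists` callback (for instance, a lookup in the statically built index) are hypothetical names, not part of TIRA or PyTerrier.

```python
from typing import Callable

DATA_TIRA_BASE = "https://data.tira.io"


def browse_fallbacks(identifier: str) -> list[str]:
    """Return browse URLs from the most specific level up to the dataset level."""
    path = identifier.removeprefix("tira://").removeprefix("tira:").strip("/")
    parts = [part for part in path.split("/") if part]
    return [f"{DATA_TIRA_BASE}/{'/'.join(parts[:depth])}/" for depth in range(len(parts), 0, -1)]


def not_found_message(identifier: str, exists: Callable[[str], bool]) -> str:
    """Point users to the nearest existing parent level of a missing artifact.

    `exists` is a hypothetical check, e.g., a lookup in the static index.
    """
    for url in browse_fallbacks(identifier)[1:]:  # skip the level that was not found
        if exists(url):
            return f"{identifier} does not exist; browse what is available at {url}"
    return f"{identifier} does not exist; browse what is available at {DATA_TIRA_BASE}/"
```

With `exists=lambda url: True`, the identifier `tira:clueweb12/touche-2020-task-2/` yields a message pointing to https://data.tira.io/clueweb12/, matching the example in the comment.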

I think this could be a first use case for the new V1 REST API. It would make sense to move the metadata about which approaches have been archived to Zenodo into the TIRA database. (So far, I kept this in the code of the Python client to avoid dependencies on the live system, but since data.tira.io will be statically hosted, this should be no problem.) Once this metadata is in the TIRA database, we can make the endpoint that reports what is archived where public (the information is public anyway) and traverse this endpoint while building the static https://data.tira.io.
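A minimal sketch of the traversal step during the static build, assuming the public endpoint returns one record per archived run with hypothetical `dataset`, `task`, `team`, and `approach` fields (the actual V1 REST API response format is not defined in this issue). It groups the runs so that every browse level, e.g., clueweb12/touche-2020-task-2, lists all runs below it:

```python
from collections import defaultdict


def build_pages(records: list[dict]) -> dict[str, list[str]]:
    """Map each browse level (dataset, task, team) to all archived runs below it.

    Hypothetical sketch: `records` stands in for the response of the public
    "what is archived where" endpoint; the field names are assumptions.
    Run-level pages (dataset/task/team/approach) would be rendered from the
    record itself, so they are not listed as keys here.
    """
    pages: dict[str, set[str]] = defaultdict(set)
    for record in records:
        parts = [record["dataset"], record["task"], record["team"], record["approach"]]
        run = "/".join(parts)
        # Register the run on every ancestor page: dataset, dataset/task, dataset/task/team.
        for depth in range(1, len(parts)):
            pages["/".join(parts[:depth])].add(run)
    return {path: sorted(runs) for path, runs in sorted(pages.items())}


# Example record built from the artifact name in this issue; real responses may carry more fields.
example = [{"dataset": "clueweb12", "task": "touche-2020-task-2",
            "team": "fschlatt", "approach": "sparse-cross-encoder-4-512"}]
for path, runs in build_pages(example).items():
    print(f"https://data.tira.io/{path}/", runs)
```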

What do you think?

This would also allow calls like pt.io.read_results('tira:clueweb12/touche-2020-task-2/<team>/<approach>', verbose=True) to point to https://data.tira.io, which would be very cool. For example, when you show this to someone, the tira:... identifier is a bit of a magic string, but as soon as someone wants to know more about it, they add the verbose=True flag and get very good explainability by default. This is especially true because we would have things like "what is in this artifact", etc. by default, as we already store all the metadata on what a run contains (I mean our browser that shows "your run output contains files x, y, z").
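The proposed verbose=True behaviour could be prototyped as a thin wrapper around whatever actually loads the run, as in the minimal sketch below. `read_with_explanation` and its `log` parameter are hypothetical; `loader` stands in for the real resolver (e.g., a tira-aware pt.io.read_results), and the verbose flag is the proposal from this comment, not an existing PyTerrier option.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
DATA_TIRA_BASE = "https://data.tira.io"


def read_with_explanation(
    identifier: str,
    loader: Callable[[str], T],
    verbose: bool = False,
    log: Callable[[str], None] = print,
) -> T:
    """Load an artifact and, if verbose, report where it is documented.

    Hypothetical wrapper: `loader` stands in for the real resolver, and the
    verbose flag mirrors the proposal in this issue, not an existing option.
    """
    if verbose:
        path = identifier.removeprefix("tira://").removeprefix("tira:").strip("/")
        log(f"{identifier} is documented at {DATA_TIRA_BASE}/{path}")
    return loader(identifier)
```

Keeping the explanation in a wrapper means the loader itself stays unchanged; the page on data.tira.io would then carry the details on what the artifact contains.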

I think this would combine very well.

@mam10eks
Member Author

On that note: if we also use data.tira.io to integrate visualizations (which I would be a big fan of), e.g., via DiffIR, we should link to ChatNoir for the full texts, e.g., https://chatnoir-webcontent.web.webis.de/?index=cw22&uuid=CWLafZMrWbCnXKvqA7IKZg

I think this would be a good idea because ChatNoir already provides random document access, especially for large corpora like the ClueWebs. This would allow us to reduce the size of the static part hosted on data.tira.io, and we would not have to maintain it twice.
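For such DiffIR-style views, full-text links could be generated from the ChatNoir index name and document UUID. A minimal sketch, assuming the query-parameter layout of the example URL above (`index`, `uuid`); `full_text_link` is a hypothetical helper, and the base URL is a parameter because the host under which ChatNoir serves documents may change:

```python
from urllib.parse import urlencode

# Host taken from the example URL above; it may be replaced later.
CHATNOIR_WEBCONTENT = "https://chatnoir-webcontent.web.webis.de/"


def full_text_link(index: str, uuid: str, base: str = CHATNOIR_WEBCONTENT) -> str:
    """Build a link to ChatNoir's random-access view of a single document."""
    return f"{base}?{urlencode({'index': index, 'uuid': uuid})}"
```

Calling `full_text_link("cw22", "CWLafZMrWbCnXKvqA7IKZg")` reproduces the example URL above.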

@potthast
Member

I would like this URL to be more like https://static.chatnoir.eu/?index=cw22&uuid=CWLafZMrWbCnXKvqA7IKZg or similar.

@mam10eks
Member Author

Indeed, we can change the URL under which ChatNoir provides random document access. For the proof of concept, I think we could stick with the existing URL for now.

@mam10eks
Member Author

This is highly related to this issue: #594
