Reduce load time for the leaderboard #1972

Open
2 tasks
KennethEnevoldsen opened this issue Feb 5, 2025 · 2 comments
Labels
enhancement (New feature or request), leaderboard (issues related to the leaderboard)

Comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Feb 5, 2025

I have gone to the leaderboard at least a few times and found that it had been restarted.

A restart takes quite a while, but once it is done the leaderboard is reasonably fast.

There are two solutions to this problem:

  • limit restarts (23 people can now rebuild, and someone might rebuild the space when they see some oddities, not knowing how long it takes)
  • reduce restart time; there are multiple ways to do this:
    • speed up result loading (mteb.load_results()):
        1. reduce the number of results that are loaded by deleting unused files. E.g. if model 1 is run with ArxivClustering both with and without a revision, we might choose to keep only the one with a revision
        2. speed up the loading of the results itself.
    • avoid recomputing the cache:
      • Once the cache is created, loading the data is quite quick. However, the cache is recomputed on every rebuild. We could have the cache updated daily instead.
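To make two of the ideas above concrete, here is a rough sketch, not the actual leaderboard code. It assumes results can be summarised as hypothetical `(model, task, revision)` records and that the slow load can be wrapped in a `loader` callable (for example a wrapper around `mteb.load_results()` that returns JSON-serialisable data). The names `drop_unrevisioned_duplicates` and `load_with_cache` are made up for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

# Hypothetical record shape: (model_name, task_name, revision_or_None).
Record = tuple[str, str, Optional[str]]


def drop_unrevisioned_duplicates(records: list[Record]) -> list[Record]:
    """Drop results without a revision when the same (model, task) pair
    also has a result with a revision; keep everything else."""
    has_revision = {(m, t) for m, t, rev in records if rev is not None}
    return [
        (m, t, rev)
        for m, t, rev in records
        if rev is not None or (m, t) not in has_revision
    ]


def load_with_cache(
    cache_path: Path,
    loader: Callable[[], dict],
    max_age_s: float = 24 * 3600,
) -> dict:
    """Return the cached data unless the cache file is missing or older
    than max_age_s (one day by default); otherwise reload and rewrite it."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_s:
            return json.loads(cache_path.read_text())
    data = loader()  # the slow path, assumed to return JSON-serialisable data
    cache_path.write_text(json.dumps(data))
    return data
```

With something like this, a rebuild would only pay the full loading cost when the cache file is more than a day old, provided the cache file lives somewhere that survives the rebuild (which is an assumption about the hosting setup).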

Edit: As mentioned in #1983, the leaderboard also takes up a decent amount of memory, at least on Linux machines.

@x-tabdeveloping has been the main person working on this. @x-tabdeveloping feel free to edit this issue if there are solutions you would like to add or remove.

@KennethEnevoldsen added the enhancement and leaderboard labels Feb 5, 2025
@x-tabdeveloping
Collaborator

Build time is actually quite fast; it's the startup that takes a while, sometimes even when a cache is present.

@KennethEnevoldsen changed the title from "Reduce built time for the leaderboard" to "Reduce start-up time for the leaderboard" Feb 5, 2025
@KennethEnevoldsen
Contributor Author

Rephrased. From our conversation, it seems like improving mteb.load_results() is the main thing that is needed.
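Before optimising, it would probably help to see where the time actually goes. A minimal sketch using the standard-library profiler is below; `profile_call` is a hypothetical helper, not part of mteb, and it works on any zero-argument callable.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_call(fn: Callable[[], Any], top: int = 10) -> tuple[Any, str]:
    """Run fn under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Assuming mteb is installed, something like `_, report = profile_call(mteb.load_results)` followed by `print(report)` should show whether the time is dominated by file I/O, JSON parsing, or object construction.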

@KennethEnevoldsen changed the title from "Reduce start-up time for the leaderboard" to "Reduce load time for the leaderboard" Feb 6, 2025