Replies: 1 comment
💬 Your Product Feedback Has Been Submitted 🎉

Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩
Where to look to see what's shipping 👀
What you can do in the meantime 💻
As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, we want to emphasize that your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐
Select Topic Area
Product Feedback
Body
Across the platforms that host machine learning models, such as HuggingFace, and the websites that aggregate and list research, such as PapersWithCode, none of them adequately addresses the need for model evaluations. There are lots of inconsistencies: models in the same domain lacking common benchmark metrics, benchmarks that are never updated, cherry-picked benchmarks, or models that skip evaluation entirely. Seeing as GitHub is now entering the space of hosting machine learning models, I would appreciate it if you could pioneer an emphasis on accuracy with regard to evaluations. Even now, some of the models already listed on the GitHub Models page are missing evaluations, including models from companies with billion-dollar valuations.
I want to ask the team at GitHub to at least offer an incentive for the researchers and companies sharing these models to evaluate them. A simple suggestion would be to tag models as verified if they have accurate evaluations. Furthermore, it would be nice to require companies that wish to have their models verified to report results on at least one common benchmark (and dataset) that is shared across the model's task domain.
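To make this concrete, here's a rough sketch of how such a "verified" check could work if evaluations were machine-readable. Everything in it is hypothetical: the field names, the required-benchmark table, and the scores are made up for illustration, not anything GitHub currently offers.

```python
# Hypothetical sketch: a structured evaluation block on a model listing, and a
# "verified" tag that is granted only when the shared benchmark for the model's
# task domain is reported. All names and numbers are invented.

REQUIRED_BENCHMARKS = {
    "text-generation": "MMLU",
    "image-classification": "ImageNet-1k top-1",
}

model_card = {
    "name": "example-model",
    "task": "text-generation",
    "evaluations": [
        {"benchmark": "MMLU", "dataset": "MMLU test split", "score": 68.2},
        {"benchmark": "Custom internal eval", "dataset": "private", "score": 91.0},
    ],
}

def is_verified(card: dict) -> bool:
    """A model is 'verified' only if it reports the shared benchmark for its task."""
    required = REQUIRED_BENCHMARKS.get(card["task"])
    reported = {e["benchmark"] for e in card["evaluations"]}
    return required is not None and required in reported

print(is_verified(model_card))  # True: the shared MMLU benchmark is reported
```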
It feels like everyone is publishing models right now, yet no hosting platform focuses on consistency and transparency. Models are missing evaluations, or models in the same domain lack shared benchmarks on comparable datasets. Sometimes you have to dive deep into the paper published for a given model just to find its benchmarks, and even then the common benchmark may be buried somewhere in the text. On top of that, you have to hope that the benchmarks you find for model A overlap with at least one benchmark reported for model B so that the two can be compared.
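If listings exposed their evaluations as structured data instead of prose buried in a paper, finding a shared benchmark between two models would reduce to a simple intersection. A toy example (the benchmark names and scores are invented):

```python
# Hypothetical sketch: comparing two models via whatever benchmarks they share.

model_a = {"MMLU": 68.2, "HellaSwag": 85.1}
model_b = {"MMLU": 71.4, "GSM8K": 57.0}

shared = set(model_a) & set(model_b)
if shared:
    for benchmark in sorted(shared):
        print(f"{benchmark}: A={model_a[benchmark]} vs B={model_b[benchmark]}")
else:
    print("No common benchmark - the models cannot be compared directly.")
```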
It seems like hard work, but I think it would benefit GitHub, researchers, and the general public. For GitHub, as much as I hate to suggest it, you could paywall the detailed evaluations or the ability to compare models. Additionally, you could make it so that your Azure platform selects SOTA models automatically once the developer supplies their use case, making it faster to push to production by letting developers skip the research stage of finding the best model (trading off inference speed against accuracy) for their task. You could even create a newsletter so that GitHub Pro members are automatically notified of new SOTA models. For researchers, this would be valuable because all the data would be aggregated onto one platform here on GitHub: they would be able to find accurate and consistent evaluations (same benchmark method and dataset) across a machine learning task and the various models that accomplish it. The general public and developers would benefit in the same way.
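For the automatic selection idea, something along these lines is all I mean: given aggregated, consistent evaluations, pick the most accurate model that still meets a developer's latency budget. The models and numbers below are invented purely to illustrate the point.

```python
# Hypothetical sketch of "automatic SOTA selection" under a latency constraint.

models = [
    {"name": "model-small", "accuracy": 74.0, "latency_ms": 20},
    {"name": "model-medium", "accuracy": 81.5, "latency_ms": 90},
    {"name": "model-large", "accuracy": 84.2, "latency_ms": 450},
]

def pick_model(candidates, max_latency_ms):
    """Return the most accurate model within the latency budget, if any."""
    eligible = [m for m in candidates if m["latency_ms"] <= max_latency_ms]
    return max(eligible, key=lambda m: m["accuracy"], default=None)

print(pick_model(models, max_latency_ms=100))  # -> model-medium
```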
It's hard work, no doubt, because not only do you have to figure out how to implement this, you also have to rely on the expectation that companies and researchers will follow suit and benchmark their models on that one common benchmark.