Replies: 1 comment
💬 Your Product Feedback Has Been Submitted 🎉

Thank you for taking the time to share your insights with us! Your feedback is invaluable as we build a better GitHub experience for all our users. Here's what you can expect moving forward ⏩
Where to look to see what's shipping 👀
What you can do in the meantime 💻
As a member of the GitHub community, your participation is essential. While we can't promise that every suggestion will be implemented, we want to emphasize that your feedback is instrumental in guiding our decisions and priorities. Thank you once again for your contribution to making GitHub even better! We're grateful for your ongoing support and collaboration in shaping the future of our platform. ⭐
Select Topic Area
Product Feedback
Body
Across the platforms that host machine learning models, such as HuggingFace, and the websites that aggregate and list research, such as PapersWithCode, none of them adequately addresses the need for model evaluations. There are lots of inconsistencies: models in the same domain lacking common benchmark metrics, benchmarks that are never updated, cherry-picked benchmarks, or models that skip evaluation entirely. Seeing as GitHub is now entering the space of hosting machine learning models, I would appreciate it if you could pioneer an emphasis on accuracy with regard to evaluations. Even now, some of the models already listed on the GitHub Models page are missing evaluations, including models from companies with billion-dollar valuations.
I want to ask the team at GitHub to at least offer an incentive for the researchers and companies sharing these models to evaluate them. A simple suggestion would be to tag models as verified if they have accurate evaluations. Furthermore, it would be nice to require companies that wish to have their models verified to report results on at least one common benchmark (and dataset) that is shared across the model's task domain.
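To make this concrete, here's a rough sketch of how such a "verified" check could work if evaluations were machine-readable. Everything in it is hypothetical: the field names, the required-benchmark table, and the scores are made up for illustration, not anything GitHub currently offers.

```python
# Hypothetical sketch: a structured evaluation block on a model listing, and a
# "verified" tag that is granted only when the shared benchmark for the model's
# task domain is reported. All names and numbers are invented.

REQUIRED_BENCHMARKS = {
    "text-generation": "MMLU",
    "image-classification": "ImageNet-1k top-1",
}

model_card = {
    "name": "example-model",
    "task": "text-generation",
    "evaluations": [
        {"benchmark": "MMLU", "dataset": "MMLU test split", "score": 68.2},
        {"benchmark": "Custom internal eval", "dataset": "private", "score": 91.0},
    ],
}

def is_verified(card: dict) -> bool:
    """A model is 'verified' only if it reports the shared benchmark for its task."""
    required = REQUIRED_BENCHMARKS.get(card["task"])
    reported = {e["benchmark"] for e in card["evaluations"]}
    return required is not None and required in reported

print(is_verified(model_card))  # True: the shared MMLU benchmark is reported
```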
It feels like everyone is publishing models right now, yet no hosting platform focuses on consistency and transparency. Models are missing evaluations, or models in the same domain lack shared benchmarks on comparable datasets. Sometimes you have to dive deep into the paper published for a given model just to find its benchmarks, and even then the common benchmark may be buried somewhere in the text. On top of that, you have to hope that the benchmarks you find for model A overlap with at least one benchmark reported for model B so that the two can be compared.
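If listings exposed their evaluations as structured data instead of prose buried in a paper, finding a shared benchmark between two models would reduce to a simple intersection. A toy example (the benchmark names and scores are invented):

```python
# Hypothetical sketch: comparing two models via whatever benchmarks they share.

model_a = {"MMLU": 68.2, "HellaSwag": 85.1}
model_b = {"MMLU": 71.4, "GSM8K": 57.0}

shared = set(model_a) & set(model_b)
if shared:
    for benchmark in sorted(shared):
        print(f"{benchmark}: A={model_a[benchmark]} vs B={model_b[benchmark]}")
else:
    print("No common benchmark - the models cannot be compared directly.")
```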
It seems like hard work, but I think it would benefit GitHub, researchers, and the general public. For GitHub, as much as I hate to suggest it, you could paywall the detailed evaluations or the ability to compare models. Additionally, you could make it so that your Azure platform selects SOTA models automatically once the developer supplies their use case, making it faster to push to production by letting developers skip the research stage of finding the best model (trading off inference speed against accuracy) for their task. You could even create a newsletter so that GitHub Pro members are automatically notified of new SOTA models. For researchers, this would be valuable because all the data would be aggregated onto one platform here on GitHub: they would be able to find accurate and consistent evaluations (same benchmark method and dataset) across a machine learning task and the various models that accomplish it. The general public and developers would benefit in the same way.
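For the automatic selection idea, something along these lines is all I mean: given aggregated, consistent evaluations, pick the most accurate model that still meets a developer's latency budget. The models and numbers below are invented purely to illustrate the point.

```python
# Hypothetical sketch of "automatic SOTA selection" under a latency constraint.

models = [
    {"name": "model-small", "accuracy": 74.0, "latency_ms": 20},
    {"name": "model-medium", "accuracy": 81.5, "latency_ms": 90},
    {"name": "model-large", "accuracy": 84.2, "latency_ms": 450},
]

def pick_model(candidates, max_latency_ms):
    """Return the most accurate model within the latency budget, if any."""
    eligible = [m for m in candidates if m["latency_ms"] <= max_latency_ms]
    return max(eligible, key=lambda m: m["accuracy"], default=None)

print(pick_model(models, max_latency_ms=100))  # -> model-medium
```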
It's hard work, no doubt, because not only do you have to figure out how to implement this, you also have to rely on the expectation that companies and researchers will follow suit and benchmark their models on that one common benchmark.