Domain-specific large model benchmarks: the edge perspective #177
Comments
Hi @MooreZheng, I was looking through the issues and found this one interesting!
Selecting Test Datasets
A few proposed datasets that, in my opinion, we should add:
(*) Although I only just found the paper for this, this was a superficial search; we can find better options if we search in depth (of course, after your approval). I have a question about these datasets:
Benchmarking
Each domain has its own challenges, so we need to adapt the benchmarking accordingly. We might need to create domain-specific strategies:
Adjust
Every domain will have its own constraints, such as:
So, in my view, we first need to decide on the domains we would like to work on and then move on to finding relevant datasets and benchmarking techniques. This is what I was able to make of the issue. Please review it and let me know whether I am thinking in the right direction. Thank you! |
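To make the "domain-specific strategy" idea above concrete, here is a minimal sketch of what a pluggable domain metric could look like. The function name, signature, and keyword-coverage criterion are hypothetical illustrations, not the actual Ianvs metric interface; a real domain (medical, industrial, etc.) would swap in its own scoring criteria.

```python
# Hypothetical sketch of a domain-specific metric; NOT the actual KubeEdge-Ianvs
# metric-plugin interface. It scores predictions by required-keyword coverage,
# which a real domain would replace with criteria such as terminology
# correctness or safety-constraint checks.
from typing import List

def keyword_coverage(y_true: List[List[str]], y_pred: List[str]) -> float:
    """Average fraction of required domain keywords that appear in each prediction."""
    scores = []
    for keywords, pred in zip(y_true, y_pred):
        hits = sum(1 for k in keywords if k.lower() in pred.lower())
        scores.append(hits / len(keywords) if keywords else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Example usage
print(keyword_coverage([["latency", "bandwidth"]],
                       ["Edge latency depends heavily on available bandwidth."]))  # 1.0
```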
Hey @phoeenniixx, welcome! Thank you so much for your attention to and information on this issue.
The domain selection is up to LFX mentee candidates to propose. From the KubeEdge mentor side, we would like to see one selected domain related to edge computing that offers high value when an LLM is applied. Also, a brand-new dataset is even more appreciated than integrating an existing dataset. |
Hi @MooreZheng, could you tell me the application deadline for this mentorship? Could you also let me know the selection criteria and the selection date? |
https://docs.linuxfoundation.org/lfx/mentorship/mentorship-program-timelines |
Hey @MooreZheng! I'm Aryan Prakhar, a third-year B.Tech student at IIT (BHU) Varanasi, and this LLM benchmarking project immediately caught my attention. It resonated with my experience in benchmarking and evaluation, and I loved the idea of proposing a domain and even building a dataset from scratch. I'd love to highlight how I can contribute to the community. After getting clarity about the project requirements in today's international meeting, I went on to research an appropriate domain for building an LLM benchmark with maximum immediate utility on the edge computing front.
What I've Been Up To
LLMs & Benchmarking
Skills & Tools
Why I'm Excited About This Project
LLM benchmarking is still an evolving space, and every benchmark project requires deep industry understanding. That side learning keeps me engaged. Plus, this project's practical impact of helping Edge AI developers select the right models makes it even more exciting.
Domains for Edge AI Benchmarks
Here are the domains that I think should be the pick. Reason? I feel these are the domains where most LLM-based Edge AI applications will happen, so providing benchmarks here would enable better decisions in these areas.
1. Autonomous Vehicles (Top Pick)
LLMs + Edge AI can improve autonomous driving:
Why It’s Important: Autonomous driving is a fast-growing field for Edge AI. A benchmark here would have immediate real-world value. 📌 Key Challenges to Test:
2. Logistics
I saw firsthand how tough route planning can be during a Himalayan trip where landslides disrupted navigation. What Edge AI Can Do:
📌 Why I'd Be a Good Fit:
3. Energy
Why Edge AI Is Useful:
Would love to hear your thoughts! Do any of these domains align with what you're looking for? Open to feedback and excited to contribute! |
Thanks for the reply @MooreZheng.
So can we assume that 20th Feb is the last date before you start reviewing the applications? |
Hey @AryanPrakhar, welcome! Thank you so much for your attention and thoughts on this issue. As discussed, a domain that fits edge computing, has high market value, and comes with a brand-new dataset is particularly welcome. The domain selection is up to LFX mentee candidates to propose. More studies are also welcome. |
That date looks good to me. The current expectations are as discussed above in this issue. We are also considering launching a pre-test around 20th Feb, depending on the number of candidates who apply for this issue. If so, the pre-test will be announced in this issue, and candidates might want to keep an eye on it. @ggold7046 |
Hi @MooreZheng |
Hi @MooreZheng, I'm excited about this project! I would like to propose adding a new domain for evaluation: camera re-identification surveillance systems. Given the growing need for real-time, edge-based surveillance solutions, especially with challenges like occlusion, varying lighting, and diverse environmental conditions, I believe this domain offers a rich testbed for domain-specific large models. In this scenario, we could:
I'm enthusiastic about collaborating on this and can contribute using my experience with PyTorch, KubeEdge-Ianvs, and relevant models. Looking forward to discussing further how we can integrate this domain into the benchmark framework. Thank You! |
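As a rough illustration of how such a re-identification scenario could be scored, here is a minimal rank-1 accuracy sketch. The embeddings and identity labels are toy placeholders, and the Euclidean-distance choice is an assumption; a full benchmark would typically also report mAP and CMC curves.

```python
# Minimal rank-1 accuracy sketch for camera re-identification.
# Features and identity labels below are toy placeholders; a real benchmark
# would use model-generated embeddings and also report mAP / CMC curves.
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    # Euclidean distance between every query and every gallery embedding
    dists = np.linalg.norm(query_feats[:, None, :] - gallery_feats[None, :, :], axis=-1)
    # Identity of the nearest gallery match for each query
    nearest_ids = gallery_ids[np.argmin(dists, axis=1)]
    return float(np.mean(nearest_ids == query_ids))

query_feats = np.array([[0.10, 0.90], [0.80, 0.20]])
query_ids = np.array([1, 2])
gallery_feats = np.array([[0.12, 0.88], [0.79, 0.21], [0.50, 0.50]])
gallery_ids = np.array([1, 2, 3])

print(rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids))  # 1.0
```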
Pre-test
Domain-specific large model benchmarks: the edge perspective (2025 Term 1)
Brief introduction
Thank you all for your attention to this issue! Those who wish to apply for the LFX mentorship for this project may try out this pre-test. The pre-test results will be used to help better select the final mentee.
Tasks
The pre-test mainly contains two tasks:
Submission method
After completing these tasks, the work should be submitted as follows.
We will publish all received report links under this issue after the submission deadline.
Rating
Task 1 Test dataset
Task 2 Research report
Pre-test deadline
According to the official schedule of the LFX Mentorship, candidates need to complete registration and project applications between February 5 and February 18. The mentors will confirm candidates between February 19 and February 25. To ensure sufficient time for review, please complete this pre-test and send the report email by February 23, 2025, 11 AM (PST). |
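For Task 1, here is a minimal sketch of how a candidate might assemble a small domain-specific test set as JSONL. The field names ("query", "response"), the file name, and the folder layout are illustrative assumptions only; the exact dataset/index format expected by KubeEdge-Ianvs should be checked against its LLM benchmark examples.

```python
# Illustrative sketch only: the "query"/"response" field names and the file
# layout are assumptions, not the exact format required by KubeEdge-Ianvs.
# Check the Ianvs LLM benchmark examples for the expected dataset structure.
import json
from pathlib import Path

samples = [
    {"query": "An edge camera reports intermittent frame drops. List likely causes.",
     "response": "Network congestion, insufficient bandwidth, overloaded edge node, ..."},
    {"query": "Summarize the constraints for deploying an LLM on a roadside unit.",
     "response": "Latency budget, limited memory, power limits, privacy of local data, ..."},
]

out_dir = Path("dataset")  # hypothetical local folder
out_dir.mkdir(exist_ok=True)
with open(out_dir / "test_data.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")

print(f"Wrote {len(samples)} samples to {out_dir / 'test_data.jsonl'}")
```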
Welcome @Abioye-Bolaji and @AtalGupta! Thank you for your attention; you might want to take a look at the pre-test. |
Hello @MooreZheng, I have submitted the tasks and the report to the email mentioned in the tasks. I hope you received them. |
Thanks for your participation in the pre-test of the LFX mentorship project "CNCF - KubeEdge: Domain-specific large model benchmarks: the edge perspective (2025 Term 1)". We received 48 applications. The following 7 outstanding candidates have done a great job completing the research report, ranking in the top 14.58% of all 48 candidates! As promised in the pre-test, after the submission deadline, we hereby publish all received pre-test report links:
Robin Chen: https://docs.google.com/document/d/1UpCy70VnbvvOKCiwyluLm9SimeLzPrDgfzVIPennyug/edit?usp=sharing
We also would like to take this opportunity to acknowledge your outstanding performance in the pre-test and invite all the above candidates to join KubeEdge SIG AI. There will be more events coming, and we look forward to seeing your contributions in the future~ |
What would you like to be added/modified:
This issue aims to build an advanced benchmark for edge-oriented, domain-specific large models on KubeEdge-Ianvs. The goal is to help all Edge AI application developers validate and select the best-matched domain-specific large models. For Edge AI service providers, it also helps identify which scenarios, edge nodes, or even locations could yield the best performance or improvement for their models. This issue includes:
Why is this needed:
Common large-model benchmarks in the industry tend to focus on the cloud. As the era of scaled applications arrives for large models, the cloud already provides infrastructure and services for them. Relevant customers have further raised targeted application requirements on the edge side, including personalization, data compliance, and real-time capability, making AI services with cloud-edge collaboration a major trend. Different institutions at the edge often build their own large models or knowledge bases. However, benchmarks for domain-specific large models with edge data have not yet been well developed. Due to the differing data distributions across edges, the performance of general large models is expected to vary significantly at the edge. This work aims to pinpoint those performance fluctuations for Edge AI services and applications.
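To illustrate what "pinpointing performance fluctuations" across edges could mean in practice, here is a minimal sketch that groups per-sample benchmark scores by edge node. The record layout ("edge_node", "score") and the node names are assumptions for illustration, not an Ianvs output format.

```python
# Illustrative sketch: group per-sample benchmark scores by edge node to expose
# how a general-purpose model's quality varies across edges. The record layout
# ("edge_node", "score") and node names are assumptions for illustration only.
from collections import defaultdict

results = [
    {"edge_node": "factory-A", "score": 0.82},
    {"edge_node": "factory-A", "score": 0.78},
    {"edge_node": "hospital-B", "score": 0.61},
    {"edge_node": "hospital-B", "score": 0.58},
]

by_node = defaultdict(list)
for r in results:
    by_node[r["edge_node"]].append(r["score"])

for node, scores in sorted(by_node.items()):
    print(f"{node}: mean score = {sum(scores) / len(scores):.3f} over {len(scores)} samples")
```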
Recommended Skills:
KubeEdge-Ianvs, Python, LLMs
Useful links:
Introduction to Ianvs
Quick Start
How to test algorithms with Ianvs
Testing incremental learning in industrial defect detection
Benchmarking for embodied AI
KubeEdge-Ianvs
Example LLMs Benchmark List
Ianvs v0.1 documentation
=====
Those who wish to apply for the LFX mentorship for this project might want to take a look at the pre-test.