Conduct Benchmark Test of Candidate LLMs #20

Open
grayJiaaoLi opened this issue May 7, 2024 · 0 comments
Labels
SP 05, User Story (Label for User Stories)

grayJiaaoLi commented May 7, 2024

User story

  1. As a Software Developer
  2. I want to make quantitative comparisons of candidate LLMs
  3. So that we know how well each model performs relative to the others

Acceptance criteria

  • Incorporate model training and the data obtained
  • Choose relevant benchmarks for CNCF-focused tasks, for example:
    • Natural language questions (ARC or HellaSwag)
    • Explanatory tasks (MMLU)
  • Log the results for each LLM
  • Benchmarking can be based on the EleutherAI Language Model Evaluation Harness
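
As a rough illustration of the criteria above, the following sketch builds one Evaluation Harness command line per candidate model without running anything. It assumes the `lm_eval` CLI from lm-evaluation-harness v0.4.x with its `hf` model backend; the task names `arc_challenge`, `hellaswag`, and `mmlu` should be checked against the installed version, and the model identifiers, `results` directory, and helper name `build_eval_command` are hypothetical placeholders, not choices made in this issue.

```python
import shlex
from pathlib import PurePosixPath

# Assumed task names for lm-evaluation-harness v0.4.x; verify them locally.
DEFAULT_TASKS = ("arc_challenge", "hellaswag", "mmlu")

# Hypothetical placeholders; replace with the real candidate model IDs.
CANDIDATE_MODELS = ("org/model-a", "org/model-b", "org/model-c")


def build_eval_command(model_id, tasks=DEFAULT_TASKS, output_root="results", batch_size=8):
    """Return the argv list for one harness run; nothing is executed here."""
    # Give each model its own output directory so the logs stay separate.
    safe_name = model_id.replace("/", "__")
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--output_path", str(PurePosixPath(output_root) / safe_name),
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for model_id in CANDIDATE_MODELS:
        print(shlex.join(build_eval_command(model_id)))
```

Printing the commands instead of executing them keeps the sketch independent of any installed harness. Passing each list to `subprocess.run` would be one way to launch the runs.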

Definition of done (DoD)

  • At least three candidate LLMs are tested
  • The results of tests are logged and organised
  • The results provide quantitative comparisons between different models
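
One possible way to turn the logged results into a quantitative comparison is sketched below. It assumes the per-task accuracies have already been extracted into a plain dictionary; the function name `comparison_table`, the input layout, and the unweighted mean used for ranking are illustrative assumptions, not something specified in this issue.

```python
def comparison_table(scores, tasks):
    """Build a Markdown table ranking models by mean accuracy.

    scores: {model_name: {task_name: accuracy}} (assumed layout)
    tasks:  task names to include, in column order
    """
    def mean_accuracy(model):
        # Unweighted mean over the selected tasks (an assumed ranking rule).
        values = [scores[model][task] for task in tasks]
        return sum(values) / len(values)

    # Best-scoring model first.
    ranked = sorted(scores, key=mean_accuracy, reverse=True)

    header = "| Model | " + " | ".join(tasks) + " | Mean |"
    divider = "|" + " --- |" * (len(tasks) + 2)
    rows = []
    for model in ranked:
        cells = [f"{scores[model][task]:.3f}" for task in tasks]
        cells.append(f"{mean_accuracy(model):.3f}")
        rows.append(f"| {model} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])
```

A model with no score for one of the requested tasks raises a `KeyError`, which surfaces incomplete runs instead of ranking them silently. The returned string can be pasted into an issue comment or a results file.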

DoD general criteria

  • Feature has been fully implemented
  • Feature has been merged into the mainline
  • All acceptance criteria were met
  • Product owner approved features
  • All tests are passing
  • Developers agreed to release
grayJiaaoLi added the "User Story" label (Label for User Stories) on May 7, 2024
grayJiaaoLi changed the title from "Conduct Benchmark Testing of Candidate LLMs" to "Conduct Benchmark Test of Candidate LLMs" on May 7, 2024
grayJiaaoLi changed the title from "Conduct Benchmark Test of Candidate LLMs" to "Implement Benchmark Test of Candidate LLMs" on May 7, 2024
grayJiaaoLi changed the title from "Implement Benchmark Test of Candidate LLMs" to "Conduct Benchmark Test of Candidate LLMs" on May 7, 2024